System size identification from sinusoidal probing in diffusive complex networks

One of the most fundamental characteristic of a complex system is its size (or volume), which, in many modelling, is represented by the number of its individual components. Complex systems under investigation nowadays are typically large and/or time-varying, rendering their identification challenging. We propose here an accurate and efficient method to determine the size of (i.e., number of agents in) a complex, diffusively coupled dynamical system, that leverages the response of the system to an injected probing signal. For our method to be applicable, we rely on some assumptions on system’s characteristics, namely, on the spectrum of the coupling graph and on the basin stability of its steady state(s). Even though such assumptions imply that our method cannot be applied to any instance of diffusively coupled group of dynamical agents, we argue that it covers relevant and interesting examples. Furthermore, the simplicity of the approach and its low computational complexity renders it very interesting for the systems to which it applies.


Introduction
At the era of big data, access to high-resolution (in space and time) measurements is increasingly easy. In parallel, the ever-improving computational power of computers allows for the analysis of such large sets of data [1]. Among many examples, these considerations apply to domains ranging from electrical grid to social networks or gene regulatory networks. With machine learning leading the way, modern technologies promise to improve the wellbeing of humanity [2], e.g., by leveraging the increasing amount of data gathered by phasor measurements units [3] or online social networks [4], to name but a few.
The other side of the coin of this tremendous amount of data is that it comes with significant uncertainties. Indeed, in general, it is not possible to guarantee that each component of a large system will be monitored at all time, either due to the number of units to be monitored or to the time-varying nature of the system's components. These unavoidable inaccuracies and uncertainties can jeopardize the efficacy of data-based technologies.
In this scope, it is of particular interest to be able to recover a system's characteristics (parameters, internal structure, etc.) from measured data. Improving the efficiency of such processes tends to increase the resolution of the inferred characteristics, both in space and time. For instance, recovering the underlying structure of a network of dynamical agents, based on measurements, has been an active topic of research along the last decades [5][6][7][8][9][10][11][12]. However, the majority of those approaches rely on the knowledge of most (if not all) the agents composing the system. Sometimes, a subset of these agents is not accessible to the observer, rendering the recovery of the network harder, if not impossible [13]. In the worst situations, it even happens that the observer does not even know the actual size of the system [14].
The number of components in a system is one of its most fundamental characteristics and is often unknown, especially for large, time-varying systems. A crucial step in the process of system identification is then to recover this number as accurately and with as few measurements as possible. Despite its apparent simplicity, this problem is actually not trivial and surprisingly underinvestigated given its fundamental relevance in the scope of system identification. The authors of [13] proposed an approach to locate hidden nodes in a networked dynamical system, which relies on the comparison between measured and predicted trajectories of the accessible agents of the system. This method requires a good knowledge of the differential equations determining the dynamics of the system in order to numerically integrate them. Moreover, the authors assume that the number of unaccessible agents is very small (usually only one).
As far as we can tell, the most up-to-date approach to recover the number of agents in a system has been detailed in [14]. This approach recovers the number of units as the rank of a detection matrix constructed with time series of the measured units. Namely, it requires to observe the system at k time steps, along M different trajectories, and if k and M are large enough, in principle, the total number of dynamical units can be recovered. In summary, if N is the number of observable units, this method relies on NkM measured values and requires to be able to set the system in M different initial conditions. A recent letter [15] elegantly draws the link between the detection matrix of [14] and the observability matrix commonly used in control theory [16]. Reference [15] shows that the approach proposed in [14] can unambiguously determine the system size if and only if the system is observable in the control-theoretic sense.
There is a wide range of physical or virtual systems that are typically modelled as a set of dynamical agents interacting through a diffusive coupling and according to an interaction network. Examples thereof range from electrical power grids [3,17] to opinion dynamics [4,18], through models of gene regulatory networks [19][20][21], control of vehicular platoons [22,23], or flows in transport networks [24,25]. We show here that, for such diffusively coupled systems, an accurate determination of the number of units can be performed efficiently. Namely, we proceed via measurement of the trajectory of one dynamical unit, following the injection of a probing signal at a single source, which reduces significantly the number of measurements compared to the state-of-the-art literature [14].
The cost of our approach is that we require to be able to inject a probing signal at one of the nodes of the network and that it is restricted to diffusive couplings between agents. The probing signal needs to be tailored so as to satisfy a series of assumptions that guarantee our method to be effective. Namely, the amplitude of the probing should be large enough to be detected, but small enough not to destabilize the system. Also, the probing's frequency should be small enough, so it impacts the whole network. This series of assumption might seem restrictive. Nevertheless, we argue that our method still covers a significant range of relevant dynamical systems, power grids being just one example of those. Furthermore, in our opinion, the limitations above are counterbalanced by a significant reduction in computational cost and extended scalability, as only one time series has to be analyzed, and the method is independent of the system's size.

Systems of diffusively coupled units
Let us consider a system of n dynamical units, interacting through a (possibly nonlinear) diffusive coupling, where x i ∈ M is the time-varying value of the ith agent, evolving on a one-dimensional manifold M, ω i ∈ R is the natural, constant driving term of agent i, and b i will be used as an input to the system. Two agents i and j are interacting if a link between them exists in the interaction network, i.e., if and only if the corresponding term of the adjacency matrix a ij = 1. We assume that the interaction graph is connected if undirected and strongly connected if directed [otherwise we restrict ourselves to a (strongly) connected component]. The interaction function between i and j is a differentiable function f ij : R → R satisfying f ij (0) = 0 and f ij (0) > 0 (where the prime denotes the first derivative). If a fixed point x * ∈ M n exists, one can linearize equation (1) around it, which yields, for a small deviation δ = x − x * , to the approximate dynamicsδ where we define the Jacobian matrix of equation (1), Equation (2) is a good approximation of equation (1) as long as the system remains in the vicinity of x * . One can verify that the structure of the interactions implies that the Jacobian J is a weighted directed Laplacian matrix of the interaction graph. Let us denotes its right-(resp. left-) eigenvectors u 1 , . . . , u n (resp. v 1 , . . . , v n ). From now on, we will focus on stable fixed points of equation (1), implying that the eigenvalues have nonnegative real part, 0 = λ 1 < Re(λ 2 ) . . . Re(λ n ) (note that one eigenvalue is always zero) and the eigenvectors form a basis of R n . Equation (2) is then solved by expanding the deviation δ over the right-eigenvectors u α of J, i.e., δ where we used the bi-orthogonality relation v α u β = δ αβ . The latters are solved by multiplying by e λ α t and then integrating by parts. It follows that for α = 1, . . . , n.

Sinusoidal probing and system size estimation
Our method relies on the measurement of the response of the system after the injection of a probing signal at some point. Such approach are used, for instance, to identify some eigenmodes in electrical grids [26], or in gene regulatory networks to infer the existence of causal relations between genes [27]. Before probing the system, we require it to be at (or close to) steady state, and if necessary, we let the transient due to initial conditions decay. If the system does not converge to a steady state, our method cannot be applied, but in many cases, determining the number of agent is a very secondary problem if the system is not at steady state, e.g., power grids.

System size estimation
In theory, any signal could be decomposed as sum of sinusoidal signals, by Fourier Transform. We then propose to inject the most fundamental building block of the Fourier decomposition, i.e., a sinusoidal signal at agent i and to measure its impact at agent j (possibly equal to i). Let be the probing signal at agent i, and b j ≡ 0 for j = i. The choice of amplitude b 0 and frequency ω 0 are mostly constrained by the nature of the system under investigation. The probing amplitude should be small enough in order not to jeopardize the normal operation of the system, but, at the same time larger than the ambient noise to which the system is subjected. The frequency of the forcing should be sufficiently small, such that the disturbance propagates to the whole system. The probing is designed to leave the system in the vicinity of the fixed point x * , so that the response of the system is dominated by the linear terms and the method can be applied to general diffusive couplings. We defer a thorough discussion about the tailoring of the probing signal to section 3.2.
Introducing equation (6) into equation (5), and recombining the eigenmodes yields the following response measured at agent j while probing at agent i, Provided the probing frequency is small compared to the Jacobian's eigenvalues, ω 0 Re(λ α ) [in practice, ω 0 < Re(λ α )], after a few periods of the probing, the exponential in equation (7) vanishes [Re(λ α )t 1], and the sine dominates all non-zero modes. One then approximates equation (7) as where the † denotes the Moore-Penrose pseudo-inverse. Defining we re-write equation (10) as We notice that the value of the response δ i j averaged over an integer number of periods isδ = b 0 /nω 0 . Keeping track of one trajectory, we can then estimate the number of agents composing the system aŝ and the longer the measurements, the better the estimate.

Remark.
For ω 0 sufficiently small, C ij is also well approximated by b 0 /nω 0 , and thus the amplitude of the trajectory of δ i j is given by 2δ, which is another way to estimate n, The only requirement is that the time series is long enough so a full period of the probing signal can be observed. We emphasize that, in general, in an application of our method, one would have access to the absolute state of the measured agents, and not directly to the deviation δ i j (t). However, the probing signal being known, we know that the deviation term vanishes at times t = 2πk/ω 0 , k ∈ Z, which allows to recover x * j and to isolate δ i j (t).

Probing amplitude and frequency
The probing signal needs to be design in order to satisfy the assumptions of the previous section. With a sinusoidal probing, we can mostly act on amplitude and frequency. Even though there is no general recipe to determine these characteristics, we detail here determinant factors in their choice, which we summarize in the left panel of figure 1. The amplitude of the signal is constraint both from above and below. On one side, the signal should not destabilize the fixed point of the system. We then required it to be sufficiently small, i.e., b 0 < B + , so the system is not pushed out of the basin of attraction of its steady state [28,29]. Determining such upper bound on the probing amplitude requires a knowledge of the system at hand and cannot be done in full generality. We refer to [30][31][32] for a discussion of the notion of basin stability, that is relevant here.
On the other side, if the system is subject to noise, then the amplitude of the probing needs to be sufficiently large so it can be distinguished from the noise, i.e., b 0 > B − . One way to determine the required amplitude is to analyze the noise to which the system is subjected, by taking measurements (which is possible at least at one agent by assumption) before injecting the probing signal.
Note that under our assumptions, it would be unrealistic to have B − close to or larger than B + , otherwise the noise would drive the system close to the boundary of the basin of attraction and eventually destabilize it. A system that spontaneously jumps from one basin of attraction to another cannot be considered as in steady state and is then outside the scope of this manuscript. There is then room to choose an appropriate probing amplitude.
For the probing frequency, even though there is no theoretical lower limit, any implementation of our method will have some technical limitations. First, these limitations depend on the device or process that is implemented to inject the probing signal into the system, which might not be able to generate a signal with arbitrarily low frequency. Second, the steady state of a system might change over long time scales (compared to the system's response to disturbances). For instance, power grids can be considered at steady state at the time scale of a few minutes, but their operating state might significantly change over an hour. In such a case, if the steady state of the system changes within one period of the probing, our method could lead to spurious estimates. In general, there is then a lower bound Ω − > 0 to the probing frequency.
The upper limit to the probing frequency Ω + depends, as mentioned above, on the spectrum of the Jacobian matrix J. In the linear regime about a steady state, the transient following a disturbance can be decomposed on the eigenmodes of the Jacobian. Each component of the transient will then decay exponentially as e λ t, where λ is the real part of the corresponding eigenvalue of J, which is negative if the fixed point is stable. Then, introducing a disturbance continuously and at a rate smaller than the eigenvalues of J prevents the occurence of transient, as those are decaying faster than the disturbance is introduced. The impact of a slow probing signal is then to continuously displace the fixed point, without generating intractable transients. Determining the spectrum of the Jacobian is not a trivial task in general, in particular in our framework, where we assume that we have not many information on the system. However, field expert are often aware of the typical time scale of their system of interest, e.g., electrical engineers know the typical time scales in the power grid, and biochemists are aware of the time needed for their reaction to reach a steady state. The appropriate probing frequency should then be determined by more specialized experts.
In contrast with the bounds on the amplitude, we cannot guarantee that Ω − < Ω + . For instance, if a system is composed of two components with strong intra-connections, but weak inter-connections (e.g., a single weak link), then the smallest nonvanishing eigenvalue λ 2 , also called algebraic connectivity or Fiedler eigenvalue [33], can be extremely close to zero, as illustrated in the right panel of figure 1. In such situation, it is possible that Ω − > Ω + , which renders our method inapplicable to the whole network. This observation shows that our method is restricted to networked systems that are sufficiently connected (in the sense of the algebraic connectivity), or to sufficiently connected subnetworks if weak links are present.
Remark. If one considers system with higher-order dynamics in equation (1), the method still holds. The only difference lies in the conditions for the probing frequency to be considered as 'small', in which case, one can verify that higher-order time-derivatives do not influence the response of the system to the signal. A more detailed discussion about the response of second-order dynamical systems to slow disturbances can be found in [34].

Numerical validation
In order to get numerical confirmation of our results, we will apply them to the Kuramoto model [f ij (x) = sin(x)] on three different interaction graphs. Each system has the same number of units, n = 3809, but significantly different network structure: ER. An Erdös-Rényi graph with edge probability p = 0.003 (m = 21 444 in our realization); SW. A small-world constructed following the Watts-Strogatz process [37], with m = 38 090 edges and rewiring probability p = 0.01; EU. The more realistic network of the PanTaGruEl model of the European high voltage grid [35,36], composed of m = 4955 edges.
For probing, we use a sinusoidal signal with controlled amplitude and frequency, similarly as what is typically used to identify eigenmodes in electrical networks [26]. Figure 2 shows the trajectories δ j i obtained by both probing and measuring single nodes, i.e., i = j = i 0 . For each network (each panel of figure 2), we repeat the simulation changing probed and measured node i 0 over 7 different location chosen randomly. The estimations of the system size from these trajectories are given in table 1. One sees that our method accurately  [35,36]. All networks have n = 3809 vertices. We consider a probing frequency satisfying λ 2 /ω 0 = 20/2π, for each network. The estimate of the number of agents in the system is given by the mean value around which trajectories oscillate. To get that mean value, we take the average of the maximum and minimum for each trajectories, as depicted in the third row panel.  Table 2. Estimated number of agents and relative error obtained from 7 different simulations where a single was both probed and measured within noisy conditions. The noise amplitude is such that η 0 /b 0 = 0.01. The simulation parameters are the same as for figure 2, except for the ER network where we double the probing time to obtain more accurate results. estimates the number of agent in the system as for SW and ER networks, errors of less than 8 over 3809 nodes (less than 0.21%) are observed and for EU network 11 nodes (less than 0.31%).
We further test the robustness of our method in case of noisy conditions that readṡ where η is some Gaussian white-noise acting at each node independently. More precisely, its moments are given by, where, here δ ij is the Kronecker symbol. Table 2 shows the estimated number of agents for the dynamics of equation (16), with a noise amplitude at each node satisfying η 0 /b 0 = 0.01. As expected, the noise reduces the precision of the estimation. The biggest error is of 165 over 3809 (4.3%), for the ER network. A better estimation of the number of agents in noisy conditions might be obtain by averaging over many probed nodes, or over longer time series.

Remark.
To obtain the estimations in table 2, we averaged δ j i over an integer number of period of the probing signal, in order to average out the effect of the noise. Indeed, if the probing frequency is extremely low, one might pretty long time series to get an accurate estimate. Such tradeoff has been discussed in section 3.2.

Discussion and conclusion
For diffusively coupled systems, we improved the current state-of-the-art methods to estimate the size of a complex dynamical system, based on one time series measurement solely. The computation cost is low and independent of the system's size. On one hand, our approach relies on a series of assumptions that prevent it to be applicable in full generality, to any group of diffusively coupled agent. On the other hand, the simplicity of the method, its low computational cost, and its scalability render it extremely valuable in contexts where it applies. Future work will aim, in particular, at extending the domain of applicability of this method.
In our opinion, the performance of our approach (in terms of cost of data acquisition and computation) is close to be as optimal as possible. Indeed, in order to get information about the system, one needs to measure something, i.e., at least one output, which is what we do. Furthermore, to be able to analyze the output of the system, the observer needs some information about the input that triggered the response. Identifying characteristics of the system from measurements solely can hardly rely on fewer data. The method would be improved by relying on shorter time series, which can be long in our case, due to the low frequency of the probing signal, but in such a case, one would need to rely on a completely different approach.

Data availability statement
No new data were created or analyzed in this study.