Reconstructing dynamics of complex systems from noisy time series with hidden variables

Reconstructing the equation of motion and thus the network topology of a system from time series is a very important problem. Although many powerful methods have been developed, it remains a great challenge to deal with systems in high dimensions with partial knowledge of the states. In this paper, we propose a new framework based on a well-designed cost functional, the minimization of which transforms the determination of both the unknown parameters and the unknown state evolution into parameter learning. This method can be conveniently used to reconstruct structures and dynamics of complex networks, even in the presence of noisy disturbances or for intricate parameter dependence. As a demonstration, we successfully apply it to the reconstruction of different dynamics on complex networks such as coupled Lorenz oscillators, neuronal networks, phase oscillators and gene regulation, from only a partial measurement of the node behavior. The simplicity and efficiency of the new framework makes it a powerful alternative to recover system dynamics even in high dimensions, which expects diverse applications in real-world reconstruction.


Introduction
Complex interacting systems are ubiquitous and essential for the modern world which include both the social ones: financial systems, power systems, transportation systems, the internet, and the natural ones: neural networks, gene regulation systems and the earth system [1][2][3][4][5][6][7][8][9][10][11][12] . Diverse and intriguing behaviour is seen arising, out of the complexity of dynamics of individual agents constituting the network, or more often, of the interactions between them. Although more and more data are now measured in these systems, it remains a great challenge to reconstruct network structures and dynamics because a direct measurement of them is rarely possible [13][14][15][16][17][18] , which nevertheless are critical for an effective description, analysis, prediction and control of system behaviour.
System reconstruction is an inverse problem that has been studied extensively in different contexts at various levels [19][20][21][22][23] . With the time-delay embedding technique 24 , an equivalent phase space may be constructed and many dynamical features could be computed but interaction rules between different coordinates need further exploration. Another set of convenient tools are based on information theoretics [25][26][27][28] , which quickly deduce the network topology from time series but the exact strengths of connections are hard to retrieve. Moreover, with these techniques, a computation of direct interaction links could be very cumbersome in complex networks 12 . To obtain equations of motion, model-based methods are used to determine unknown parameters in the pre-assumed dynamical models, such as differential equations or discrete maps. However, many of these methods only apply to low-dimensional systems 29,30 . To conveniently reconstruct networked systems, methods based on cybernetics were designed [31][32][33][34][35][36][37] , so that the auxiliary response network synchronizes with the studied network (driving network) via the measured time series, if the network parameters are correctly estimated. However, when applying these methods, we should know the node dynamical equations and measured the states of all nodes with no interference, which may not be possible in practice.
Almost invariably, noise is an unavoidable factor in almost all practical reconstruction, which constitutes a major difficulty on various occasions. Distinct methods have been developed to remove or utilize noise in a time series to recover the interaction patterns based on noise-induced dynamics or correlations [38][39][40][41][42] , in which either a small noise limit is assumed or priori smoothing of the data should be performed. To utilize all the data to suppress fluctuations, Wang 43 proposed a globally invariant local fitting method, which is capable of processing data with moderate or strong noise. For high-dimensional systems, if the structure of the complex network is known, Gao 44 developed a two-stage approach (first stage, global regression; second stage, local fine-tuning) for autonomous inference of node and interaction dynamics of complex systems, which, however, needs knowledge of the network topology and the state of all nodes.
Lack of measurement of all variables is surely another major problem for dynamics reconstruction, which requires the recovery of not only unknown system parameters but also the time course of unmeasured variables -the so-called hidden variables. In 1990, Breeden et al. 45 discussed how to reconstruct system equations in the presence of hidden variables. Their method is based on the flow algorithm developed by Cremers et al. 46 , which is similar to that of Crutchfield and McNamara 47 . It can be applied to systems with one or more such hidden variables, provided that the form of the reconstruction function is known. In a different trial, Gouesbet 48,49 introduced the concept of standard system to numerically evaluate a set of unknown constants related to the target vector field but high-order derivatives are needed which may be hard to compute in the presence of noise. Wang et al. [50][51][52][53][54] developed a completely data-driven method by using compressed sensing, which effectively infers the existence and location of hidden nodes in a network based on anomalous prediction errors. However, the determination of the coupling strength requires further investigation. With an information theoretic approach, Wu 55 used the piecewise approximation to carry out a partial Granger causality test (Guo 56 ) to detect the hidden variables with minor impact. The method of high-order cross correlation (HOCC) 57-59 is proposed as a general technique to reconstruct networked systems driven by noise and with hidden variables. However, the requirement of high-order derivatives could easily amplify noise, which fails the reconstruction if the noise is not small. Similar methods were developed and explained by Ching and Tam 40, 60, 61 based on observable covariances. In all, the presence of hidden variables brings considerable challenges to the network dynamics reconstruction, especially when the system dimension is high and the dynamics is contaminated with noise. Here we propose a new framework to do the reconstruction based on a cost functional, which infers both the unknown parameters and unmeasured time series of hidden variables simultaneously. A gradient descent algorithm is employed and leads to a set of ordinary differential equations (ODEs). The measured time series are used as inputs which drive these ODEs such that the cost functional approaches to zero. Thus it is possible to recover the parameters and trajectories in real time with online input. Even in the presence of noise and in high dimensions, the proposed reconstruction scheme is still valid. As expected, the more the measured data, the less the derived equations. Because of the evolution character of the method, different from most of the current existing methods, the unknown parameters could enter the equations in a nonlinear way. Successful application of the method to typical examples of network models demonstrated its efficiency and great potential in dynamics determination in complex systems.
The rest of the paper is organized as follows. The main ideas of our approach are presented in detail in Section II. In Section III, we first apply the method to the well-known Lorenz system driven by white noise, and check the impact of noise intensity and parameter selection. Then, its validity is further demonstrated in application to networks with different local dynamics: the Lorenz oscillators or the FitzHugh-Nagumo(FHN) neural network models. Finally, a discussion of hidden nodes and applicability of the method is made with an example model of coupled Kuramoto oscillators. In the final section, we conclude this paper and point out possible future directions.

Theory
We consider a system consisting of N nodes, where the dynamics of each node is determined by a set of differential equations with interactions from other nodes that form a network, given by: where i, j ∈ 1, 2, 3 · · · , N and X i = X ∈ R D is the state vector of node i. The vector F i dictates the motion of the i-th node with f i , g i j being smooth functions. The parameters p include the local ones B i and the coupling strength A i j . The functional form f i and g i j are assumed known but the parameters p have to be determined from the given time series.
In the following, we present a method for reconstructing the network connection and equation parameters with only a partial observation of the system state. Those measurable variables constitute an n-dimensional subspace, denoted by x and the complementary space is thus N × D − n-dimensional marked with variables y. Of course, we may write X t = (x t , y t ) if x, y are properly arranged. Their dynamics is governed by Eq. (1), which with a slight abuse of notations could be written in a more convenient forṁ where the subscripts o = (1, 2, · · · , n) t and h = (n + 1, n + 2, · · · , N * D − n) t denote different dimensions of the phase space. To reconstruct the system, an error function is proposed as an integrated squared difference between the left-and the righthandside of Eq. (2) in a time interval [0 ,t], where β is an adjustable hyper-parameter. With correct values of the parameters and orbit points, obviously ∆ takes its minimum value 0, indicating the existence of an absolute minimum. While the velocityẋ could be computed from the known time series of x(t) 43 , the state variables y as well as their time variationsẏ have to be deduced. The strategy is to systematically modify the parameters p and the initial conditions of y so as to minimize the cost function Eq. (3), treating x andẋ as known.
The integration time t should be long enough to exhibit characteristics of the orbit and to suppress small noise in the data. Nevertheless, if t is really large, due to the exponential amplification of deviations in a chaotic system, the error function ∆ will become extremely sensitive to the parameters as well as to the initial conditions. To balance the computation, an exponentially decaying factor e −α(t−s) is supplied to the integrand. With this modification, the error function essentially depends only on a time interval ending at t and with a length characterized by the hyper-parameter α. With this decaying factor, the memory of initial conditions becomes vague with increasing t and we have to employ an alternative way to push the state point to the correct trajectory. For this, we modify the equation of motion of y in the following manneṙ where e are artificial parameters to gradually correct errors due to the wrong initial conditions, which has the same dimension as y. Later on, as will be seen, the parameters e are also evolving slowly, sending y to the right track. Toward the end of the course, e gradually shrinks to 0, recovering the original equations of motion Eq. (2). With all the above consideration, the error functions could be modified to Thus, the problem converts to parameter learning which minimizes the function Eq. (5). Nevertheless, it is not as simple as it seems. The state variable y is in fact an implicit function of p and e since it is computed from the evolution equation (4). Thus, we have From Eq. (4), we may get the evolution of the parameter dependence In the above two equations, the last terms that contain α are supplied to damp the parameter dependence of earlier times. With the increase of integration time, this dependence could become extremely sensitive in a chaotic motion if α = 0. Therefore, those two terms will keep a finite memory as done in Eq. (5). The evolution of parameters can be chosen along the gradient direction, thus dp dt where γ is the learning rate and the variables H , E are defined in Eq. (4) and satisfy the equations where H 1 , E 1 is the integrand apart from the decay factor in the expression H , E displayed in Eq. (6).
With the above set of evolution equations for y , ∂ y ∂ p , ∂ y ∂ e , p , e , H , E and the given time series of x, we may drive the unknown variables to recover both the parameters and the orbit. The initial conditions of these equations are given somewhat arbitrarily although prior information could be used to speed up the convergence. Assuming the number of unknown parameters p is n p , then the total number of parameters is n c = n p + n e = n p + N * D − n where n e = N * D − n is the number of the auxiliary parameters e. Therefore, the total number of equations to solve is n t = (N * D − n)(1 + n c ) + 2n c , since we care about each hidden variable as well as their derivatives with respect to all the parameters and the evolution of each parameter needs two equations(check Eq. (8)and Eq. (9)). It is easy to see that the number of equations is quadratic in the number of hidden variables y but linear in unknown parameters p. Hence, the more information we have, the less effort is needed to solve these equations, being consistent with the common intuition. Thus, the complexity increases quadratically with the number of hidden variables, which makes this type of problems very hard.

Numerical simulations
In the following, several well-known examples are used to illustrate application of the above formulation for recovery of system dynamics and parameters. The absolute error of an estimated parameter is calculated as |parm − parm |, where parm is the inferred result and parm is the true value. First, as usual we try the Lorenz system and check how the performance of the new algorithm depends on various hyper-parameters.

Lorenz system
The Lorenz system is a famous deterministic dissipative system 62 , which is written aṡ With the often-chosen parameters a = 10, b = 2.6, r = 28, this system displays very typical chaotic behaviour with the maximum Lyapunov exponent close to 1. If the time series of x, y, z are all known, it is not hard to recover the coefficients of the polynomials on the right hand side of Eq. (10) 20,22,35,42,43 . Instead, if only one variable, say x, is observed, the recovery becomes much harder 59 . Here, the new formulation is applied with two hidden variables y, z, three unknown parameters a, r, b and two auxilliary parameters e 2 , e 3 . Thus, according to the above discussion, the total number of equations is [1,1] are conveniently used as the initial conditions for the corresponding variables. With the above algorithm, the three unknown parameters a, r, b converge well as shown in Fig. 2(A,B,C) and Fig. 3(C,D). For different hyper-parameters, the convergence seems to follow similar patterns. Initially, the unknown parameters change steadily but at some point start to rush to the true values and then remain there throughout the simulation. Nevertheless, the convergence is different in disparate conditions and could even go awry if improper hyper-parameters are chosen as displayed in Fig. 2(D) and Fig. 3(A,B).
In this example, the integration time step is t unit = 0.01. As can be seen from the figures, a total number of N step ∼ 10 5 steps are needed for the convergence, which is determined by the two hyper-parameters α , γ. Currently, we take α = 3, indicating a memory of 1/α = 0.33 which is about a third of the Lyapunov time of the system. The learning rate is chosen to be γ = 0.01, about one hundredth of the Lyapunov exponent. γ is used to control the update of the parameters in the evolution equation and should keep small compared with the time rate of the system state (y, z) in order not to disrupt the generic dynamics of the original system. As shown in Fig. 2(A,B,C), a reduction of γ extends the learning time and requires more data for convergence. Nevertheless, if the available time series has a limited length, a technique of data recycling may be employed for a successful reconstruction as shown in Appendix 0.2. On the other hand, if γ is overly large, the parameters change rapidly and the original dynamics may be altered so that no convergence results, as displayed in Fig. 2(D). So during the reconstruction, the learning rate should be chosen properly to ensure both the stability and efficiency.
The decay rate α should not be too small in order to bound the integrals in Eq. (6). On the other hand, if α is too large, the memory becomes really short and these integrals highly depend on the current state which varies rapidly and easily overruns the slow parameter changes. In Fig. 3, the influence of the decay rate α on the convergence is displayed. As α increases, the learning becomes slow and more data is needed, just like reducing γ. When α is relatively small (Fig. 3(A)), the effective integration time becomes long and the integral will be very sensitive to parameter changes, causing instability in the system, unless the learning rate γ is really small and averaging out the fast oscillations. Therefore, for a system with a large Lyapunov exponent, when the learning rate is fixed, it is necessary to apply a proper decay rate to reduce the sensitivity of the parameters ( Supplementary Fig. 21, Supplementary Fig. 22). In general, it is important to adjust the learning and the decay rate to keep a balance between the two so as to achieve a stable and fast convergence. From the above discussions, it seems that the current reconstruction scheme is quite fragile. Actually, it is not and able to work quite well even in the presence of noise. Next, we investigate the impact of noise on the reconstruction since it inevitably exists in every experiment or observation. For this purpose, we add a noise term to each of the evolution equation (10), which describes white noise with the following characteristics.
where · denotes an ensemble average and the delta function is a Kronecker one for the discrete index i, j, and a Dirac one for the continuous time variables t − t . D is the noise intensity and when D = 0 we recover the deterministic autonomous system. Even though a noisy orbit is generated with Eq. (10) with noise (D = 0), we still use the same deterministic formulation in the previous section. As an example, suppose that only the noisy x−orbit is given and we still aim to deduce the unknown parameters and the hidden variables y(t), z(t). If the noise is not very strong, x(t) may be directly substituted into the 22 reconstruction evolution equations mentioned above, together with theẋ(t) computed with any fair smoothening technique 43 . As a result, only the average orbit is reconstructed with the unknown parameters. Fig. 4 shows the reconstruction results at different noise intensities, i.e., D = 0.1 in Fig. 4(A,B) and D = 1 in (C,D). The convergence pattern of the parameters displayed in Fig. 4(A,C) looks very similar to that in the deterministic case but small jittering exists due to noise perturbation, which is more clearly seen in (C) for D = 1. After the small oscillations are averaged out, we get a fairly good estimate of the parameters, 5/23 which is listed in Table 1. In fact, the results of extra tries are also displayed in the Table up to D = 5. In general, the estimation gets worse for greater noise intensity as expected but the errors do not appear monotone. The reconstructed orbits y(t), z(t) are displayed in Fig. 4(B,D), which seem almost overlapping with the true orbits, just as in the deterministic case.
From the above analysis, it can be seen that the new construction scheme is quite robust against noise as long as it is not too strong. Next, the scheme will be applied to several typical examples in nonlinear dynamics with high dimensions to show its validity.  First, we consider the inference problem in a network with N nodes each of which is occupied by a Lorenz system. The connection between nodes marks their interaction. More specifically, the i−th node is governed by the equation beloẇ where η i (t) is the above-mentioned Gaussian white noise and A i j is the interaction matrix. Here, without less of generality, the interaction is assumed to take place between y−components. When N is small, we may start with a full matrix [A i, j ] containing N 2 elements to be determined. The network structure can be drawn from the non-zero elements of [A i, j ] after the reconstruction and we do not need any prior information about the network topology. Nevertheless, this type of analysis is not scalable since the quadratic increase of the number of the unknown matrix elements with the network size prevents their efficient estimation based only on the linearly increasing information. In fact, the current brute-force method works up to N ∼ 20. For larger networks, extra information about the network topology helps a lot, especially those that cut down the number of unknown matrix elements. For example, if the structure of the network is approximately known (which may still contain some spurious links), our computation remains valid as long as the number of links increases not that fast.
Here, we tried the reconstruction on a small-world network with N = 100 and an average degree K = 4 as portrayed in Fig. 6(A). For an easy comparison, the non-zero coupling is allowed taking only four values 0.1, 0.3, 0.5, 0.8. The results are displayed in Fig. 6(B,C) which agree well with the original values. The convergence of the local node parameters looks quite similar as in precious cases (see Fig. 6(B)). However, the evolution of the couplings starts more erratically. The fluctuations grow larger and larger and culminate just before a sudden convergence to the true values (see Fig. 6(C)). The whole process closely resembles what happens in chaos control: only when the system state gets close to the stable manifold of the true dynamics and the true parameters does the gradient descent scheme in Eq. (8) start to work and drive the wandering point to the right solution 63 .
In the current example, we have 700 unknown system parameters, including 300 node parameters and 400 coupling strengths. In addition, 200 auxiliary parameters e are needed to drive y i (t) , z i (t) to the right trajectories. Altogether, we have 900 hundred parameters, which will generate over 180, 000 equations in the new formulation, if we exactly follow the procedures in the theory part. However, a convenient approximation could be adopted based on the given network structure and the observation that the influence of a node on other nodes decays quickly in an asynchronous state. According to the equation of motion, each of the above-mentioned parameters could be assigned to a node or a connection between nodes. To the lowest order, it is reasonable to assume that the dynamics at a node mainly depends on the parameters of neighbouring nodes or edges. With this assumption, the partial derivatives with parameters are taken to be zero if they locate outside the immediate neighbourhood. As a result, the number of equations is greatly cut down, which in fact grows only linearly with the size of the network with a fixed average degree.
As the true values of the parameters or state variables are obtained through a co-evolution with the observed time series, the proposed method can be applied online to monitor changes in network topology or node dynamics if the change does not happen so frequently. As an example, the above formulation is directly applied to the network with 8 nodes displayed in Fig. 17(A) in the Supplementary. Starting from a somewhat arbitrary initial guess value 1, the coupling converges to the true value 0.1 after a quite long transient. Later on, as the coupling is switched to 0.5 in the network, the reconstruction algorithm detects and quickly follows this change. It is interesting to see that the reaction is really fast this time and the transient is almost unnoticeable as shown in Fig. 7(A). The node parameters that change at the same time can be similarly observed, as shown in Fig. 7(B). One reason for the quick response for the parameter change is that the state variables {y i (t), z i (t)} i=1,2,··· ,N remain close to the true values in this process so that the convergence seems monotone and exponential! 7/23

The FitzHugh-Nagumo networks
A salient feature of the FitzHugh-Nagumo (FHN) system is the excitability -once the potential exceeds a threshold, the system state immediately shoots to a very high value and then relaxes back to the base. Therefore, the temporal pattern is quite different from that of a Lorentz system. Is it still possible to apply our method to this new situation? Below, we check the continued validity of the current scheme.
The FHN equation is a second-order nonlinear system used to simulate the dynamics of a neuron. To model neural networks, these neurons connect to each other and interact with their neighbours. The equations of motion can be written aṡ where i = 1, 2, · · · , N denotes different neurons, v i is the rate of change of the membrane potential, w i is the slowly changing recovery variable of the i-th neuron and N is the number of neurons. In this model, the dynamical behaviour In practice, only the potential v i (t) of each neuron is subject to a direct measurement. Is it possible to recover w i (t) and all the parameters in the equation with this information? Fig. 8(A) displays a Erdős-Rényi (ER) network with N = 25 from which the measured data was obtained. As usual, we apply the above scheme by initially setting all unknown coupling strength A i j to 1 and the local system parameters to 0. With this network size, all the parameters could be obtained directly without other assumptions. The obtained coupling strengths are depicted in Fig. 8(B) with the absolute error of the order of 10 −2 , while the local parameters are portrayed in Fig. 8(C) with the absolute error of the order of 10 −3 . Fig. 9(A) shows the convergence of local parameters for a node from the initial values 0 to the correct values, where the oscillatory character at the beginning is still clearly seen. Fig. 9(B) displays the time series of v i (t) where the profile is contaminated partly by visible noise. However, the reconstructed time series of w i seems smooth and overlaps with the true one almost perfectly.
For small-world FHN networks with known structures, for example, the one in Fig. 6(A), how does the reconstruction accuracy depend on the network size? If the average degree is fixed to 4, this dependence is shown in Fig. 10(A). It is easy to see that the inference accuracy can be maintained above 95% after averaging over 20 realizations. The accuracy is defined as 20 where ρ is the required accuracy and we take ρ = 0.1 here. H is the Heaviside function and the relative error between the reconstructed (Â i j ) and the real coupling (A i j ) is: The main source of error is the synchronization among part of the nodes, which makes it hard to distinguish them and thus leads to incorrect inferences. More discussion will be made below. On the other hand, if we keep the network size and 9/23  increase the average degree, the inference will become harder since more couplings need deducing. Nevertheless, according to Fig. 10(B), the accuracy still maintains near the value 95% throughout.

The Kuramoto oscillators
The ability to deduce a large number of parameters from the time series may be applied to the reconstruction of networks with hidden nodes, about which little information is known. Even their existence has to be deduced from the measurement on neighbouring nodes as done in previous works to determine the possible connections of hidden nodes to the known ones in the network [51][52][53][54] . Nevertheless, this type of deduction is able to confirm the existence of hidden nodes but the precise connections may not be inferred very accurately. In our scheme, all possible links may be included in the deduction as long as the node's existence is expected. The spurious ones will give a weight near zero and the weights will be recovered for the true ones. Also, the dynamics and internal parameters of the hidden nodes could be deduced as well.
Here, Kuramoto oscillator model 64 defined on networks will be used as an example for this computation. Each oscillator in the model is described with a phase, so D = 1. The coupling between them induces synchronization at various levels, which has been studied comprehensively 65,66 . Consider a system with N oscillatorṡ where θ i and ω i are the phase and the natural frequency of the ith oscillator, K > 0 is the coupling strength and a i j is the adjacency matrix of the network. When node i and node j are connected, a i j = 1; otherwise a i j = 0. To characterize synchronization, in the literature 66 , an order parameter R is defined as follows where R, ψ ∈ R. If K 1, each oscillator rotates near its natural frequency and R ∼ 0. With increasing K, the average frequencies of connected oscillators approach the average natural frequency in general and some of them may start to rotate together, reaching a partial synchronization such that R > 0. For sufficiently large K, all the oscillators become synchronous and rotate with a common frequency ∑ i ω i /N, so that R ∼ 1. During the course of dynamics or parameter deduction, synchronization is harmful as seen previously and will be explained below.
In the current example, 32 oscillators are connected to form an Erdős-Rényi (ER) network as shown in Fig. 11(A) and ω i 's are picked up from a uniform distribution in the interval [0, 20]. The coupling strength K = 1 is not very large to prevent synchronization. We assume that four hidden nodes exist in the network. Their dynamics and natural frequencies are not revealed to us, while the time series of the remaining nodes are known. First, the very existence could be deduced together with their rough connections with existing methods in the literature 51 . Next, the whole network structure and the natural frequencies of these hidden nodes are reconstructed with the formalism introduced in this manuscript. In the following, the hidden nodes are randomly selected from the network with no neighbouring pair.
In this example, as the number of nodes is not large, as long as the very existence of hidden four nodes are known, we may start with a full connection matrix in the reconstruction. That is, the initial values of all entries of [a i j ] are assumed to be 0.5,and the natural frequency ω i 's start with the value 1. In Fig. 11(B), a comparison is made between the simulation results and true values of the natural frequencies. The error seems small, not exceeding a few percent. The convergence of the natural frequencies of the hidden nodes is displayed in Fig. 11(C). The familiar oscillatory profiles appear again but they decay much faster than in previous examples. In Fig. 12, it is easy to see that the accuracy of the reconstruction (see Eq. (14)) decreases with the increase of hidden nodes. The larger the network, the faster the decay. Nevertheless, from the figure we see that when the hidden nodes are less than 15%, no matter how large the network size is, the accuracy is higher than 90%. In fact, our method is even able to infer the structure of a cluster of hidden nodes. In Fig. (13), there are 20 hidden nodes (green dots) forming a small cluster. The structure as well as all the parameters is unknown and none of them is observed directly.We can only measure their neighbouring nodes (black ones). From the measured time series, the natural frequencies of nodes in the hidden cluster and the coupling strengths among nodes can be inferred quite accurately, which is clearly displayed in Fig. 14(A). A full connection between hidden nodes is assumed at the start and the algorithm will computed the actual strengths of the links: about 0 for the non-existing ones and 0.5 for the true connections. The convergence seems fast (see Fig. 14(B)).

Synchronization undermines deduction
If the network is fully synchronized, all nodes behave in essentially the same way. The time series look similar apart from a phase or a scaling factor, so it is hard to carry out the inference (see Supplementary Fig. 23). Generally speaking, the more synchronized the network, the harder it is to infer the structure or coupling. Taking the Kuramoto oscillator network as an example, the natural frequency ω of each node follows the uniform distribution W (0, µ), and a larger µ indicates that the dynamics has higher heterogeneity. As shown in the Fig. 15, for µ ∈ [1, 10], with the increase of µ, the inference accuracy increases significantly. When µ reaches 10, the accuracy is over 95%, and fluctuates around this value upon further increment of µ.

14/23
In order to address in more detail the effect of synchronization on inference, we study a network of six oscillators coupled in a specific configuration as shown in Fig. 16(A), two of which are assumed to be hidden nodes. The natural frequencies of the oscillators conform to a uniform distribution in [0, 1] and the coupling strength is set to 3. In Fig. 16(B), the coupling strengths indeed converge, but to completely wrong values, since in this case the oscillators fully synchronize. Nevertheless, if the distribution of the natural frequencies is expanded to a larger interval [0, 30] so that synchronization is not reached, our algorithm is able to dig out the coupling strengths as shown in Fig. 16(C). To quantify the performance of the algorithm at different levels of synchronization, we employ two statistical indicators -the order parameter 0 ≤ R ≤ 1 defined in Eq. (17) and the G-P dimension 67 for the oscillator network. The order parameter reflects the degree of synchronization, which is between 0 (fully random) and 1 (fully synchronous). The fractal dimension gives the amount of entropy contained in the time series. The larger the value, the more entropy the time series contains.
When the natural frequency is distributed in [0, 1], < R >= 0.97 as given in Fig. 16(D), which is close to a full synchronization. Accordingly, the G-P dimension is only 1.45, indicating that the time series contains very little variability besides the nearly periodic oscillation, which makes it hard to distinguish different nodes. However, the entropy contained in the time series considerably increases if the frequency distribution is extended to the interval [0, 30], since the order parameter and the G-P dimension turn 0.38 and 5.2 respectively as plotted in Fig. 16(E), signalling little synchronization in the network. To see clearly the influence of heterogeneity induced by the frequency distribution [0, µ], we scan different values of µ and depict the results in Fig. 16(F). The order parameter of the system decreases while the normalized dimensionality gradually increases, which indicates the waning of the synchronization. As expected, the accuracy of the reconstruction is gradually increasing and approaching 100%.

Discussion
Reconstructing system dynamics and interaction topology is an important yet challenging problem, especially with only partial information of the system state and in the presence of noise. In this article, we proposed a new framework for the reconstruction based on minimization of a carefully designed cost functional. To proceed in the gradient-descent direction in a continuous manner, a new system of differential equations are derived which serve to drive both the unknown observables and parameters to the correct values with the help of given time series. Several examples including the coupled Lorenz system, the FHN neural networks, the coupled Kuramoto oscillators and a gene regulation network (Supplementary Fig. 20) are used to demonstrate validity of the current technique. Because of the evolution character, the method could be conveniently utilized to reconstruct 15/23 complex dynamics and equations with nonlinear parameter dependence, or to monitor system reconfiguration changes on the scene.
With small noise in the system, the method is still useful while a direct application may fail in the presence of strong noise. So, it would be very interesting and helpful to extend the current frame to treat systems contaminated with strong noise. One possible way is to suppress noise in the available data with various existing techniques, for example, by using Wang's method 43 to give a clean orbit first. Nevertheless, strong noise could even move the average orbit, which may be regarded as a renormalizing effect originating from the ignored degrees of freedom. It is desirable to utilize the framework here to renormalize complex dynamics with only slow degrees of freedom that are of our interest, treating those fast degrees as noise.
For small networks, a full connection assumption is made during the reconstruction and then the true network structure and connection strength are conveniently deduced with our method. But for large networks, prior knowledge of network topology is still needed since the number of connections grows quadratically with the network size while the available information increases only linearly. In practice, most networks are sparse and various techniques exist to infer possible connections. Even if the inferred links are spurious due to indirect interaction or confounding effects, our method still works. Nevertheless, it should be of great practical interest to extend the current framework to directly treat large-scale reconstruction without resorting to other inference programs beforehand.
In the current work, the form of equations of motion is assumed known and only unknown parameters need to be deduced. Compared to existing algorithms 30,44,68,69 , this is not a serious restriction since we can always extend the expansion basis on the right hand side of the equation to include possible new terms. The cost is that more parameters should be deduced during the reconstruction, which is not a problem if the number of parameters is not over large. Of course, it remains a big challenge to deduce the underlying laws of motion purely from observation without much prior assumption although some progress has been made with machine learning [70][71][72][73] . It would be very interesting if we could combine the current framework with these cutting-edge techniques.
If the given information is too little, it seems impossible to deduce much as expected in general. Synchronization effectively reduces the fractal dimensions of the time series and makes it hard to carry out reconstruction, which also brings a lot of interesting problems. To get out of synchronization, we may consider transient dynamics or supply noise to unfold the reduced dimension. If it is impossible to apply external perturbation, it may be interesting to design an effective model to describe the reduced dynamics as indicated above. If synchronization is partial as in most cases, it is essential to determine and measure typical nodes to enable an equivalent modelling. In this case, some nodes may contain more information than others, how to find these is also an intriguing problem.
In all, from the given information, how much we can deduce about the dynamics and structure of a system is an interesting problem both in theory and in application. We hope that the presented framework in the current paper will provide an alternative route for further investigation.

Recycling of data
In our reconstruction process, the amount of data needed is very large. For example, for the Lorentz system we need 50, 000 sampling points to recover the dynamics. What if there are not so many sampling points? Our solution is to recycle the measured data to generate longer orbits. As shown in Fig. 18, we only used 5000 data points for the reconstruction. It seems that all the unknown parameters converge to the right values but with jittering appearing every 5000 points. This is because the recycled orbit breaks every 5000 points and so the continuous evolution is restarted with a jump at this point. Nevertheless, the quick relaxation of parameter values before next jump is clearly seen, which indicates the robustness of the reconstruction and feasibility of recycling the data.

Error function ∆ in Eq. 3
Although the inference system evolves along the negative gradient, the error function ∆ defined in Eq. 3 does not always decrease since it is explicitly a time-dependent function. As shown in Fig. 19, at the initial stage, the error ∆ indeed fluctuates a lot. But after entering the basin of attraction of the parameters and the dynamics, the convergence dominates the time evolution so that the error suddenly drops to almost 0 as seen in the Figure

Gene regulation dynamics
Due to its evolution character, the current algorithm may treat dynamics with a nonlinear parameter dependence. To demonstrate this salient point, we use a simple gene regulatory network 74 .
where g i is the concentration of gene i , κ i denotes the Hill's coefficients, and a i ∈ [−0.3, −0.1]. Parameters A i, j denotes the regulation strength of gene j on gene i which is set to 0.1 and the network is plotted in Fig. 20(A). Given the time series of g i , we would like to deduce the connection A i j and the Hill's coefficients in the equation, which take three different values. To start the reconstruction, all the values of the parameters are initially set to 1 in the inference equation. The inference dynamics is depicted in Fig. 20(B). It is easy to see that the unknown parameters are quickly driven to the true values, although they appear as power indicies in Eq. (18). Hence, equipped with the current reconstruction technique, it seems possible to tackle intricate dynamics with nontrivial parameter dependence in real-world applications.

Lyaponov exponent vs the decay rate α
The choice of the decay rate α is closely related to the Lyapunov exponent. To compare Fig. 2 and 3, we increase the parameter r in the Lorentz system (10) from 28 to 64, and as a result the largest Lyapunov exponent increases from 0.8 to 1.5. If we still take the value 3 of the decay rate α as in Fig. 2, the inference fluctuates a lot with no perceivable convergence as depicted in Fig 21 (B). In fact, to ensure the convergence, a larger α = 7 is needed to counter the increased instability as shown in Fig 21  (A). In Fig. 22, a more comprehensive comparison is made at different decay rates for different Lyapunov exponents. The effective α tends to increase with the Lyapunov exponent, being consistent with the above observation. It is good to note that the increase is slow. There seems exist an interval such that the α in this interval is generally good for different α values.

FHN Synchronized dynamics
Synchronization generally makes the inference harder, which can be clearly seen here from a simple example. As shown in Fig. 23(A), the synchronization dynamics of the three FHN nodes are used in the reconstruction. When node 2 and 3 are fully

22/23
synchronized, it is impossible to distinguish them only from the time series of node 1 (Fig. 23(B) ). Therefore, in this case, the obtained coupling strength 5 is the average of the original two: 4 and 6. If they are not fully synchronized (by changing the parameter values), our method works but is slow to infer the correct coupling strength when close to synchronization ( Fig. 23(C).