Structural controllability of temporal networks

The control of complex systems is an ongoing challenge of complexity research. Recent advances using concepts of structural control deduce a wide range of control-related properties from the network representation of complex systems. Here, we examine the controllability of systems for which the timescale of the dynamics we control and the timescale of changes in the network are comparable. We provide analytical and computational tools to study controllability based on temporal network characteristics. We apply these results to investigate the controllable subnetwork using a single input. For a generic class of model networks, we observe a phase transition depending on the density of the interactions, describing the emergence of a giant controllable subspace. We show the existence of the two phases in real-world networks. Using randomization procedures, we find that the overall activity and the degree distribution of the underlying network are the main features influencing controllability.


Introduction
Complex systems consist of many interacting elements, and the web of these interactions is best described as a complex network. Therefore, studying the structure of such networks and exploring the consequences of their properties is essential to understand complexity. In the last two decades, a significant amount of research has been devoted to this problem [1][2][3][4]; however, only limited progress has been made in describing how the network structure of a system influences our ability to control it [5][6][7][8]. Recent work by Liu et al. spurred interest in network control [9]. They found that, if the system can be represented by a directed weighted network, assuming linear dynamics and invoking the framework of structured systems [10,11], it is possible to study control-related questions using only information about the underlying network. This enabled the research community to apply the full arsenal of network science to the problem, uncovering various nontrivial phenomena emerging from the complexity of the structure of the system [12][13][14][15][16][17].
In this paper, we extend structural controllability to systems for which the timescale of the dynamics and the timescale of changes in the network topology are comparable [18]. In particular, it is necessary to take the timing of the connections into account when the interaction events are not evenly distributed over time but have nontrivial temporal correlations [19,20]. Such systems include communication, trade, or transportation networks [21][22][23][24][25]. Furthermore, the temporal sequence of interactions governs spreading processes [21,22]. Consider as an example a small communication network of three individuals A, B, and C (figure 1(a)). Assume that B sends an email to C at time t = 1, and A sends an email to B at time t = 2. Neglecting the temporal sequence, we find that information may spread from A to C. However, taking the order of the messages into account, this is obviously not possible, which has a clear consequence for control: we cannot influence C using A. Therefore, one must include the temporal aspect of the interactions when studying the controllability of networks with time-varying topologies.

(Figure 1. (a) In the aggregated network, information can spread from node A to node C; however, this is forbidden if the temporal order of the interactions is taken into account. As a consequence, we cannot control C by imposing a control signal on A. (b) We visualize the dynamics represented by the small temporal network by creating a copy of each node for each time step. The state of a node in layer t + 1 is determined by its neighbors in layer t. We aim to control the system at t = 3 in Δt = 3 time steps. We use D as an input node, meaning that we can intervene at D in layers t = 1, 2, 3. According to the independent path theorem, we can control the nodes in layer t = 3 that can be connected to intervention points via independent paths; therefore we control nodes B, C, and D.)

Structural controllability of temporal networks
We study directed temporal networks G = (V, E), composed of a set of nodes V and a set of temporal links E. A temporal link (v_i, v_j, t) consists of an ordered node pair and a time stamp, representing that node v_i interacts with node v_j at time t, and w_ij is the link weight. In a static network the weight often represents the frequency of interaction; here it describes the strength of an individual interaction. For example, in a social network w_ij may measure how much v_j trusts v_i. For our purposes, the exact value of w_ij does not have to be known; therefore we refer to links without their weights. We consider discrete time-varying linear dynamics [26]

x(t + 1) = A(t)x(t) + B(t)u(t),

where the vector x(t) ∈ R^N represents the state variables, x_i(t) corresponding to the state of node v_i at time t. The first term describes the internal dynamics of the system; the matrix A(t) ∈ R^(N×N) is the transpose of the weighted adjacency matrix at time t. The second term describes the control applied to the system: if we impose an outside signal on node v_i at time t, changing the state of the node at time t + 1, we say that we intervene at node v_i, and we call the pair (v_i, t) an intervention point. Extending the standard definition of structural controllability to time-varying systems [10,27], we call a subset of nodes C ⊆ V a structurally controllable subspace at target time t in Δt time steps if there exists a pair (A*(t), B*(t)) that has the same structure as (A(t), B(t)), such that the state of all nodes v ∈ C can be driven from any initial state to any final state at time t, in at most Δt time steps, by appropriately choosing u(t). By the same structure we mean that the zero entries of A(t) and A*(t) are in the same places and only the values of the nonzero elements may differ, i.e. the links connect the same nodes in the corresponding network, and only the weights can be different. It is worth noting that explicitly including Δt in the definition allows us to study the time necessary to achieve control, an aspect that has not been explored yet.
The power of the structural controllability approach arises from the fact that it does not require detailed information about the strength of the interactions, allowing us to characterize controllability by considering network properties only. Yet, structural controllability is a general property, in the sense that if a system is structurally controllable, it is controllable for almost all weight configurations [10,11].
We prove the independent path theorem in appendix A. The theorem states that C is a structurally controllable subspace if all nodes v ∈ C at time t are connected to intervention points through independent time-respecting paths of length at most Δt − 1. A time-respecting path is a sequence of adjacent temporal links such that subsequent links in the path are active in subsequent time steps, e.g. the link (v_i, v_j, t) may be followed by a link (v_j, v_k, t + 1). Two paths are independent if they do not pass the same node at the same time. For a small example see figure 1.
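The reachability argument behind the theorem can be checked mechanically by propagating influence through (node, time) pairs. The function below is a minimal sketch in our own notation (the helper name and the toy link list are not from the paper); it treats the input node as available for intervention in every layer, following the convention of figure 1.

```python
def influence_set(links, source, t0, t1):
    """(node, time) pairs that an input signal imposed on `source` can affect.

    A temporal link (u, v, t) lets the state of u at time t affect the state
    of v at time t + 1; since `source` is an input node, we may intervene
    at it in every time step.
    """
    reached = {(source, t) for t in range(t0, t1 + 1)}
    for t in range(t0, t1):
        reached |= {(v, t + 1) for (u, v, s) in links
                    if s == t and (u, t) in reached}
    return reached

# The example of figure 1: B emails C at t = 1, A emails B at t = 2.
links = [("B", "C", 1), ("A", "B", 2)]
final = {v for (v, t) in influence_set(links, "A", 0, 3) if t == 3}
print(final)  # A can still influence B, but C is out of reach
```

In the time-aggregated network A → B → C suggests A reaches C; temporally, B's message to C precedes A's message to B, so C never appears in the influence set.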
The independent path theorem allows us to formulate control-related questions. Here we focus on the problem of identifying the maximum controllable subspace N_C(v, t, Δt) using a single input node v, i.e. we allow interventions at the points (v, s) for any time s within the Δt time steps preceding the target time t. To determine N_C(v, t, Δt), we have to find the maximum number of independent paths starting from possible intervention points and ending at time t, i.e. ending at points (w, t) for any w ∈ V. Identifying the independent paths can be done efficiently using the Ford-Fulkerson algorithm, as explained in appendix A.6. We characterize the overall controllability of a temporal network by the average maximum controllable subspace, i.e. N_C(v, t, Δt) averaged over all choices of the input node v.

Analytical results for model networks
We provide an analytical solution for a simple class of model networks to gain insight into the effect of the degree distribution and the choice of Δt. To create the network, we generate an uncorrelated static network for each time step with prescribed in- and out-degree distributions p(k_in) and p(k_out), respectively. Each time step is generated independently; only the degree distributions are kept the same. Therefore, consecutive time steps are completely uncorrelated. Since all time steps are statistically equivalent, the maximum controllable subspace does not depend on t, i.e. N_C(t, Δt) = N_C(Δt). We study networks with Poisson (Erdős-Rényi networks) and scale-free degree distributions, the latter meaning that the distribution has a power-law tail.

Consider first the case of only one intervention point (v, s). We can use this intervention to control one of the accessible nodes at later time steps, i.e. any node that can be reached via a path originating from (v, s). The cluster of accessible nodes can be described as generated by the Galton-Watson branching process [45]: node v at time s has k_out offspring, where k_out is drawn from the distribution p(k_out); each of these offspring has k_out out-neighbors also drawn independently from p(k_out), and so forth. The Galton-Watson process undergoes a phase transition depending on the average degree: in the subcritical phase ⟨k⟩ < 1, it terminates in a finite number of steps, reaching only a finite number of nodes. In the supercritical phase ⟨k⟩ > 1, the branching process may continue forever, spanning a finite fraction of the network. We show in the following that the existence of infinitely long paths fundamentally changes the controllability of the system.
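The phase transition is easy to observe numerically. The sketch below (our own illustration; a cluster-size cap stands in for "infinity") simulates the Galton-Watson process with Poisson offspring and estimates the fraction of realizations that survive.

```python
import math
import random

def poisson(lam, rng):
    """Knuth's Poisson sampler; adequate for small lam."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def cluster_size(mean_k, cap, rng):
    """Nodes reached by a Galton-Watson process with Poisson(mean_k)
    offspring; clusters reaching `cap` nodes are treated as infinite."""
    size = frontier = 1
    while frontier and size < cap:
        frontier = sum(poisson(mean_k, rng) for _ in range(frontier))
        size += frontier
    return min(size, cap)

rng = random.Random(42)

def survival_fraction(mean_k, trials=500, cap=2000):
    return sum(cluster_size(mean_k, cap, rng) >= cap
               for _ in range(trials)) / trials

sub = survival_fraction(0.5)  # subcritical <k> < 1: clusters stay finite
sup = survival_fraction(1.5)  # supercritical <k> > 1: infinite paths appear
print(sub, sup)
```

In the supercritical run, the estimated survival fraction approaches the analytical S_out (about 0.58 for ⟨k⟩ = 1.5), while subcritical clusters essentially never reach the cap.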
In the subcritical regime, we find that N_C(Δt) saturates at a finite value that depends only on the out-degree distribution p(k_out). Finite size, however, can obscure the difference between the two network classes by introducing a cutoff in the degree distribution.
Above the critical point ⟨k⟩ > 1, the maximum path length is no longer a limitation due to the formation of a giant component. Consequently, choosing large Δt, i.e. Δt = Δτ·N (Δτ > 0), we can control a finite fraction n_C(Δτ) of the network. For small Δτ, any infinite path starting from an intervention point can be used for control. Therefore, n_C(Δτ) = Δτ S_out, where S_out is the probability that an intervention point is the root of an infinite path, which is determined by a self-consistent equation. Similarly to the subcritical regime, the solution depends only on p(k_out). Examining (4), one might think that by allowing a sufficiently large Δτ, we can control the entire network. However, above a characteristic Δτ*, a new limitation arises, and n_C(Δτ) saturates (figure 2(c)). The number of controlled nodes is then equal to the maximum number of independent infinite paths, i.e. infinite paths that do not pass the same node at the same time step. We analytically approximate n_C(∞) using the framework developed to study core percolation and maximum matching in appendix B [29,30]. We find that n_C(∞) depends on both p(k_in) and p(k_out), and that it is symmetric under swapping the two distributions: it does not matter in which direction we follow the paths, the number of independent paths remains the same.
Comparing the Poisson and scale-free distributions, we find that the Poisson distributed networks are easier to control both below and above the saturation point Δτ *, in line with our observation in the subcritical regime (figures 2(b) and (d)).

Temporal controllability of a real system
Digital traces of communication make it possible to apply the developed tools to explore the controllability of real systems. Linear dynamics may provide a simple model for opinion formation: the opinion of each individual is represented by a continuous variable, and it is reevaluated in each time step depending on the information received. The information is taken into account with different weights depending on the source [31,32].
Here, we study a temporal network representing the email communication of a mid-size company [48]. The dataset contains the sender, the recipient, and the time each email was sent. Altogether there are 82 927 emails between 167 employees, covering a 9 month period. The necessary temporal resolution of the network depends on the timescale of the dynamical process we aim to control. To highlight different features of the dataset, we use two different temporal resolutions, with one hour and one day time steps. The first corresponds to a short-term control scenario, influencing the dynamics within a workday, while the second assumes a slower change, spanning the whole available period. The dataset features strong daily and weekly patterns: the bulk of the email traffic happens during a 9 hour period of each workday. Therefore, for the short-term control case we average the results over workdays only, and for the long-term case we remove the weekends and holidays.
The average degree of the network in one time step depends on the time resolution. For one hour time steps we find ⟨k⟩_h ≈ 0.23, predicting that the system is in the subcritical phase. Indeed, we find that N_C(t, Δt) remains of the order of a few nodes, and it saturates in just a few steps, in accordance with our findings for model networks (figure 3(a)). For the one day time step, we obtain ⟨k⟩_d ≈ 1.76, putting the system in the supercritical regime. We expect the saturation time to be of the order of the system size; therefore, to minimize the effect of boundary conditions, we only consider control at the end of the last available workday, at target time t = 190. We find that N_C(t, Δt) at first increases approximately linearly with Δt, and for larger Δt it appears to saturate, although more slowly than in the case of the model networks (figure 3(b)).
Next, we use various randomization procedures to separate the effects of the temporal patterns and of the underlying network. We find that fluctuations in the average degree of the time steps decrease N_C(t, Δt): a drop in the average degree acts as a bottleneck, letting through fewer independent paths.

(Figure 3. (a) The average maximum controllable subspace N_C(t, ∞) as a function of the target time t in the Δt → ∞ limit. The data points are averages over the 189 workdays covered by the dataset. The bulk of the email traffic happens during working hours; therefore we restrict ourselves to the regular office hours 9 ≤ t ≤ 17 when randomizing the times of the links. Completely removing the temporal patterns by assigning random times to the links (RT) and completely removing the network structure by randomly placing the links within a time step (RN) both increase controllability. Shuffling the time steps (ST) and randomizing the links while keeping the degree sequence (DPN) change N_C(t, ∞) only slightly. This shows that the controllability is mainly determined by the degree distribution and the overall activity pattern, and that the correlations have a smaller impact. (b) In the second scenario, we aim to control the system on a longer timescale; we choose 1 workday as the unit of time. In this case the average degree within a time step is ⟨k⟩_d ≈ 1.76 > 1, predicting that the system is above the critical point, and that the characteristic time to control the system is of the order of the system size. Therefore, we cannot make multiple independent measurements, and we focus on control at the end of the last workday, t = 190. We show N_C(t = 190, Δt) as a function of Δt. We observe linear growth for small Δt and saturation for large Δt, although the saturation is not complete within the available time period. The randomizations yield similar conclusions as in the short-term control scenario.)
Indeed, removing the fluctuations by assigning random times to the links significantly increases N_C(t, Δt) (RT curve in figure 3). Next, we shuffle the order of the time steps, leaving the network structure within each time step unchanged. This way we keep the overall fluctuations in the average degree, but we eliminate the correlations between subsequent time steps. We find that N_C(t, Δt) slightly decreases, suggesting that temporal correlations, such as causal chains of events, enhance the number of available paths (ST curve). To investigate the effect of the underlying network, we keep the temporal information and only randomize the network within each time step. First, we completely mix the connections, thereby transforming the degree distribution into a Poisson distribution with the same average degree. The controllability of the resulting network dramatically increases, showing that the existence of hubs makes control difficult (RN curve). In the next randomization, we keep the degree of each node in each time step, but we eliminate all other correlations by cutting all links and randomly rewiring them. We find that the controllability of such networks is very close to the original, meaning that the degree sequence of the nodes is the main factor determining controllability, and that correlations are only secondary (DPN curve). For details on the dataset and the randomization procedures see appendix C.
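The randomization procedures can be sketched compactly. The helpers below are our own simplified illustration (not the authors' code); in particular, this DPN variant shuffles link endpoints within each time step, which preserves every node's in- and out-degree but may create self-loops or multi-links.

```python
import random
from collections import Counter, defaultdict

def randomize_times(links, t_min, t_max, rng):
    """RT: keep who contacts whom, draw a new uniformly random time per link."""
    return [(u, v, rng.randint(t_min, t_max)) for (u, v, _) in links]

def shuffle_timesteps(links, rng):
    """ST: permute entire time steps, keeping each snapshot's structure."""
    times = sorted({t for (_, _, t) in links})
    perm = dict(zip(times, rng.sample(times, len(times))))
    return [(u, v, perm[t]) for (u, v, t) in links]

def degree_preserving(links, rng):
    """DPN: rewire within each time step, keeping every in- and out-degree."""
    by_t = defaultdict(list)
    for (u, v, t) in links:
        by_t[t].append((u, v))
    out = []
    for t, snapshot in by_t.items():
        heads = [v for (_, v) in snapshot]
        rng.shuffle(heads)  # tails keep out-degrees, heads keep in-degrees
        out.extend((u, v, t) for ((u, _), v) in zip(snapshot, heads))
    return out

rng = random.Random(7)
links = [("A", "B", 1), ("B", "C", 1), ("C", "A", 2), ("A", "C", 2)]
dpn = degree_preserving(links, rng)
```

A quick sanity check: after DPN, the multiset of (node, time) out- and in-stubs is unchanged, and after ST, the multiset of time stamps is merely permuted.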

Discussion
Both structural controllability and temporal networks have proved to be useful tools in understanding complex systems, generating a large amount of research in their respective fields. We have established a connection between the two, opening up an array of new questions.
One such question concerns the effect of self-interactions [34,35]. Self-loops can be used to model memory effects, meaning that a node remembers incoming signals for a given time. The tools provided in this paper are applicable regardless of the presence of self-loops. However, when analyzing model networks and empirical data, we chose not to add self-interactions explicitly.
Another important question is the problem of control energy: it has been found for static networks that even if the system is controllable, in some cases the amount of energy necessary to control the system can be very large [13]. It would be interesting to know how control energy depends on the temporal characteristics of the system, and how the intervention points should be chosen to minimize this energy.
Further possible research includes identification of the minimum set of input nodes necessary for complete control, or characterizing the role of individual nodes.
Our work focused on network effects by assuming linear dynamics; however, nonlinearity is an essential feature of many complex systems. Therefore, developing nonlinear control methods for complex networks is of utmost interest [36].
Temporal networks are an important ingredient in developing control methods for complex systems. Our work contributes to uncovering this rich and unexplored field and we hope that it will serve as a starting point for future investigations in this emerging area of research.

Acknowledgments
This project was supported by the German Academic Exchange Service (DAAD) via a scholarship granted to MP; PH acknowledges support by the BMBF (grant no. 01Q1001B) in the framework of BCCN Berlin.
A.1. Temporal networks

A temporal network G = (V, E) consists of a set of nodes V and a set of temporal links E. A temporal link (v_i, v_j, t) consists of an ordered node pair and a time stamp, representing that there is a link pointing from node v_i to node v_j at time t with weight w_ij. We measure time in discrete steps t = 0, 1, 2, ..., and the choice of the unit may depend on the resolution of the available dataset or on modelling purposes. For our purposes, the exact value of w_ij does not have to be known; therefore we refer to links without their weights. A temporal path P connecting nodes v_i and v_j from t_0 to t_1 is a sequence of consecutive temporal links such that the first link originates from node v_i at time t_0 + 1, and the last link in the sequence points at node v_j at time t_1. The path consists of Δt = t_1 − t_0 consecutive links and Δt + 1 nodes. A node by itself is a path of length 0. Two paths are independent if they do not pass through the same node at the same time. For a small example see figure A1(a).
Note: this definition of a time-respecting path does not exclude waiting at nodes; waiting can be realized by adding self-loops.

(Figure A1. (a) In the static representation of the network, information can spread from node A to node C; however, due to the temporal sequence of the interactions, this is not possible. There is a time-respecting path from D to B (consisting of links (D, A, 1) and (A, B, 2)), and from D to C (consisting of link (D, C, 2)). The two paths do not pass the same node at the same time; therefore they are independent. (b) The layered network representation: we make a copy (v_i, t) of each node v_i for each time step t ∈ (t_0, t_1], and we connect the nodes (v_i, t) and (v_j, t + 1) if there is a temporal link (v_i, v_j, t).)

A.2. Layered network representation
It will be useful to represent the temporal network defined above as a layered network Ĝ(t_0, t_1), consisting of a set of nodes V̂ and a set of static links Ê. We make a copy (v_i, t) of each node v_i for each time step t ∈ (t_0, t_1], and we add a static link from (v_i, t) to (v_j, t + 1) for each temporal link (v_i, v_j, t) (figure A1(b)). Therefore, the layered representation is a static directed acyclic network with |V̂| = |V|(t_1 − t_0) nodes. As a consequence, temporal paths appear as static paths in the layered representation, and independent temporal paths are simply node-disjoint paths.
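The construction is direct to implement; the helper below is a minimal sketch in our own notation.

```python
def layered_network(nodes, links, t0, t1):
    """Layered static representation of a temporal network.

    One copy (v, t) of every node per time step t in (t0, t1]; a temporal
    link (u, v, t) becomes the static link (u, t) -> (v, t + 1).  The result
    is a directed acyclic graph, so temporal paths become static paths and
    independent temporal paths become node-disjoint paths.
    """
    V = [(v, t) for t in range(t0 + 1, t1 + 1) for v in nodes]
    E = [((u, t), (v, t + 1)) for (u, v, t) in links if t0 < t < t1]
    return V, E

# The network of figure A1: links (D, A, 1), (A, B, 2), (D, C, 2).
V, E = layered_network("ABCD",
                       [("D", "A", 1), ("A", "B", 2), ("D", "C", 2)], 0, 3)
```

For four nodes and three layers this yields |V̂| = 4 · 3 = 12 copies and one static link per temporal link.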

A.3. Dynamics
We study discrete time linear dynamics [26]

x(t + 1) = A(t)x(t) + B(t)u(t),   (A.1)

where x(t) is the state vector, A(t) is the transpose of the weighted adjacency matrix at time t, B(t) is the input matrix, and u(t) is the control signal. If we impose an outside signal on node v_i at time t, we say that we intervene at node v_i, and we call (v_i, t) an intervention point. If we intervene at a node at any time, the node is referred to as an input.
Note 1: the state x_i(t + 1) of node v_i is completely determined by the states of its in-neighbors at time t. If we assume that x_i(t + 1) is not independent of x_i(t), we have to add self-interactions (i.e. diagonal entries in A(t)). Information about self-interactions is not always explicitly provided in network datasets.
Note 2: consider the case when there are no links pointing at v_i at time t, that is, a_ij(t) = 0 for all j. If we do not add self-interactions, then x_i(t + 1) = 0 independently of x_i(t). In some cases, we might assume that a node without incoming links retains its state. This can be taken into account by adding a self-interaction at time t only if a node has no incoming links.

A.4. Controllability
Controllable: we call the system (A(t), B(t)) controllable at target time t_1 in Δt time steps if the state x(t_1) can be driven from any initial state to any final state in at most Δt time steps by appropriately choosing u(t). We define the temporal controllability matrix [26,27]

C(t_0, t_1) = [B(t_1 − 1), A(t_1 − 1)B(t_1 − 2), ..., A(t_1 − 1)A(t_1 − 2) ··· A(t_0 + 1)B(t_0)].

The linear rank of C(t_0, t_1) is the number of state variables that can be set independently by the proper choice of u; the system is controllable if and only if the rank is N. In most cases, however, the strength of the interactions, i.e. the link weights, is not completely known. Fortunately, a lot of information about the controllability of a system can be deduced from the zero-nonzero structure of (A(t), B(t)) alone, i.e. from the existence or absence of links, using the structural controllability framework [10]. We treat the nonzero elements of (A(t), B(t)) as free parameters, and only keep the zero elements fixed.
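As a small numerical check of these definitions, the sketch below builds C(t_0, t_1) for a two-node example with an arbitrarily chosen weight and computes its rank with plain Gaussian elimination (our own illustration; the helper names are not from the paper).

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def rank(M, eps=1e-9):
    """Rank of a matrix given as a list of rows, by Gaussian elimination."""
    M = [row[:] for row in M]
    rows, r = len(M), 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, rows) if abs(M[i][c]) > eps), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(rows):
            if i != r and abs(M[i][c]) > eps:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def controllability_matrix(As, Bs):
    """C(t0, t1) = [B(t1-1), A(t1-1)B(t1-2), ..., A(t1-1)...A(t0+1)B(t0)]."""
    n = len(As[0])
    prefix = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    blocks = []
    for s in range(len(As) - 1, -1, -1):  # s = t1 - 1 down to t0
        blocks.append(matmul(prefix, Bs[s]))
        prefix = matmul(prefix, As[s])
    return [sum((blk[i] for blk in blocks), []) for i in range(n)]

# Two nodes, single input at node 0; a link 0 -> 1 exists at t = 1 only.
# A(t) is the transpose of the adjacency matrix, so the link appears at a_10.
A0 = [[0.0, 0.0], [0.0, 0.0]]
A1 = [[0.0, 0.0], [0.9, 0.0]]
B = [[1.0], [0.0]]
C = controllability_matrix([A0, A1], [B, B])
print(rank(C))  # 2: both states can be set in two steps
```

Here C = [[1, 0], [0, 0.9]]: the first column injects into node 0 at the last step, the second column carries the earlier injection through the link to node 1.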
Structurally controllable: we call the system t t A B ( ( ), ( )) structurally controllable at target time t in Δt time steps, if we can set the free parameters of t t A B ( ( ), ( )) such that the system is controllable in the original sense.
Note: if a system is structurally controllable, it is controllable for almost all weight configurations. And if it is not, it can be made controllable with an arbitrarily small perturbation of the weights [10].
Controllable subspace: we call a subset of state variables C ⊆ V a controllable subspace at target time t_1 in Δt time steps if the state variables x_i ∈ C can be driven to any final state at target time t_1 from any initial state in at most Δt = t_1 − t_0 time steps.

A.5. The independent path theorem

Independent path theorem: C is a structurally controllable subspace at target time t_1 in Δt time steps if all nodes v_i ∈ C at time t_1 are connected to intervention points through independent time-respecting paths.

Proof. We reduce the time-dependent controllability problem to a larger time-independent problem. We construct the linear time-independent system in the following way. We create a copy (v_i, t) of every state variable for each time step t ∈ (t_0, t_1]. Note that we use the index pair (i, t) to identify the elements of the vector x̂. We construct Â by setting â_(i,t+1);(j,t) to a_ij(t), and b̂_(i,t+1);j to b_ij(t). The network representation of the time-independent system is equivalent to the layered graph representation of the temporal network G.
We can check by simple multiplication that x̂_(i,t_1) = x_i(t_1) for all i. Therefore, every controllable subspace Ĉ of the time-independent system corresponds to a controllable subspace C of the temporal system. It has been previously shown [11,37] that a subspace Ĉ of a static network is structurally controllable if there exists a stem-cycle disjoint subgraph that contains all nodes in Ĉ. A stem is a path starting from a node that is directly coupled to an input signal; in our case these are the intervention points. A stem-cycle disjoint subgraph is a subgraph composed of stems and cycles, such that every node is contained in exactly one stem or one cycle. The network representation of the time-independent system is acyclic; hence a stem-cycle disjoint subgraph in our case is simply a set of independent paths. Therefore, C is a structurally controllable subspace of G if there exist independent paths starting from intervention points and leading to each node v_i ∈ C at time t_1. □ For a small example see figure A2.

(Figure A2. (a) The temporal network of figure A1(a). We aim to control the system at target time t = 3 in Δt = 3 time steps, and we use node D as an input node. We compute x(3) by successively applying (A.1). (b) We convert the linear time-varying dynamics to a larger time-independent system: we create a copy (v_i, t) of every state variable for each time step t ∈ (t_0, t_1], and construct Â by setting â_(i,t+1);(j,t) to a_ij(t); all other elements of Â are set to 0. The input nodes (D, 1), (D, 2) and (D, 3) correspond to the intervention points in the time-varying system. The state vector of the time-independent system at t = 3 is equal to the state vector x(3) of the time-varying system. (c) The corresponding network of the time-independent system is equivalent to the layered graph representation Ĝ(0, 3) of the temporal network G. According to the independent path theorem, we can control nodes B, C, and D.)
A.6. Maximum controllable subspace problem

Given a temporal network G, we select a set of nodes D ⊆ V to be inputs, meaning that we allow interventions at these nodes. We explore the problem of determining the dimension of the maximum controllable subspace, where t_1 is the target time and Δt is the number of time steps we use to reach the desired state. We use the layered representation Ĝ(t_0, t_1): the set of possible intervention points is I = {(v, t) : v ∈ D, t_0 ≤ t < t_1}, and the set of potential target nodes is T = {(v, t_1) : v ∈ V}. A controllable subspace is given by a subset of T for which all nodes can be reached via independent paths from possible intervention points. Therefore, the dimension of the maximum controllable subspace is the maximum number of independent paths originating from I and terminating in T.
The problem of finding the maximum number of independent paths in directed networks is equivalent to solving a maximum flow problem. If the nodes in set I are sources, the nodes in T are sinks, and the capacity of each link and node is set to 1, the maximum flow is equal to the maximum number of independent paths. This problem can be solved in polynomial time, e.g. using the Ford-Fulkerson algorithm with complexity O(|Ê|·N_C) [38,39].

B.1. Temporal network model

We study a simple uncorrelated temporal network model that can be considered the temporal counterpart of the static hidden parameter model [40,41]. We start with N unconnected nodes, and for each time step we generate a directed network independently. Each node v_i is assigned two hidden parameters w_i^in and w_i^out. We then randomly place L directed links, choosing the endpoint and the startpoint of each link with probability proportional to w_i^in and w_i^out, respectively. By properly choosing the hidden parameters, we can tune the degree distributions p(k_in) and p(k_out). Throughout the paper, we investigate networks with Poisson [42] and scale-free [43] degree distributions, the latter meaning that the distribution has a power-law tail.
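A compact way to implement the flow formulation is node splitting: every layered node becomes an "in" and an "out" copy joined by a capacity-one link, so unit flows correspond to node-disjoint (i.e. independent) paths. The sketch below (BFS augmentation in the Edmonds-Karp style; our own code, not the authors') reproduces the figure A2 example, where input D controls three nodes at t = 3.

```python
from collections import defaultdict, deque

def max_independent_paths(edges, sources, sinks):
    """Maximum number of node-disjoint source-to-sink paths in a digraph,
    computed as a unit-capacity maximum flow with node splitting."""
    cap = defaultdict(lambda: defaultdict(int))
    def add(u, v):
        cap[u][v] = 1
    for u, v in edges:
        add((u, "out"), (v, "in"))
    nodes = {x for e in edges for x in e} | set(sources) | set(sinks)
    for n in nodes:
        add((n, "in"), (n, "out"))  # node capacity 1 => node-disjoint paths
    S, T = "SRC", "SNK"
    for s in sources:
        add(S, (s, "in"))
    for t in sinks:
        add((t, "out"), T)
    flow = 0
    while True:
        parent, q = {S: None}, deque([S])
        while q and T not in parent:  # BFS in the residual network
            u = q.popleft()
            for v, c in list(cap[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if T not in parent:
            return flow
        v = T
        while parent[v] is not None:  # push one unit along the found path
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1

# Layered links of figure A2: (D,1)->(A,2), (A,2)->(B,3), (D,2)->(C,3).
edges = [(("D", 1), ("A", 2)), (("A", 2), ("B", 3)), (("D", 2), ("C", 3))]
sources = [("D", 1), ("D", 2), ("D", 3)]
sinks = [(v, 3) for v in "ABCD"]
print(max_independent_paths(edges, sources, sinks))  # 3: controls B, C, D
```

The length-0 path (D, 3) is handled automatically: the split pair (D,3)-in → (D,3)-out carries one unit of flow directly from source to sink.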
The degree distribution of the model is a mixture of Poisson distributions,

p(k) = (1/N) Σ_i e^(−λ_i) λ_i^k / k!,   λ_i = L w_i / Σ_j w_j,

which is valid for both the in- and out-degrees. The corresponding generating function of this distribution is G(x) = (1/N) Σ_i exp[λ_i(x − 1)]. To generate Erdős-Rényi networks, we set w_i = 1/N for all nodes. This way the probability of connecting any node pair is equal, and we recover the classic Erdős-Rényi model; the corresponding generating function is G(x) = exp[⟨k⟩(x − 1)]. To generate networks with a scale-free degree distribution, we use the so-called static model [44]. We set the hidden parameter of node i to w_i ∝ i^(−α) (i = 1, ..., N). The weights are then shuffled to eliminate any correlations. For large N, this choice yields the degree distribution

p(k) = ([c(1 − α)]^(1/α) / α) Γ(k − 1/α, c(1 − α)) / k!,   (B.2)

where c = ⟨k⟩ is equal to the average degree and Γ(n, x) is the upper incomplete gamma function. For large k, p(k) ∼ k^(−γ_in/out), where γ_in/out = 1 + 1/α_in/out determines the exponent of the tail of the distribution. The corresponding generating function in the N → ∞ limit can be expressed in terms of the exponential integral function. However, for scale-free networks we often find that finite-size effects are not negligible for system sizes accessible to simulation. In these cases we have to take the finite size into account by using (B.2) explicitly.

B.2. Percolation in the temporal network model
Let us consider the case when we have only one intervention point (v_i, t) in the layered network. We can use this intervention to control one of the accessible nodes in a lower layer, i.e. any node that can be reached via a path originating from (v_i, t). The cluster of accessible nodes can be described as a cluster generated by the Galton-Watson branching process [45]: the node (v_i, t) has k_out out-neighbors, where k_out is drawn from the distribution p(k_out). Each of these out-neighbors has k_out out-neighbors also drawn from p(k_out), and so forth. We study the process in the N → ∞ limit and denote the probability that the branching process continues forever by S_out. We can calculate S_out using the self-consistent equation

S_out = 1 − G_out(1 − S_out),

where G_out(x) is the generating function of p(k_out). The equation simply means that the branching process rooted at node (v_i, t) survives unless all of the branches rooted at its out-neighbors die out. A nontrivial solution S_out > 0 exists if G'_out(1) = ⟨k⟩ > 1, meaning that the critical point is determined by the average degree alone, independently of other parameters of the degree distribution. Nodes that are roots of infinite trees form the giant out-component.
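For a Poisson out-degree distribution, G_out(x) = exp(⟨k⟩(x − 1)), and the self-consistent equation can be solved by fixed-point iteration (a small sketch of our own):

```python
import math

def survival_probability(mean_k, tol=1e-12, max_iter=100_000):
    """Solve S_out = 1 - G_out(1 - S_out) for Poisson out-degrees,
    where G_out(x) = exp(mean_k * (x - 1))."""
    S = 1.0
    for _ in range(max_iter):
        S_new = 1.0 - math.exp(-mean_k * S)
        if abs(S_new - S) < tol:
            return S_new
        S = S_new
    return S

print(survival_probability(0.5))  # subcritical: S_out -> 0
print(survival_probability(1.5))  # supercritical: S_out > 0
```

Below the critical point the iteration collapses to the trivial solution S_out = 0; above it, a positive fixed point appears (about 0.583 for ⟨k⟩ = 1.5).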
In the subcritical phase (⟨k⟩ < 1) the branching process halts in a finite number of steps, meaning that only a finite number of nodes can be accessed. At the critical point (⟨k⟩ = 1) the size of the largest cluster diverges; however, its relative size is still zero. In the supercritical phase (⟨k⟩ > 1) the branching process continues forever with probability S_out. Similarly, we calculate the probability S_in that a randomly selected node is an offspring of an infinite cluster from S_in = 1 − G_in(1 − S_in). Nodes that are offspring of infinite trees form the giant in-component. We refer to the intersection of the giant in- and out-components as the giant component.

B.3. N_C(Δt) in the subcritical phase
The goal of this section is to determine the average N_C(Δt) using a randomly selected node v_i as input. We start with the observation that if the network is uncorrelated, each intervention point can be treated as independently and randomly selected. In the subcritical phase the size of the accessible cluster rooted at a random node is finite. Therefore, the probability that two such clusters rooted at two randomly selected nodes overlap is 0. The probability that an intervention at (v_i, t) can be used to control a node at the target time is equal to the probability that a sufficiently long path is rooted at the intervention point. Hence, we first determine the cumulative distribution function of the maximum path length originating from a randomly selected point, i.e. P(d) is the probability that the maximum length of a path originating from a node is at most d. The maximum path length from node (v_i, t) is one larger than the maximum path length originating from its out-neighbors. Averaging over p(k_out) we get

P(d) = G_out(P(d − 1)).

We can solve the equation recursively starting from P(0) = G_out(0). Since the model is invariant to time shifts, we can set the control target time to Δt without loss of generality. An intervention made d steps before the target time can be used with probability 1 − P(d − 1); summing over the Δt possible intervention times yields

N_C(Δt) = Σ_{d=1}^{Δt} [1 − P(d − 1)].   (B.9)

We gain further insight by studying the asymptotic solution of (B.9) for models with Poisson and power-law degree distributions.
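The recursion is straightforward to iterate numerically. The sketch below uses a Poisson out-degree distribution and our reading of (B.9) (names and parameters are our own illustration):

```python
import math

def G_out(x, mean_k):
    return math.exp(mean_k * (x - 1.0))  # Poisson generating function

def N_C(mean_k, dt):
    """Average controllable subspace: sum over intervention layers of the
    probability 1 - P(d) that a long-enough path starts there, with
    P(d) = G_out(P(d - 1)) and P(0) = G_out(0)."""
    P, total = G_out(0.0, mean_k), 0.0
    for _ in range(dt):
        total += 1.0 - P
        P = G_out(P, mean_k)
    return total

print(N_C(0.5, 50))   # subcritical: saturates at a finite value
print(N_C(1.5, 200))  # supercritical: grows roughly linearly with dt
```

In the subcritical case 1 − P(d) decays roughly as ⟨k⟩^d, so the sum converges quickly; in the supercritical case 1 − P(d) tends to S_out, producing the linear growth discussed in section B.4.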
For the Poisson case (or any out-degree distribution with finite variance σ_out²), we can expand the generating function G_out(x) around x = 1 (B.14). Below the critical point, 1 − P(d) decays exponentially, so N_C(Δt) approaches its stationary value exponentially fast (B.13). At the critical point ⟨k⟩ = 1, it follows from the expansion that for large Δt we get N_C(Δt) ∼ log Δt, meaning that increasing Δt increases the number of nodes that we control. However, the fraction of the network that is controlled still remains 0 in the large-network limit.
For scale-free networks with $\gamma < 3$, $\sigma_{\text{out}}^2$ is infinite, and therefore the simple Taylor series expansion of the generating function in (B.14) is not sufficient. To understand the effect of a power-law distribution, we transform the generating function provided in (B.5): around $x = 1$ it contains a singular term proportional to $(1-x)^{\alpha}$ with $\alpha = \gamma - 1$. Using this in (B.9) and only keeping the first two terms, we obtain the solution for $2 < \gamma < 3$, or equivalently $1 < \alpha < 2$. If $c = \langle k \rangle < 1$, the asymptotic behavior of the solution is determined by the second term, and we obtain the same solution as (B.13). For the solution at the critical point $\langle k \rangle = 1$, we keep the third term, and we find that for $2 < \gamma < 3$ even at the critical point $N_C(\infty)$ remains finite; however, $N_C(\Delta t)$ approaches this stationary value slowly, as a power law of $\Delta t$.

B.4. $N_C(\Delta t)$ in the supercritical phase

Consider the case when $v_i$ is the input node, and $(v_i, t)$ and $(v_i, s)$ are two intervention points such that both are roots of infinite trees. Since these trees cover a finite fraction of the network, we can no longer assume that the overlap of accessible clusters has zero probability. However, being the root of an infinite tree also means that we can reach a finite fraction $S_{\text{out}}$ of the nodes in the target layer, and we can choose from many possible paths. Therefore, for small $\Delta \tau$, we assume that whenever an intervention point is the root of an infinite tree, we can use that intervention point to control one node in the target layer. The in-degree distribution is determined similarly.
To calculate a first approximation $n_C^{(1)}(\infty)$, we determine the maximum number of independent paths in the giant component connecting two subsequent layers $t = 1$ and $t = 2$, which is equivalent to finding a maximum matching in the bipartite network formed by the two layers. A matching in a network is defined as a set of links that do not share endpoints; therefore, in the case of the network of two layers, the links in the matching are independent paths of length one. A node is called matched if it is adjacent to a link in the matching. This way the size of the maximum matching provides $n_C^{(1)}(\infty)$. For uncorrelated networks the size of the maximum matching can be determined analytically; we provide the detailed calculation in section B.5. This approximation yields an upper bound for $n_C(\infty)$ (figure B2), because it assumes that we can choose the endpoints of the paths in layer $t = 1$ and the starting points of the paths in layer $t = 2$ arbitrarily. However, when constructing a maximum matching we do not have such freedom: some nodes always have to be matched [15], and in other cases some nodes cannot be included at the same time. For a small example see figure B3.
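For finite instances, the maximum matching between two layers can be computed with a standard augmenting-path (Kuhn's) algorithm. The sketch below is our own illustrative toy, not the analytical calculation of section B.5; the example network and labels are assumptions:

```python
def max_bipartite_matching(adj, n_right):
    """Maximum matching via augmenting paths (Kuhn's algorithm).
    adj[u] lists the right-side nodes reachable from left node u."""
    match_right = [-1] * n_right  # right node -> matched left node

    def try_augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # v is free, or its partner can be re-matched elsewhere
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    size = 0
    for u in range(len(adj)):
        if try_augment(u, set()):
            size += 1
    return size, match_right

# Two layers of four nodes; links are the possible length-one paths.
adj = [[0, 1], [0], [2], [2, 3]]
size, _ = max_bipartite_matching(adj, 4)
print(size)  # -> 4
```

Here all four nodes of each layer can be covered, i.e. four independent length-one paths connect the layers.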
For the next approximation $n_C^{(2)}(\infty)$, we consider three subsequent layers $t = 0, 1, 2$. Each layer contains $S_{\text{in}} S_{\text{out}} N$ nodes, and has degree distributions $p_{\text{out}}(k)$ and $p_{\text{in}}(k)$. First, we examine the maximum matching between layers $t = 0$ and $t = 1$, and we determine the set of nodes $A$ in layer $t = 1$ that are matched in all possible maximum matchings. In the first approximation, these nodes will always be endpoints of independent paths. However, if we cannot match them in the next layer, they will become dead ends. Therefore, the number of nodes in $A$ that cannot be matched at the same time will be the next correction to $n_C^{(1)}(\infty)$ (figure B4). To calculate the correction, we find the maximum matching in the bipartite network formed by the nodes in $A$ in layer $t = 1$ and all nodes in layer $t = 2$. The degree distribution of the nodes in $A$ is $p_{\text{out}}(k)$, and the degree distribution of the nodes in layer $t = 2$ can be calculated from $p_{\text{in}}(k)$ by randomly removing a $1 - |A|/(S_{\text{in}} S_{\text{out}} N)$ fraction of the links, similarly to (B.20). The number of nodes in $A$ is determined using the equations developed in [15].

Figure B2. Comparing the approximations. We plot the average controllable fraction of the network as a function of the average degree for Poisson (ER) and scale-free (SF) degree distributions. For networks of size N = 1 000, each data point is an average of 1 000 input node measurements. The dashed line is the first approximation $n_C^{(1)}(\infty)$, and the solid line is the second analytical approximation $n_C^{(2)}(\infty)$. The small break in $n_C^{(2)}(\infty)$ for the Erdős-Rényi network is a consequence of the core percolation transition [29]; the transition point for scale-free networks is not shown on the plot.
A similar correction can be computed for the set of nodes $B$ in layer $t = 1$ that are always matched from layer $t = 2$, but cannot all be matched at the same time from layer $t = 0$. We find that $n_C^{(2)}(\infty)$ approximates the numerical simulations well (figure B2). Note: in [15], it was shown that for dense networks above the core percolation threshold, the number of nodes that are always matched can be drastically different depending on the specific realization of the network model, e.g. for two Erdős-Rényi networks generated with the same parameters. This is due to a special case when a finite fraction of nodes is 'almost always' matched, meaning that we have a set of nodes $A$ such that in each possible matching only a finite number of nodes in $A$ are unmatched. Therefore, for our purposes these nodes can be treated as always matched.

Figure B4. Examining the maximum matching in layers $t = 0$ and $t = 1$, we find that the red nodes are matched in all possible maximum matchings. However, in the next layer the red nodes are connected to the same node, hence they cannot both be matched at the same time. Therefore, the two red nodes can only be used in one independent path. Counting such configurations provides the second approximation.
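The notion of nodes that are 'always matched' can be checked by brute force on a toy layer pair (the example network is ours): enumerate every matching, keep the maximum ones, and see which nodes appear in all of them:

```python
from itertools import combinations

# Links of a small two-layer bipartite network: (upper, lower) pairs.
# Upper nodes 0 and 1 share their only lower neighbor, so at most one
# of them can be matched; upper node 2 is matched in every maximum matching.
edges = [(0, 0), (1, 0), (2, 1)]

def is_matching(subset):
    ups = [u for u, _ in subset]
    lows = [v for _, v in subset]
    return len(set(ups)) == len(ups) and len(set(lows)) == len(lows)

all_matchings = [list(s) for r in range(len(edges) + 1)
                 for s in combinations(edges, r) if is_matching(s)]
m_max = max(len(m) for m in all_matchings)
maximum = [m for m in all_matchings if len(m) == m_max]

# A node is 'always matched' if every maximum matching covers it.
always_upper = {u for u in range(3)
                if all(any(e[0] == u for e in m) for m in maximum)}
print(m_max, sorted(always_upper))  # -> 2 [2]
```

Upper node 2 must appear in every maximum matching, while upper nodes 0 and 1 compete for the same lower node and can never both be used; counting such conflicting configurations in the next layer is exactly what the second approximation does.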

B.5. Matching in bipartite networks
In this section we calculate the relative size of the maximum matching in uncorrelated bipartite networks with arbitrary degree distributions. Let $G$ be a bipartite network with two sets of nodes of sizes $N_+$ and $N_-$ and degree distributions $p_+(k)$ and $p_-(k)$. We use the formalism developed for core percolation [29]. Core percolation describes the sudden emergence of the core in random networks [30, 46, 47]. To define the core, we first introduce the greedy leaf removal (GLR) process: we select a leaf (a node with degree 1) at random, and remove that node and its neighbor together with all links adjacent to that neighbor; we repeat this step until no leaves are left, and then remove all isolated nodes. The core is defined as the remainder of the network after the GLR.
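The GLR process itself is simple to implement. A minimal sketch (our own code, for a general undirected network stored as an adjacency dict):

```python
def greedy_leaf_removal(adj):
    """Greedy leaf removal (GLR): repeatedly pick a degree-1 node,
    remove it together with its neighbor and all of the neighbor's
    links; finally drop isolated nodes. Returns the core's node set."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    leaves = [u for u, vs in adj.items() if len(vs) == 1]
    while leaves:
        u = leaves.pop()
        if u not in adj or len(adj[u]) != 1:
            continue  # stale entry: u was removed or is no longer a leaf
        v = next(iter(adj[u]))
        # remove the leaf u and its neighbor v with all of v's links
        for w in adj[v]:
            if w != u:
                adj[w].discard(v)
                if len(adj[w]) == 1:
                    leaves.append(w)
        del adj[u]
        del adj[v]
    return {u for u, vs in adj.items() if vs}  # drop isolated nodes

# A 4-cycle has no leaves and survives as the core;
# the pendant path 4-5-6 is peeled away.
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2},
       4: {5}, 5: {4, 6}, 6: {5}}
print(sorted(greedy_leaf_removal(adj)))  # -> [0, 1, 2, 3]
```

The same routine applies unchanged to the bipartite networks studied here, since GLR does not depend on the two-sided structure.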
An analytic description is possible by introducing the following node categories: (i) α-removable, nodes that can become isolated during the GLR; (ii) β-removable, nodes that can be removed as the neighbor of a leaf during the GLR. We define $\alpha_+$ as the probability that, following a random link to the upper side, we find a node that is α-removable in the absence of the link. We define $\alpha_-$, $\beta_+$, and $\beta_-$ similarly. These probabilities are determined by a set of self-consistent equations involving $G_1^{\pm}(x)$, the generating functions of the excess degree distributions.
The GLR process can be used to construct a maximum matching in the class of bipartite networks that we study here. We remove a leaf consisting of node $v_1$ with degree 1 and its neighbor $v_2$, which possibly has a higher degree. To construct the maximum matching, we add the link $(v_1, v_2)$ to the matching. Now none of the remaining links adjacent to $v_2$ are allowed in the matching, and therefore we remove them too. We continue until no leaves are left, i.e. until we are left with the core. It was shown that in large non-bipartite random networks the core can be asymptotically matched, i.e. the probability of randomly choosing an unmatched node is zero [30]. However, in bipartite networks there is another limiting factor: if the size of the core is different on the two sides, the size of the matching in the core cannot be larger than the smaller side.
Note: this can also happen if $N_+ = N_-$ but $p_+(k) \neq p_-(k)$; this limitation was not considered in [9] or in the subsequent [14]. Therefore, their results should be applied cautiously to networks above the core percolation transition with asymmetric degree distributions.
As stated above, $m_\pm$ is the sum of the contributions of the leaf removal and of the core. We first calculate the contribution of the leaf removal. For each leaf removal we add one link to the matching, increasing the number of matched nodes by two, one on each side. For each β-node there is one leaf removal; therefore, to calculate the contribution of the leaf removal, we count the β-nodes on both sides. However, by doing so we double count the case when two β-nodes are removed together. This can only happen if, in the absence of the link connecting the two nodes, both nodes are α-nodes; the probability of this event is $\alpha_+ \alpha_-$ for each link. Subtracting the double counting gives the overall contribution of the leaf removal. To determine the contribution of the core, we calculate the size of the core on both sides and select the smaller one; adding the two contributions yields $m_\pm$.

C.1. The dataset

We study a publicly available temporal network representing the email communication of a mid-sized company [48, 49]. The dataset contains the sender, the recipient, and the time each email was sent. Altogether there are 82 927 emails between 167 employees covering a nine-month period. The necessary temporal resolution of the network depends on the timescale of the dynamical process we aim to control. To highlight different features of the dataset, we use two different temporal resolutions, with one-hour and one-day time steps. To obtain these networks, we perform a coarse graining procedure: for each time step $t_0 \le t < t_1$ we create an aggregated network, i.e. we connect node $v_i$ to $v_j$ in the coarse grained network if at least one email has been sent from $v_i$ to $v_j$ between $t_0$ and $t_1$.
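The coarse graining step can be sketched as follows (our own illustration; the field layout and toy timestamps are assumptions, not from the dataset):

```python
from collections import defaultdict

def coarse_grain(emails, step):
    """Aggregate (sender, recipient, timestamp) events into one directed
    network per time step: the link v_i -> v_j is present in a step if
    at least one email was sent in [t0, t0 + step)."""
    layers = defaultdict(set)
    for sender, recipient, t in emails:
        layers[t // step].add((sender, recipient))
    return dict(layers)

# Toy events with timestamps in minutes; a one-hour (60 min) resolution
# merges the two emails sent within the same hour into a single link.
emails = [("a", "b", 12), ("a", "b", 43), ("b", "c", 95)]
layers = coarse_grain(emails, step=60)
print(layers)  # -> {0: {('a', 'b')}, 1: {('b', 'c')}}
```

Choosing `step` sets the temporal resolution: 60 minutes for the within-day scenario, one workday for the multi-week scenario.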
The one-hour coarse grained network corresponds to a scenario in which we aim to influence the dynamics within a day. An important feature of the dataset is that it follows strong daily and weekly patterns. The bulk of the email traffic happens during the 9 hour period corresponding to regular office hours on workdays (figures C1(a) and (b)). The average degree of the network outside working hours is approximately 0, while during office hours $\langle k \rangle_h \approx 0.23$. This means that control on the hourly timescale is only possible within one day, that is, each day can be considered separately. We find that the average degree distribution is highly heterogeneous (figures C1(c) and (d)): the second moment ($\langle k^2 \rangle \approx 0.99$) is much larger than the second moment of a Poisson distribution with the same average degree ($\langle k^2 \rangle_{\text{ER}} \approx 0.28$).

By choosing one-day time steps we assume slower dynamics on the network. The coarse graining removes the daily activity patterns. To study control spanning multiple weeks, we explicitly remove weekends and holidays, i.e. we measure time in workdays. The average degree within a time step is $\langle k \rangle_d \approx 1.76$ (figure C2(a)), which predicts that the system is in the supercritical phase, meaning that the characteristic control time is on the order of the system size. Therefore, the length of the available time period does not allow multiple independent measurements of the control process; hence we focus on controlling the system at the end of the last workday, at $t = 190$. Similarly to the one-hour case, the average degree distribution within a time step is heterogeneous (figures C2(b) and (c)), with second moment $\langle k^2 \rangle \approx 18.07$.

C.2. Randomization procedures
We use four different randomization techniques to identify which temporal or network characteristics of the system influence controllability. For details on randomizing temporal networks, see [18] and references therein. Random time (RT): this randomization assigns a random time step to each link, thereby removing all temporal correlations, both the overall fluctuations in the average degree and local correlations such as consecutive and simultaneous events (figures C1(a) and C2(a)). This randomization does not change who interacts with whom, that is, it does not change the aggregated network. However, by separating simultaneous events, the randomization indirectly changes the degree distribution within a time step (figures C1(c) and (d) and C2(b) and (c)). For the one-hour coarse grained network, we only randomize within the working hours of each workday.
Shuffled time (ST): we shuffle the order of all time steps. This removes all correlations between subsequent time steps, such as causal chains of events, while the structure within each time step remains unchanged (figures C1(b) and C2(a)). For the one-hour coarse grained network, we only shuffle the time steps within the working hours of each workday.
Random network (RN): in this randomization, the network in each time step is replaced by an Erdős-Rényi network with the same number of links, thereby removing all network structure, including the heterogeneity of the degree distribution (figures C1(c) and (d) and C2(b) and (c)). All interaction times are retained, preserving the fluctuations in the average degree.
Degree preserved network (DPN): for this randomization, we break all connections and randomly rewire them within each time step. This way only the degree distribution is preserved, while all other correlations in the network structure are eliminated. Similarly to RN, we do not change the interaction times.
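The four procedures can be sketched as follows. This is our own illustration, assuming the temporal network is stored as a dict mapping each time step to a set of directed links; for brevity it ignores self-loop and multi-link bookkeeping and the working-hours restriction mentioned above:

```python
import random

def random_time(layers, T):
    """RT: assign each link a uniformly random time step (destroys all
    temporal correlations, keeps the aggregated network)."""
    out = {t: set() for t in range(T)}
    for es in layers.values():
        for e in es:
            out[random.randrange(T)].add(e)
    return out

def shuffled_time(layers):
    """ST: shuffle the order of the time steps (destroys correlations
    between subsequent steps, keeps each step's internal structure)."""
    ts = list(layers)
    random.shuffle(ts)
    return {t_new: layers[t_old] for t_new, t_old in zip(layers, ts)}

def random_network(layers, n):
    """RN: replace each step by an Erdős-Rényi network with the same
    number of links (destroys degree heterogeneity, keeps activity)."""
    pairs = [(u, v) for u in range(n) for v in range(n) if u != v]
    return {t: set(random.sample(pairs, len(es))) for t, es in layers.items()}

def degree_preserved(layers):
    """DPN: rewire each step by shuffling link targets, preserving the
    in- and out-degree sequences of that step (self-loops ignored)."""
    out = {}
    for t, es in layers.items():
        sources = [u for u, _ in es]
        targets = [v for _, v in es]
        random.shuffle(targets)
        out[t] = set(zip(sources, targets))
    return out

layers = {0: {(0, 1), (2, 1)}, 1: {(1, 2)}}
print(shuffled_time(layers))
```

Each procedure keeps exactly one ingredient fixed (the aggregated network, the per-step structure, the activity pattern, or the degree sequence), which is what allows the comparison to isolate the features that drive controllability.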