Computing Betweenness Centrality in Link Streams

Betweenness centrality is one of the most important concepts in graph analysis. It was recently extended to link streams, a graph generalization where links arrive over time. However, its computation raises non-trivial issues, due in particular to the fact that time is considered as continuous. We provide here the first algorithms to compute this generalized betweenness centrality, as well as several companion algorithms that have their own interest. They work in polynomial time and space; we illustrate them on typical examples and provide an implementation.


Introduction
Betweenness centrality, or betweenness for short, is one of the most classical and important concepts defined over graphs and used in the field of complex networks and social network analysis [36,30,20,19,9]. Given a graph G = (V, E), it measures how frequently each node v ∈ V is involved in shortest paths: $B(v) = \sum_{u \in V, w \in V} \frac{\sigma(u,w,v)}{\sigma(u,w)}$, where $\frac{\sigma(u,w,v)}{\sigma(u,w)}$ is the fraction of all shortest paths from u to w that involve v if there is a path from u to w, and 0 otherwise. Reference algorithms compute the betweenness of all nodes in a graph in time O(n · m), where n and m are the number of nodes and links in the graph [4].
Betweenness was extended recently to link streams [18], a family of formal objects that model sequences of interactions over time in a way similar to the modeling of relations by graphs. They are equivalent to other objects like time-varying graphs (TVG) [8,2], relational event models (REM) [7,25], or temporal networks [21,14], with an emphasis on the streaming nature of link sequences. Various temporal extensions of betweenness were introduced in these contexts, see Section 7. Betweenness in link streams has some unique features that make it quite different from other temporal extensions of betweenness in graphs. In particular, it considers continuous time and links with or without durations: nodes may be linked at specific time instants, as well as during continuous periods of time. Also, it considers paths from any node at any time instant to any node at any time instant, which induces an uncountable number of temporal nodes. This raises specific algorithmic challenges, which we address in this paper, thus obtaining the first algorithm (and implementation) for computing betweenness centrality in link streams.
We first introduce key concepts and notations in Section 2. We then show that betweenness computations involve uncountable sets of paths with a finite volume, which we define and compute in Section 3. In addition, these computations involve integrals that must be transformed into discrete sums over a finite number of time intervals. We define and compute these intervals in Section 4, and combine them in Section 5 to obtain the contribution of any pair of nodes to the betweenness of a given temporal node. We finally compute the betweenness of any temporal node in polynomial time and space, and show results on non-trivial toy examples in Section 6. We provide an open Python implementation of these algorithms [16].

Preliminaries
A link stream L is a triplet (T, V, E) where T = [α, ω] is an interval of R representing time, V is a finite set of nodes, and E ⊆ T × V ⊗ V is the set of links. Then, (t, uv) ∈ E means that u and v are linked together at time t. For any u and v in V, T_uv = {t, (t, uv) ∈ E} denotes the set of time instants at which u and v are linked together. See Figure 1 for an illustration and [18] for a full presentation of the formalism.
We assume here that T_uv is the union of a finite number of disjoint closed intervals (possibly singletons) of T. We denote by T the set of bounds of maximal intervals in T_uv for any u and v, that we call event times. We denote by m_uv the number of maximal intervals in T_uv, and by $m = \sum_{u,v \in V} m_{uv}$ their sum, i.e. the number of maximal intervals in E. In the case of Figure 1, we obtain T = {1, 2, 3, 5, 6, 7, 8, 9, 11, 12, 14, 15, 16, 18, 19, 22, 23, 24, 25, 27, 28, 29, 30, 31}, m_ab = 3, m_ac = 1, m_bc = 4, m_bd = 1, m_cd = 3, m_de = 4, and so m = 16.
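The bookkeeping above can be sketched in a few lines of Python. This is only an illustration under assumptions: the dictionary-of-intervals representation, the function name `event_times_and_segments`, and the toy data are hypothetical (the toy data is not the link stream of Figure 1).

```python
def event_times_and_segments(presence):
    """Return (event times, m_uv per pair, m) for a link stream.

    `presence` (an assumed representation) maps a node pair (u, v) to the
    list of its disjoint closed maximal intervals (b, e); an instantaneous
    link at time t is written (t, t).
    """
    events = set()
    m_uv = {}
    for pair, intervals in presence.items():
        m_uv[pair] = len(intervals)       # number of maximal intervals of T_uv
        for b, e in intervals:
            events.update((b, e))         # both bounds are event times
    return sorted(events), m_uv, sum(m_uv.values())


# Hypothetical toy data, NOT the link stream of Figure 1.
toy = {
    ("a", "b"): [(1, 3), (7, 8)],
    ("b", "c"): [(4, 4)],                 # instantaneous link at time 4
}
times, m_uv, m = event_times_and_segments(toy)
print(times, m_uv, m)
```

Since each maximal interval contributes at most two bounds, the number of event times returned here is at most 2 · m.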
If such a path exists, then (y, v) is reachable from (x, u), which we denote by (x, u) → (y, v). The path P involves (t_1, u), (t_k, v), and (t, v_i) for all t ∈ [t_i, t_{i+1}] and all i. It starts at t_1, arrives at t_k, has length k and duration t_k − t_1. A path with duration 0 is called an instantaneous path.
The path P is a shortest path from (x, u) to (y, v) if it has minimal length, called the distance from (x, u) to (y, v) and denoted by d((x, u), (y, v)). The path P is a fastest path from (x, u) to (y, v) if it has minimal duration, called the latency from (x, u) to (y, v) and denoted by ℓ((x, u), (y, v)). The path P is a shortest fastest path from (x, u) to (y, v) if it is a path of minimal length among those of minimal duration from (x, u) to (y, v).
For instance, in the case of Figure 1, the path a, 2, b, 4, c, 6, d, 9, e is a fastest path from (0, a) to (32, e), but a, 1, b, 4, c, 6, d, 9, e is not (it has duration 8). The path a, 2, b, 4, c, 6, d, 9, e has length 4 and duration 7, and no path from (0, a) to (32, e) with lower duration exists. It is not a shortest path since a, 9, c, 18, d, 23, e also is a path from (0, a) to (32, e), which has length 3 and duration 14. This last path is a shortest path, since no path with lower length exists, but not a fastest one. The distance from (0, a) to (32, e) therefore is 3 and the latency is 7. Among the fastest paths from (0, a) to (32, e), i.e. the paths of duration 7, the shortest have length 4. Therefore, a, 2, b, 4, c, 6, d, 9, e is a shortest fastest path between them, as well as a, 2, b, 5, c, 6, d, 9, e, for instance.
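The lengths and durations quoted in this example can be checked mechanically. The sketch below assumes a path is stored as a flat list alternating nodes and times, as in the notation a, 2, b, 4, c, ...; the helper name `length_and_duration` is ours. It only compares the three paths listed in the text and does not search the link stream for other paths.

```python
def length_and_duration(path):
    """Length k and duration t_k - t_1 of a path [u0, t1, u1, ..., tk, uk]."""
    times = path[1::2]                    # t1, ..., tk
    return len(times), times[-1] - times[0]


p1 = ["a", 2, "b", 4, "c", 6, "d", 9, "e"]
p2 = ["a", 1, "b", 4, "c", 6, "d", 9, "e"]
p3 = ["a", 9, "c", 18, "d", 23, "e"]

for p in (p1, p2, p3):
    print(p, length_and_duration(p))

# Among these three candidates only: the fastest minimizes the duration,
# the shortest minimizes the length, and the shortest fastest minimizes
# the duration first, then the length.
candidates = [p1, p2, p3]
shortest = min(candidates, key=lambda p: length_and_duration(p)[0])
shortest_fastest = min(candidates, key=lambda p: length_and_duration(p)[::-1])
```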
Finally, the betweenness of a node v ∈ V at a time instant t ∈ T measures how frequently (t, v) is involved in shortest fastest paths in L, see [18]: $B(t, v) = \sum_{u \in V, w \in V} \int_{i \in T, j \in T} \frac{\sigma((i,u),(j,w),(t,v))}{\sigma((i,u),(j,w))} \, di \, dj$, where $\frac{\sigma((i,u),(j,w),(t,v))}{\sigma((i,u),(j,w))}$ is the fraction of all shortest fastest paths from u at time i to w at time j that involve v at time t if there is a path from (i, u) to (j, w), and 0 otherwise.
In this original definition, the quantity $\frac{\sigma((i,u),(j,w),(t,v))}{\sigma((i,u),(j,w))}$ is only loosely defined as a fraction of shortest fastest paths; the function σ itself, as well as the ratio between its values, are not explicitly defined. We will see in the next section that this fraction involves uncountable sets of shortest fastest paths that have finite volumes with a size and a dimension. We will also introduce the appropriate arithmetic operators needed to deal with them, and an algorithm to compute these volumes.

Volumes of shortest paths
Let us consider a link stream L = (T, V, E), and a sequence I_1, I_2, ..., I_k of intervals of T. Let us denote by b_i and e_i the bounds of interval I_i. We say that the sequence I_1, I_2, ..., I_k is a sliding sequence if for all i, there exists no element in I_{i+1} strictly smaller than all elements of I_i (∄y ∈ I_{i+1}, ∀x ∈ I_i, y < x), and no element of I_i strictly larger than all elements of I_{i+1} (∄x ∈ I_i, ∀y ∈ I_{i+1}, x > y).
In such a sequence, the intervals may overlap (I_i ∩ I_j ≠ ∅, i ≠ j), may be included in each other (I_i ⊆ I_j, i ≠ j), or may even be equal (I_i = I_j, i ≠ j).
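The sliding condition can be tested directly on interval bounds. The following sketch is an illustration under assumptions: each interval is encoded as a hypothetical tuple `(lo, lo_closed, hi, hi_closed)` and is assumed non-empty; the function name `is_sliding` is ours. The first sequence tested is [23, 24], ]25, 28], [27, 29], {30}, which this section uses as an example of a sliding sequence.

```python
def is_sliding(intervals):
    """Check the sliding-sequence condition on consecutive intervals.

    Each interval is (lo, lo_closed, hi, hi_closed) and must be non-empty;
    a singleton {t} is (t, True, t, True).
    """
    for cur, nxt in zip(intervals, intervals[1:]):
        lo1, lc1, hi1, hc1 = cur
        lo2, lc2, hi2, hc2 = nxt
        # Some y in the next interval is strictly below every x of the current one.
        starts_too_early = lo2 < lo1 or (lo2 == lo1 and lc2 and not lc1)
        # Some x in the current interval is strictly above every y of the next one.
        ends_too_late = hi1 > hi2 or (hi1 == hi2 and hc1 and not hc2)
        if starts_too_early or ends_too_late:
            return False
    return True


example = [
    (23, True, 24, True),    # [23, 24]
    (25, False, 28, True),   # ]25, 28]
    (27, True, 29, True),    # [27, 29]
    (30, True, 30, True),    # {30}
]
print(is_sliding(example))
```

Note that overlapping or equal consecutive intervals pass the test, in line with the remark above, whereas ]1, 2] followed by [1, 2] does not, because 1 is strictly smaller than every element of ]1, 2].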
Given a sliding sequence We say that S is a sliding set. If the intervals are disjoint then In the case of Figure 1, for instance, [23, 24], ]25, 28], [27, 29], {30} is a sliding sequence and a, [23, 24], b, ]25, 28], c, [27, 29], d, {30}, e is a sliding set. The elements of this set are the paths a, t More generally, all paths in any link stream are elements of sliding sets. In the case of Figure 1, for instance, all shortest paths from (0, a) to (14, e) go from a to b between times 1 and 2, from b to c between times 3 and 5, from c to d between 6 and 7, and finally from d to e between 9 and 11. Therefore, they are elements of a, [1, 2], b, [3, 5], c, [6, 7], d, [9, 11], e.
In addition, if we consider any two elements (i, u) and (j, v) of T × V, then we have the following result.
Proposition 1. The set SP((i, u), (j, v)) of all shortest paths from (i, u) to (j, v) is the disjoint union of a finite number of sliding sets.
Proof. Let us consider all sliding sequences I_1, I_2, ..., I_k with k = d((i, u), (j, v)) in which each interval is either an open interval ]t, t′[ such that t and t′ are two consecutive event times, or a singleton {t} such that t is an event time. There is a finite number of such sequences, and they induce a finite number of sliding sets which are all disjoint.
Any path in SP((i, u), (j, v)) is in one of these sliding sets, and then all the elements of this sliding set are shortest paths from (i, u) to (j, v). Therefore SP((i, u), (j, v)) is the union of such sliding sets.

Definition 1 (volumes). The volume of a sliding set S, denoted by |S|, is defined by its size and dimension as follows:
• If I_i is a singleton for all i, then S contains only one sequence. It has size 1 and dimension 0.
• Otherwise, let
In both cases, the volume of S, |S|, is defined as the pair (size(S), dim(S)) giving its size and dimension.
More generally, we have the following definitions for volume operations.
Definition 2 (addition, ⊞). Given two disjoint sliding sets S and S′ of volume |S| = (s, d) and |S′| = (s′, d′), the volume of their union S ⊔ S′ is the sum of their two volumes, which we denote by |S| ⊞ |S′|. In such a sum, volumes in lower dimensions are negligible, and the sizes of volumes with maximal dimension just add up, so we obtain
Definition 3 (product, ). Consider three nodes u, v and w in V, and two sets S and S′ such that all elements of S are of the form u, t w such that the sequence from u to v is in S and the one from v to w is in S′. If S and S′ are disjoint unions of a finite number of sliding sets with |S| = (s, d) and |S′| = (s′, d′), then S • S′ also is the disjoint union of a finite number of sliding sets, and its volume is
These notations and operations make it easy to describe the set SP((i, u), (j, v)) and compute its volume, which is the goal of this section.
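As a minimal sketch of the addition described in Definition 2, the function below implements the stated rule only: volumes of lower dimension are negligible, and sizes of volumes with maximal dimension add up. Representing a volume as a Python pair `(size, dim)` and the names `vol_add` and `vol_sum` are our assumptions; the product and ratio operations are not sketched because their formulas are not available in this text.

```python
from functools import reduce


def vol_add(v1, v2):
    """Volume addition: v1 ⊞ v2 for volumes given as (size, dimension)."""
    (s1, d1), (s2, d2) = v1, v2
    if d1 == d2:
        return (s1 + s2, d1)          # same dimension: sizes add up
    return v1 if d1 > d2 else v2      # the lower dimension is negligible


def vol_sum(volumes):
    """⊞ over several volumes; the empty sum is (0, 0)."""
    return reduce(vol_add, volumes, (0, 0))


print(vol_add((2, 1), (3, 1)))        # two one-dimensional volumes
print(vol_add((2, 1), (7, 0)))        # seven isolated paths are negligible
print(vol_sum([(1, 0), (4, 2), (1, 2), (9, 1)]))
```

With this encoding, (0, 0) acts as a neutral element, which matches the use of (0, 0) for an empty set of paths later in the paper.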
In the non-trivial cases above, for instance, |SP((0, a),
We will now prove two lemmas needed to compute the volume of shortest paths from a given temporal node (i, u) in T × V to another one (j, v) in T × V. Lemma 1 shows how to compute the volume of shortest paths between two consecutive event times. Lemma 2 shows how to decompose the set of shortest paths from a temporal node to another one into a disjoint union of smaller sets of shortest paths. This will lead to Algorithm 1, which starts by computing the volume of shortest paths from (i, u) to (i, w) for any w. Then, in a temporal BFS-like manner, it uses volumes from (i, u) to (t, x) to compute volumes from (i, u) to (t′, w), for increasing pairs of consecutive event times t and t′. Indeed, as illustrated in Figure 2, the volumes at t′ can be derived from the ones at t. The temporal BFS also uses two queues, named Q and X, to compute the distances that are also needed to compute volumes of shortest paths. It stops when it reaches time j.
In all the following, we consider two consecutive event times t and t′. For all x and y in ]t, t′[, the graphs G_x and G_y are identical. We denote by G^+_t (or G^−_t′) this graph, and by ) the (finite) number of shortest paths and the distance from u to v in this graph.
Lemma 1. Given two nodes x and w, the volume of the set of shortest paths from x to w that start and arrive during ]t, t′[ is equal to Therefore, the set of shortest paths from x to w that start and arrive during ]t, It is easy to show by induction that the size of each such sliding set is k! , and its dimension is k. The volume of SP((t, x), (t′, w)) is the sum of the volumes of all these sliding sets, and there are σ^+_t(x, w) such sliding sets, which completes the proof.
Then, the volume of SP((i, u), (t′, w)) is the sum of the two following volumes: and
(i, u) to (t′, w) is the concatenation of either (left) a blue path from (i, u) to a given (t, x) and a green path from this (t, x) to (t′, w) in G^+_t; or (right) a blue path from (i, u) to a specific (t′, y) and then a jump from y to w at time t′ using (t′, yw) ∈ E_t′.
Theorem 3. Given two temporal nodes (i, u) and (j, v) in T × V, Algorithm 1 computes the volume of shortest paths from (i, u) to (j, v).

Proof. Let us denote by
Algorithm 1: Volume of shortest paths between two temporal nodes.
1 Function VSP: Input: a link stream L = (T, V, E), (i, u) ∈ T × V, and (j, v) ∈ T × V. Output: volume of shortest paths from (i, u) to (j, v).
for all w when one enters the main loop at line 6, then at the end of the loop we have This is sufficient to prove that the algorithm returns |SP((i, u), (j, v))|, since the loop at line 4 initializes Dist and vol correctly.
Lines 7 to 13 deal with the computation of d((i, u), (t′, w)) from the distances at time t, for all w. It is similar to a BFS on the graph G_t′, except that distances at t′ are bounded by the ones at t: d((i, u), (t′, w)) ≤ d((i, u), (t, w)). The loop therefore uses two queues: a list X of nodes in increasing distance at time t, and a queue Q for the exploration of G_t′. At each round, we consider a node w with minimal distance in these queues: Line 11 takes the first element of X or Q, depending on which has the minimal second field d. This is its actual distance d((i, u), (t′, w)) (line 12). Then we add its neighbors to Q, together with the information that their distance from (i, u) cannot be larger than d((i, u), (t′, w)) + 1 (line 13). The loop ends when both X and Q are empty, i.e. the distances to all reachable nodes are found.
Then, Lines 14 to 18 deal with the computation of |SP((i, u), (t ′ , w))| from the volumes at time t, for all reachable w.They are a straightforward application of Lemma 2.
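The two-queue distance update described in this proof can be sketched as follows. This is our reading of the prose, not the paper's Algorithm 1: the function name `update_distances`, the dictionary-based inputs, and the toy graph are assumptions, and the volume computation of Lines 14 to 18 is deliberately left out.

```python
from collections import deque


def update_distances(prev_dist, adjacency):
    """Distances at time t' from distance bounds known at time t.

    prev_dist: dict node -> distance from (i, u) at time t (upper bounds at t').
    adjacency: dict node -> iterable of neighbors in the graph at time t'.
    """
    # X: nodes sorted by increasing distance at time t.
    X = deque(sorted(prev_dist.items(), key=lambda item: item[1]))
    Q = deque()                      # exploration queue for the graph at t'
    dist = {}
    while X or Q:
        # Take the head with the smallest candidate distance.
        if not Q or (X and X[0][1] <= Q[0][1]):
            w, d = X.popleft()
        else:
            w, d = Q.popleft()
        if w in dist:                # already settled with a smaller value
            continue
        dist[w] = d
        for y in adjacency.get(w, ()):
            if y not in dist:
                Q.append((y, d + 1))  # y is at distance at most d + 1
    return dist


# Hypothetical example: b was at distance 3 at time t, but the links
# u-a and a-b present at time t' bring it to distance 2.
prev = {"u": 0, "b": 3}
graph = {"u": ["a"], "a": ["u", "b"], "b": ["a", "c"], "c": ["b"]}
new_dist = update_distances(prev, graph)
print(new_dist)
```

Because settled distances never decrease from one round to the next, the values pushed to Q are themselves non-decreasing, so comparing only the two heads is enough and no priority queue is needed.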

Latency pairs
Let us consider a link stream L = (T, V, E), and two nodes u and w in V. The previous section shows how to compute the volume of shortest paths from u to w between two given time instants i and j. However, betweenness computations rely on volumes of shortest fastest paths from u to w. These paths are the shortest paths from (s, u) to (a, w) if the latency from (s, u) to (a, w) is equal to a − s. We then say that (s, a) in T × T is a latency pair from u to w (in L). This section is devoted to the computation of such latency pairs.
In the case of Figure 1, for instance, (2, 9) is a latency pair from a to e, because the fastest paths from (2, a) to (9, e) start at 2 and end at 9. Similarly, (9, 16), (16, 23) and (24, 30) are the other latency pairs from a to e. Instead, (3, 8) is not a latency pair from a to e since there is no path from (3, a) to (8, e), and (1, 9) is not a latency pair from a to e either because the fastest paths from (1, a) to (9, e) start at time 2.
For any t in T, the pair (t, t) is a latency pair from u to w exactly if there is an instantaneous path between (t, u) and (t, w), i.e. there is a path between u and w in G_t. The latency between (t, u) and (t, w) is then equal to 0, and we call (t, t) an instantaneous latency pair. In the case of Figure 1, such latency pairs occur from b to d at all times from 12 to 14, at time 19, and at all times from 27 to 28.
Notice that there may exist an infinite number of instantaneous latency pairs from a node to another one, like in this last example, but there is only a finite number of non-instantaneous latency pairs. Indeed, if (s, a) is a latency pair with a − s ≠ 0, then s and a necessarily are event times, and as said in Section 2 all link streams considered here have a finite number of event times.
Notice also that if (s, a) is a latency pair from u to w, then there cannot be any latency pair (s′, a′) from u to w with [s′, a′] ⊊ [s, a]. Indeed, this would imply that the latency from (s, u) to (a, w) is at most a′ − s′ < a − s, which contradicts the fact that (s, a) is a latency pair. This also implies that, if (s, a) is a latency pair with s ≠ a, then necessarily s and a are event times: otherwise, there is a pair which would imply that (s′, a′) also is a latency pair, which contradicts our previous remark.
As a consequence, latency pairs are componentwise ordered: if (s, a) and (s′, a′) are two distinct latency pairs, then [s′, a′] ⊈ [s, a] and [s, a] ⊈ [s′, a′]. Therefore, either s < s′ and a < a′, or s′ < s and a′ < a.
Our algorithm considers all event times in increasing order. It maintains the latency lists from a given node to all others before the current event time. It then updates these latency lists for the current time by computing the connected components of the graph at this time. For each of these components, it considers the latest starting time from which a node in this component can be reached, which is given by the previously computed latency lists. This time is the beginning of the latency pairs for its nodes that end at the current time, and so the algorithm updates the lists accordingly.
Algorithm 2: Computation of all latency lists from a given node.
return LL
Theorem 4. Given a link stream L = (T, V, E) and a node u ∈ V, Algorithm 2 computes the ordered latency lists from u to any node w ∈ V.
Proof. We claim that, at the end of each iteration of the main loop, for all w in V, LL[w] is the list of all latency pairs (s, a) from u to w such that s and a are event times with a ≤ t. Assume this is true for all iterations before a given event time t. When it reaches this event time, the loop starts by adding (t, t) to LL[u], which makes the claim true for w = u. Consider any connected component C of G_t; the nodes w ∈ C with non-empty LL[w] are the nodes reachable from u with an arrival time before t or at t. Then, the value of s′ computed by the loop at Line 7 is the latest starting time such that one of these nodes is reachable from (s′, u) before t or at t, and X is the set of these reachable nodes.
Therefore, if X is non-empty, there exists a path from (s′, u) to (t, w) for any w ∈ C\X: for any x ∈ X, the path from (s′, u) to (t, x) and then from (t, x) to (t, w) (which exists since x and w are in the same connected component C of G_t) is such a path. As a consequence, (s′, t) is a latency pair for any w ∈ C\X. Notice that (s′, t) is not a latency pair for any node x ∈ X, x ≠ u, since they all have a latency pair (s′, t_x) with t_x < t.
Finally, if the claim is true for all event times lower than t, it is true for t too. It is true for the first iteration, i.e. when t is the first event time: it sets LL[w] to {(t, t)} for all nodes w in the same connected component of G_t as u, which is the correct value. Therefore, for all w in V, the returned value of LL[w] is the list of latency pairs (s, a) from u to w such that s and a are event times, and it is ordered by construction.
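The procedure described in this section and in the proof above can be sketched in Python. This is a reconstruction from the prose under assumptions, not the paper's Algorithm 2 (whose pseudo-code is not available here): the input format `(b, e, x, y)`, the function name `latency_lists`, and the toy stream are hypothetical.

```python
def latency_lists(links, nodes, u):
    """Latency pairs (s, a) from u to every node, with s and a event times.

    links: list of (b, e, x, y), meaning x and y are linked during [b, e].
    """
    events = sorted({t for b, e, _, _ in links for t in (b, e)})
    LL = {w: [] for w in nodes}
    for t in events:
        LL[u].append((t, t))                   # u reaches itself at time t
        adj = {w: set() for w in nodes}        # the graph G_t
        for b, e, x, y in links:
            if b <= t <= e:
                adj[x].add(y)
                adj[y].add(x)
        seen = set()
        for root in nodes:
            if root in seen:
                continue
            comp, stack = set(), [root]        # connected component of root
            while stack:
                x = stack.pop()
                if x not in comp:
                    comp.add(x)
                    stack.extend(adj[x] - comp)
            seen |= comp
            reached = [w for w in comp if LL[w]]
            if not reached:
                continue
            # Latest starting time from which the component can be reached.
            s_best = max(LL[w][-1][0] for w in reached)
            X = {w for w in reached if LL[w][-1][0] == s_best}
            for w in comp - X:
                LL[w].append((s_best, t))
    return LL


# Hypothetical toy stream: a-b linked during [1, 2], b-c linked at time 3.
toy_links = [(1, 2, "a", "b"), (3, 3, "b", "c")]
LL = latency_lists(toy_links, ["a", "b", "c"], "a")
print(LL)
```

On this toy stream, the only latency pair from a to c is (2, 3): leaving a at time 2, the latest instant at which a and b are linked, and reaching c at time 3.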

Contribution of a node pair
Throughout this section, we consider a link stream L = (T, V, E) and two nodes u and w in V. In addition, we consider a temporal node (t, v) in T × V.
For any i and j in T, we denote by $C^{ij}_{tv}(u, w)$ the fraction $\frac{\sigma((i,u),(j,w),(t,v))}{\sigma((i,u),(j,w))}$ of shortest fastest paths from (i, u) to (j, w) that involve (t, v), and we call it the contribution of (i, j). If there is no path from (i, u) to (j, w), we consider that $C^{ij}_{tv}(u, w) = 0$. By extension, we call $\int_{i,j \in T} C^{ij}_{tv}(u, w) \, di \, dj$ the contribution of (u, w) to the betweenness of (t, v), and we denote it by $C_{tv}(u, w)$. The goal of this section is to compute $C_{tv}(u, w)$.
First notice that the contribution of (i, j) is derived from volumes of paths as follows. Given x, y and z in T × V, we denote by SFP(x, y) the set of all shortest fastest paths from x to y, and by SFP(x, y, z) the set of these paths that involve z. Then, we define σ(x, y) and σ(x, y, z) as the volumes of SFP(x, y) and SFP(x, y, z), respectively. It follows that $C^{ij}_{tv}(u, w)$ is equal to $\frac{\sigma((i,u),(j,w),(t,v))}{\sigma((i,u),(j,w))}$ if there is a path from (i, u) to (j, w). Otherwise, $C^{ij}_{tv}(u, w)$ is 0. This gives a rigorous ground to the definition of $C^{ij}_{tv}(u, w)$, which, as discussed at the end of Section 2, was loosely defined as the fraction of shortest fastest paths from (i, u) to (j, w) that involve (t, v); it is indeed equal to the ratio between the two volumes σ((i, u), (j, w), (t, v)) and σ((i, u), (j, w)) now defined, with the volume ratio operation from Definition 4.
Consider for instance the case of Figure 1 with u = a and w = e, and let us consider i = 0 and j = 18. Then, the shortest fastest paths from (i, u) = (0, a) to (j, w) = (18, e) are the elements of the set SFP((0, a), (18, e)) = X ⊔ Y where X and Y are the sliding sets a, {2}, b, [3, 5], c, [6, 7], d, {9}, e and a, {9}, c, {11}, b, [12, 14], d, {16}, e, respectively. If (t, v) = (7.5, c) or (t, v) = (10, b), for instance, then none of these paths involve (t, v) and so we obtain a 0 contribution. If (t, v) = (4.5, c) or (t, v) = (8, d), for instance, then all paths in X involve (t, v) and no path in Y does, leading to $C^{ij}_{tv}(u, w) = \frac{\sigma((0,a),(18,e),(t,v))}{\sigma((0,a),(18,e))}$.
Before presenting the algorithm computing these path volumes and associated contributions, we characterize more precisely which pairs (i, j) have non-zero contribution.
Lemma 5. There is at most one latency pair from u to w with non-zero contribution.
Proof. Consider two distinct latency pairs (s, a) and (s′, a′); we can assume s < s′ and a < a′, since, as explained in the previous section, [s′, a′] ⊆ [s, a] is impossible. Suppose both latency pairs have non-zero contribution: there are shortest fastest paths from (s, u) to (a, w) that involve (t, v) and from (s′, u) to (a′, w) that also involve (t, v). Therefore, there is a path from (s′, u) to (t, v) and a path from (t, v) to (a, w), and so a path from (s′, u) to (a, w). It has duration a − s′, which is strictly lower than both a − s and a′ − s′, thus contradicting that (s, a) and (s′, a′) are both latency pairs.
If all latency pairs from u to w have contribution 0, then the contribution of (u, w) itself is 0. Otherwise, let us denote by (s, a) the unique latency pair with non-zero contribution.
We now introduce two specific times, S and A, that we will use to find all time instants with non-zero contribution. We define ]S, A[ as the largest interval containing ]s, a[ such that: for every other latency pair (s′, a′) in this interval, either a′ − s′ > a − s, or a′ − s′ = a − s and d((s′, u), (a′, w)) ≥ d((s, u), (a, w)); and the number of instantaneous paths from (S, u) to (A, w) of length d((s, u), (a, w)) is finite. We illustrate this definition in Figure 3.
Figure 3: A link stream L = (T, V, E) in which we consider a specific (t, v) in T × V (in red), two nodes u and w in V (in black, horizontal lines), as well as the latency pair (s, a) containing t such that shortest (necessarily fastest) paths from (s, u) to (a, w) have length d and some of them involve (t, v). We display all latency pairs from u to w with two green vertical lines topped by a dotted horizontal line indicating the corresponding latencies (= a − s, < a − s or > a − s). In addition, we also indicate the length (= d, < d or > d) of corresponding shortest paths within each latency pair, when this is useful (in grey). We indicate in blue the two specific times S and A defined above, as well as the time periods for i and j such that the contribution of (i, j) may be non-zero (Lemma 6).
We then have the following result.
Lemma 6. All pairs (i, j) with non-zero contribution are in [S, s] × [a, A].
Proof. If a given pair (i, j) has non-zero contribution, then there is a latency pair (s′, a′) with s′ ≥ i and a′ ≤ j that has non-zero contribution. Recall that (s, a) is itself such a latency pair. From Lemma 5, we then have (s′, a′) = (s, a), and so i ≤ s and j ≥ a.
If i < S and j ≥ a, or if i ≤ s and j > A, then by definition of S and A we are in one of the following situations.
There exists a latency pair (s′, a′) in [i, j] such that: either a′ − s′ < a − s, or a′ − s′ = a − s and d((s′, u), (a′, w)) < d((s, u), (a, w)). Then, shortest fastest paths from (s, u) to (a, w) are not shortest fastest paths from (i, u) to (j, w). All shortest fastest paths from (i, u) to (j, w) are from (s′, u) to (a′, w) where (s′, a′) is a latency pair as described above. Suppose such a shortest fastest path involves (t, v). Then there are paths from (s′, u) to (t, v) and from (t, v) to (a′, w). As a consequence, s′ ∉ [s, t], otherwise (s, a) would not be a latency pair. Likewise, a′ ∉ [t, a]. Therefore, s′ < s and a′ > a, but this contradicts the fact that a′ − s′ ≤ a − s. This means that shortest fastest paths from (s′, u) to (a′, w) cannot involve (t, v), and so the contribution of (i, j) is 0.
Or there is an infinite number of instantaneous paths from (i, u) to (j, w) with length d((s, u), (a, w)). Only the σ_t(u, w) ones starting and arriving at time t involve (t, v). There is a finite number of such paths, as they are paths in the graph G_t. Therefore, the contribution of (i, j) is zero.
In conclusion, i ≤ s, j ≥ a, i cannot be smaller than S, and j cannot be larger than A, which proves the claim.
This lemma says that all pairs (i, j) with non-zero contribution are in [S, s] × [a, A]. Notice however that some pairs (i, j) in [S, s] × [a, A] may have a contribution equal to 0. This happens whenever the volume of shortest fastest paths from (s, u) to (a, w) has a lower dimension than the one from (i, u) to (j, w).
We now define specific latency pairs that play a special role, as any shortest fastest path from (i, u) to (j, w) must start and arrive within one of these pairs. To do this, we introduce an ordered list LP of latency pairs centered on (s, a), which means that latency pairs preceding (s, a) have negative indexes in the list and the others have positive indexes. It is the list LP = (s_{−l}, a_{−l}), (s_{−l+1}, a_{−l+1}), . . ., (s_0 = s, a_0 = a), . . ., (s_r, a_r) such that, for all k, [s_k, a_k] ⊆ [S, A], a_k − s_k = a − s, and d((s_k, u), (a_k, w)) = d((s, u), (a, w)). We also define s_{−l−1} = S and a_{r+1} = A. Notice that s_{−l−1} = s_{−l} or a_r = a_{r+1} are not forbidden; this happens for instance when s_{−l} = α or a_r = ω. We show now that the latency pairs in LP give precisely the shortest fastest paths from u to w.

Lemma 7. For any pair
Proof. We first show that for any Moreover, since i > S and j < A, there exists no latency pair (s′, a′) such that [s′, a′] ⊆ [i, j] and a′ − s′ < a − s. Therefore, it is a fastest path from (i, u) to (j, w). Similarly, because (s_k, a_k) is in LP, this path has length d((s, u), (a, w)) and therefore it is a shortest fastest path from (i, u) to (j, w).
Now consider any shortest fastest path from (i, u) to (j, w), and let us denote by s′ and a′ its starting and arrival times. Since it is a fastest path, (s′, a′) is a latency pair, and obviously [s′, a′] ⊆ [i, j]. In addition, a′ − s′ = a − s: if it was larger then the paths from (s′, u) to (a′, w) would not be fastest paths from (i, u) to (j, w); and if it was smaller, then the paths from (s, u) to (a, w) would not be fastest paths from (i, u) to (j, w). Similarly, d((s′, u), (a′, w)) = d((s, u), (a, w)): if it was larger then the paths from (s′, u) to (a′, w) would not be shortest paths from (i, u) to (j, w); and if it was smaller, then the paths from (s, u) to (a, w) would not be shortest paths from (i, u) to (j, w). Therefore, (s′, a′) is in LP, leading to the fact that SFP((i, u), (j, w)) is included in the union of all sets
Finally, notice that the sets SFP((s_k, u), (a_k, w)) are disjoint for different values of k, since all the paths they contain start at s_k and arrive at a_k. We therefore obtain the claim.
Algorithm 3: Compute the list of starting times for latency pairs in LP, as well as associated volumes of sets of shortest fastest paths.
1 Function PrevList:
Thanks to these results, we obtain an expression giving the contribution of a node pair as a discrete sum.
Lemma 8. The contribution of (u, w) to the betweenness of (t, v), i.e. the fraction of shortest fastest paths from u to w that involve (t, v), namely $C_{tv}(u, w) = \int_{i,j \in T} \frac{\sigma((i,u),(j,w),(t,v))}{\sigma((i,u),(j,w))} \, di \, dj$, can be written as a discrete sum:
Proof. According to Lemma 6, the contribution of time instants (i, j) is equal to zero whenever all shortest fastest paths from (i, u) to (j, w) involving (t, v) start at time s and arrive at time a and the contribution of (i, j) is therefore equal to $\frac{\sigma((s,u),(a,w),(t,v))}{\sigma((i,u),(j,w))}$. Therefore, di dj. According to Lemma 7, for any k < 0, any k′ ≥ 0, any i ∈ ]s_k, s_{k+1}], and any j ∈ [a_k′, a_{k′+1}[, the value of σ((i, u), (j, w)) is constant and it is equal to $\boxplus_{h=k+1}^{k'} \sigma((s_h, u), (a_h, w))$. Therefore, and we obtain the claim.
In order to compute the sum of Lemma 8, we need to iterate over all s k , −l−1 ≤ k ≤ −1 and all a k ′ , 0 ≤ k ′ ≤ r.For this purpose, we first give an algorithm computing the values of s k .The algorithm also associates to each s k a volume of shortest fastest paths that will be useful for computing the denominator of the fraction in the sum.
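As a worked reading of the statements above, and only as a hedged reconstruction, the discrete sum can be derived as follows. By Lemma 6 the integrand vanishes outside [S, s] × [a, A]; inside it, the numerator is σ((s, u), (a, w), (t, v)) and, by Lemma 7, the denominator is constant on each rectangle ]s_k, s_{k+1}] × [a_{k'}, a_{k'+1}[. Integrating a constant over a rectangle multiplies it by the rectangle's area, which plausibly gives the following form (using s_{-l-1} = S, s_0 = s, a_0 = a and a_{r+1} = A, with the fraction understood as the volume ratio of Definition 4):

```latex
C_{tv}(u, w)
  = \int_{S}^{s} \int_{a}^{A}
      \frac{\sigma((s,u),(a,w),(t,v))}{\sigma((i,u),(j,w))} \, dj \, di
  = \sum_{k=-l-1}^{-1} \; \sum_{k'=0}^{r}
      (s_{k+1} - s_k)\,(a_{k'+1} - a_{k'}) \,
      \frac{\sigma((s,u),(a,w),(t,v))}
           {\boxplus_{h=k+1}^{k'} \sigma((s_h,u),(a_h,w))}
```

The index ranges match the iteration described above, over all s_k with −l−1 ≤ k ≤ −1 and all a_{k'} with 0 ≤ k' ≤ r.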
Proof. The algorithm builds and returns the (initially empty) list Result. The algorithm terminates when S is found. Indeed, a return is triggered in three different cases. If the empty list is returned at Line 4, this means that there exists an ε > 0 such that for all t ∈ [s − ε, s], there is an instantaneous path of length d((s, u), (a, w)) from (t, u) to (t, w), which implies that S = s. If the return happens after the last value is added to Result during the for loop, then either s′ < s, and s′ is the largest value such that: a′ − s′ < a − s (Line 7); or a′ − s′ = a − s and d((s′, u), (a′, w)) < d((s, u), (a, w)) (Line 10); or there exists ε > 0 such that for any t ∈ [s′ − ε, s′] there is an instantaneous path of length d((s, u), (a, w)) from (t, u) to (t, w) (Line 14). This corresponds exactly to the definition of S. Finally, if the function returns at Line 16, then this means that S = α because none of the above conditions is true for any s′ > α.
Let us now show that vol contains the desired value when (s′, vol) is added to Result.
If the empty list is returned at Line 4, then this is true. Otherwise, vol is initialized to (0, 0) and this value is not changed before the first time the pair (s′, vol) is appended to Result. Therefore the first pair appended to Result is (s_{−1}, (0, 0)), which is correct since f_{−1} = σ((s_0, u), (a_{−1}, w)) = (0, 0) (since there are no paths from (s_0, u) to (a_{−1}, w)).
Assume now that the correct value (s_i, f_i) has been added to Result at one loop iteration, and that s_i > S (otherwise, as shown above, a return is triggered just after the append and the function returns). We then have s′ = s_i and the value σ((s′, u), (a′, w)) is then added to vol. The variable vol is therefore now equal to σ((s′, u), (a′, w)) ⊞ f_i = σ((s_i, u), (a_i, w)) ⊞ σ((s_{i+1}, u), (a_{−1}, w)) = f_{i−1}. Moreover, the loop will skip latency pairs not in LP and the next value of s′ that will be considered is s_{i−1}. Therefore the next value that is added to Result is (s_{i−1}, f_{i−1}) and finally all the correct values are added to Result, which completes the proof.
The obtained function computes the list (a_1, g_1), (a_2, g_2), . . ., (a_{r+1} = A, g_{r+1}) with a_k defined by LP and with g_k = σ((s_1, u), (a_{k−1}, w)).
We finally reach the objective of this section.
Theorem 10.Given a link stream L = (T, V, E), a temporal node (t, v) in T × V , and two nodes u and w in V , Algorithm 4 computes the contribution of u and w to the betweenness of (t, v), i.e.C tv (u, w) = i∈T,j∈T σ((i, u), (j, w), (t, v)) σ((i, u), (j, w)) di dj.
Since there is at most one such pair satisfying the first three conditions, the algorithm breaks out of the for loop if one is found.If no such latency pair is found, vol_tv is equal to (0, 0) at the end of the loop and the Algorithm returns 0. Notice that, in the special case where t is not an event time and (t, u) −→ (t, w), then the arguments above do not Algorithm 4: Contribution of two given nodes to the betweenness of a given temporal node 1 Function Contribution: Input: a link stream L = (T, V, E), u ∈ V , w ∈ V , (t, v) ∈ T × V , and the latency list LL from u to w Output: the contribution T ×T σ((i, u), (j, w), (t, v)) σ((i, u), (j, w)) di dj  apply, but the algorithm still returns the correct value: (t, t) is a latency pair that does not belong to the latency list, and the contribution of (u, w) is 0, which is the returned value.
Lines 15 to 20 of Algorithm 4 compute this sum. First notice that $s'$ is initialized to $s = s_0$ and s_left loops over values in Prev, starting with $s_{-1}$. At the end of each iteration $s'$ is set to s_left, and therefore $s'$ and s_left loop over all consecutive values $s_{k+1}, s_k$ for $s_k$ in Prev. The value of left is $f_k = \sigma((s_{k+1}, u), (a_{-1}, w)) = \boxplus_{h=k+1}^{-1} \sigma((s_h, u), (a_h, w))$, as explained in the characterization of Prev above.
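The running sums f_k can be illustrated with a minimal Python sketch. It is not the implementation of [16]: it assumes that a volume is represented as a pair (size, dimension), that ⊞ adds sizes when dimensions are equal and otherwise keeps the volume of higher dimension, and that (0, 0) is the neutral volume. The names `vol_add` and `running_sums` are hypothetical, and list positions stand in for the indices of latency pairs.

```python
from typing import List, Tuple

# Assumed representation of a volume: (size, dimension).
Volume = Tuple[float, int]

ZERO: Volume = (0.0, 0)


def vol_add(x: Volume, y: Volume) -> Volume:
    """Assumed ⊞: add sizes if dimensions match, else keep the higher dimension."""
    (sx, dx), (sy, dy) = x, y
    if dx == dy:
        return (sx + sy, dx)
    return x if dx > dy else y


def running_sums(volumes: List[Volume]) -> List[Volume]:
    """For each position k, return the ⊞-sum of volumes[h] over all h > k.

    volumes[k] plays the role of σ((s_k, u), (a_k, w)); the list is scanned
    from the last latency pair back to the first, as the loop over Prev does.
    """
    result = [ZERO] * len(volumes)
    acc = ZERO
    for k in range(len(volumes) - 1, -1, -1):
        result[k] = acc          # f_k excludes the pair at position k itself
        acc = vol_add(acc, volumes[k])
    return result
```

With this convention, a lower-dimension volume is absorbed by a higher-dimension one, so each f_k is obtained in a single backward pass.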

Betweenness of a temporal node
We now have all needed building blocks for computing the betweenness of any given temporal node: we just have to sum the contribution of each node pair, see Algorithm 5.
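As a rough illustration of this summation, the following Python sketch adds up the contributions of node pairs. It assumes a hypothetical callable `contribution(u, w, t, v)` standing in for Algorithm 4, and it sums over all ordered pairs of V × V, as in the graph definition; neither the function names nor the signature come from the implementation of [16].

```python
from itertools import product
from typing import Callable, Hashable, Iterable


def betweenness(
    nodes: Iterable[Hashable],
    t: float,
    v: Hashable,
    contribution: Callable[[Hashable, Hashable, float, Hashable], float],
) -> float:
    """Sum the contribution of every ordered node pair (u, w) to (t, v).

    `contribution` is a placeholder for Algorithm 4 (hypothetical signature).
    """
    return sum(contribution(u, w, t, v) for u, w in product(nodes, repeat=2))
```

The number of calls is n², so the overall cost is n² times the cost of one contribution computation.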
Algorithm 5: Betweenness of a temporal node
1 Function Betweenness:
The key variables describing the size of our algorithm inputs, for a given link stream L = (T, V, E), are the number of nodes n = |V| and the number of link segments m, i.e. the number of maximal intervals in E. Notice that the number of event times |T| is at most 2 · m, and so it is in O(m). Likewise, the number of links m_t = |{uv, (t, uv) ∈ E}| at time t, for any t, as well as the number of links |{uv, ∃t, (t, uv) ∈ E}| in the induced graph, are also in O(m). The complexity of all algorithms presented in this paper is polynomial in n and m, which makes Algorithm 5 polynomial itself.
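These size parameters can be made concrete with a short Python sketch. It assumes that link segments are given as hypothetical tuples (b, e, u, v), meaning that link uv is present during the maximal interval [b, e] (with b = e for an instantaneous link), and that event times are the endpoints of these segments. This input format is an assumption for illustration only.

```python
from typing import Hashable, List, Tuple

# Assumed input format: (begin, end, u, v) for each maximal link segment.
Segment = Tuple[float, float, Hashable, Hashable]


def size_parameters(segments: List[Segment]) -> Tuple[int, int, int]:
    """Return (n, m, number of event times) for a list of link segments."""
    nodes = {x for _, _, u, v in segments for x in (u, v)}
    # Each segment contributes at most two distinct endpoints.
    event_times = {t for b, e, _, _ in segments for t in (b, e)}
    return len(nodes), len(segments), len(event_times)
```

Since each segment contributes at most two endpoints, the third returned value never exceeds 2 · m.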
We display in Figure 4 the results obtained in the case of Figure 1, where we computed the betweenness of more than 5000 temporal nodes in a few seconds. We provide the implementation at [16].
Figure 4: Results of betweenness computations on the example of Figure 1. We computed the betweenness of (t, v) for all v and t equal to $\alpha + i \cdot \frac{\omega - \alpha}{1000}$, for i = 0..1000. The obtained value is displayed at (t, v) as a black rectangle of width $\frac{\omega - \alpha}{1000}$ and height proportional to the betweenness of (t, v). Dotted lines represent betweenness values equal to 0.
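The regular time sampling used for Figure 4 can be sketched in Python as follows. The function name `sample_times` and its `steps` parameter are hypothetical; only the formula α + i · (ω − α)/1000 comes from the caption.

```python
from typing import List


def sample_times(alpha: float, omega: float, steps: int = 1000) -> List[float]:
    """Return the steps + 1 regularly spaced times from alpha to omega."""
    return [alpha + i * (omega - alpha) / steps for i in range(steps + 1)]
```

Computing the betweenness of (t, v) for each sampled t and each node v then amounts to 1001 · n independent calls to the betweenness function.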

Related work
Betweenness computations are first related to path computations. Temporal paths already received much attention, in particular optimal path computations according to several criteria (like length, duration, and/or arrival time), see for instance [5,33,34]. However, most of these works are limited to discrete time and instantaneous links; only a few consider continuous time and links with duration [31,35,24]. Then, the focus is on finding optimal paths or computing distances and latencies [24], not counting them as we do here. The authors of [1] notice that the number of foremost paths (temporal paths with minimal arrival time) may be exponential. The problem that we consider here is quite different because we handle continuous time and links with duration. This leads to the concept of finite volumes of uncountable path sets, which, to our knowledge, never appeared in previous literature.
Graph betweenness itself has also been studied in temporal settings. A first line of study focuses on updating betweenness values upon link arrival or departure, see for instance [3,11]. This is quite different from our work: the considered paths are classical (static) graph paths, and the considered betweenness is the classical one, at each time instant.
Several works consider temporal betweenness extensions that rely on various kinds of optimal (fastest, shortest, foremost, etc.) paths. Most have a node-centric view: they define a value for each node, not for each temporal node, see for instance [13,27,21,32,15]. Others define a value for each temporal node, like in our case. For instance, [26] proposes coverage centrality of (t, v), defined as the fraction of pairs of (non-temporal) nodes for which there exists a fastest path involving (t, v). Buß et al. [6] consider instantaneous links and define betweenness centralities for various types of optimal temporal paths. The authors of [27,12] define a betweenness value for each temporal node, based on foremost paths or other optimal paths. The algorithm in [12] starts by identifying time instants for which foremost path trees are stable, which is related to our latency pairs. In [28], the authors combine the length and duration of paths using a tunable parameter, and focus on instantaneous links.
All these works assume discrete time, which implies finite sets of shortest paths. Instead, we consider continuous time, leading to uncountable sets of paths, with finite volume. In addition, these works keep a partly node-centric point of view by considering paths between nodes; we push the integration of temporal aspects further by considering paths between temporal nodes. This makes an important difference, since the node-centric view misses locally-optimal paths: they only count paths with a given duration or length between pairs of nodes (for any starting and arrival times), whereas our approach combines a variety of locally shortest fastest paths, with different durations and lengths. This raises different algorithmic challenges, like the computation of latency lists and the selection of appropriate contributing latency pairs. Closer to our work, [1] and [22] consider optimal paths within time slices, thus obtaining a betweenness value for each node for each time slice. Again, they only consider discrete time, and only a limited number of source and target temporal nodes.
Finally, the generalized betweenness that we consider in this paper, by dealing with continuous time, links with or without duration, as well as paths between all pairs of temporal nodes, raises original algorithmic questions that are not present in previous literature.

Conclusion
We presented the first algorithms to compute betweenness centrality of temporal nodes in link streams. To obtain these algorithms, we identified and addressed several original challenges, like the definition and computation of volumes of infinite sets of paths, the computation of all latency pairs from any node to all others, or the transformation of continuous-time integrals into discrete sums over finite numbers of time intervals. Each of these building blocks has its own interest, in particular the computation of shortest path volumes from a given temporal node. The obtained algorithms run in polynomial time and space, and we provide an implementation in Python [16].
Our algorithm leaves room for complexity improvement. In particular, it seems promising to explore extensions to link streams of approaches like Brandes' for betweenness on graphs [4]. Another important direction is to design algorithms to compute the betweenness of all temporal nodes rather than just one: iterating our algorithm over many temporal nodes leads to much redundancy. However, keep in mind that there is an infinite number of temporal nodes; one may then try to infer the betweenness of any of them from the betweenness of a finite number of them, for instance each node at each event time. This seems non-trivial, though, and remains an open question.
Going further, one may try to design approximate algorithms. Indeed, the best known time complexity of betweenness computations in graphs is O(nm) [4] and it cannot be lower in link streams, since graphs are special cases [18]. This is prohibitive in many practical cases, leading to much work on approximate computations, which typically compute shortest paths from some nodes only [23,29]. Such approaches are very relevant in link streams too, where the contribution of only a few node pairs may give reasonably accurate approximations, at a much lower cost than exact computations. This remains to be explored, though.
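One simple way to picture such an approximation is uniform sampling of node pairs, sketched below in Python. This is a generic Monte Carlo estimator, not a method proposed or evaluated in this paper: it assumes a hypothetical callable `contribution(u, w, t, v)` standing in for Algorithm 4, draws k ordered pairs uniformly at random, and rescales the average by n². Its accuracy on link streams is untested.

```python
import random
from typing import Callable, Hashable, Optional, Sequence


def approximate_betweenness(
    nodes: Sequence[Hashable],
    t: float,
    v: Hashable,
    contribution: Callable[[Hashable, Hashable, float, Hashable], float],
    k: int,
    seed: Optional[int] = None,
) -> float:
    """Estimate the betweenness of (t, v) from k uniformly sampled node pairs.

    `contribution` is a placeholder for Algorithm 4 (hypothetical signature).
    """
    rng = random.Random(seed)
    node_list = list(nodes)
    total_pairs = len(node_list) ** 2
    sampled = sum(
        contribution(rng.choice(node_list), rng.choice(node_list), t, v)
        for _ in range(k)
    )
    # Unbiased estimate of the sum over all n^2 ordered pairs.
    return total_pairs * sampled / k
```

The estimator needs only k contribution computations instead of n², at the price of a variance that depends on how unevenly contributions are spread over node pairs.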
An even more challenging direction is to embrace the streaming nature of link streams, and design on-line and/or streaming algorithms for betweenness. Such algorithms do not store the data in memory; they compute results on-the-fly and output them as soon as they are available. They would be of high theoretical and practical interest, but they raise many challenges.
Another interesting family of perspectives consists in extending or restricting the considered input. In particular, one may consider stream graphs instead of link streams: in stream graphs, nodes are not always present, leading to more subtle path, distance, and latency concepts [18]. We considered here streams with link (and node) presence times equal to unions of disjoint closed intervals (including singletons); another extension would be to consider more general cases, like for instance unions of disjoint closed or open intervals. Also, weighted and/or directed stream graphs and link streams [17] lead to more complex concepts of shortest fastest paths, and our definitions of volumes may be extended to these cases. Conversely, one may consider more specific situations, like discrete time streams, or link streams with instantaneous links only. Such cases often appear in practice, and it may be possible to design more efficient algorithms for them.
Extending our algorithms to variants of the betweenness concept itself is also an interesting perspective. One may for instance consider betweenness of links rather than nodes, or consider paths of other kinds than shortest fastest ones, e.g. foremost ones [18].
Finally, this paper opens the perspective of practical uses of betweenness in link streams, since until now only the definition was available. It is now possible to explore how betweenness is distributed in (small scale) real-world cases, and gain insight from this. It may also be used to extend important graph algorithms to link streams, like the computation of communities by iteratively removing temporal nodes of highest betweenness, in a way similar to [10] that iteratively removes links of highest betweenness.

Figure 2: If t and t′ are two consecutive event times, then a shortest path from (i, u) to (t′, w) is the concatenation of either (left) a blue path from (i, u) to a given (t, x) and a green path from this (t, x) to (t′, w) in $G^+_t$; or (right) a blue path from (i, u) to a specific (t′, y) and then a jump from y to w at time t′ using (t′, yw) ∈ $E_{t'}$.

Figure 3: An abstract example of link stream L = (T, V, E) in which we consider a specific (t, v) in T × V (in red), two nodes u and w in V (in black, horizontal lines), as well as the latency pair (s, a) containing t such that shortest (necessarily fastest) paths from (s, u) to (a, w) have length d and some of them involve (t, v). We display all latency pairs from u to w with two green vertical lines topped by a dotted horizontal line indicating the corresponding latencies (= a − s, < a − s or > a − s). In addition, we also indicate the length (= d, < d or > d) of corresponding shortest paths within each latency pair, when this is useful (in grey). We indicate in blue the two specific times S and A defined above, as well as the time periods for i and j such that the contribution of (i, j) may be non-zero (Lemma 6).

Lemma 6. All pairs (i, j) in T × T that have non-zero contribution are in [S, s] × [a, A].
Definition 4 (quotient and difference, , and ⊟). Consider S and S′ two disjoint unions of sliding sets with |S| = (s, d) and |S′| = (s′, d′), and such that S′ ⊆ S. Then necessarily d′ ≤ d and the fraction of elements of S that are also in S′, which we denote by $\frac{|S'|}{|S|}$ or $|S'| / |S|$, is equal to 0 if d > d′, and to s′/s if d = d′. In addition, the set S \ S′ is a disjoint union of sliding sets, and its volume is