1 Introduction

Batch-dynamic algorithms on graphs have attracted considerable interest in recent years, for both fundamental research and practical applications [2, 3, 14, 15, 18, 29, 41]. Different from traditional dynamic algorithms which handle single changes sequentially—one at a time, batch-dynamic algorithms focus on dynamically maintaining graph properties, albeit graphs may undergo rapid changes through large batches of updates. Nowadays, many real-world applications operate on rapidly changing graphs, such as communication networks [34], context-aware search in web graphs [37], social network analysis [7, 38], route-planning in road networks [1, 13], and management of resources in computer networks [8].

Fig. 1
figure 1

A high-level overview of our batch-dynamic method BatchHL\(^+\): (left) A graph G with a landmark 5 on which a batch update (a set of edge insertions and deletions) is performed; (middle) A comparison of affected vertices being identified by the state-of-the-art method BatchHL and our proposed method BatchHL\(^+\); (right) Three different interaction patterns in which v can be pruned by BatchHL\(^+\) but cannot be pruned by BatchHL

Given a graph, a distance query computes the shortest path distance between two vertices in the graph, which is a fundamental problem both theoretically and in practice. A classical approach to answer distance queries is to run the Dijkstra’s algorithm for non-negative weighted graphs or breadth-first search algorithm for unweighted graphs [36]. However, these algorithms are inefficient for large graphs. A well-established technique for speeding up query response time is to pre-compute and store distance labelling such as 2-hop labelling [11] and in particular pruned landmark labelling [4]. Although these labelling techniques have been shown to be effective in improving query response time on static graphs, the use of distance labelling poses an additional challenge for dynamic graphs—“How to efficiently update distance labelling in order to reflect changes on graphs, particularly when changes occur rapidly?”

Several studies have attempted to address this challenge by developing dynamic 2-hop labellings (in particular pruned landmark labellings) in the sequential setting which process one single update (edge insertion or edge deletion) at a time [5, 12, 23, 35]. However, the size of such distance labellings can grow quadratically with the size of a graph. This not only limits the applicability of these distance labellings to graphs with up to millions of nodes and edges, but also increases the difficulty and complexity of updating labellings. That is to say, even if such distance labellings can be constructed on graphs, the computational cost of updating them to reflect rapid changes is still unbearably high.

Recently, a batch-dynamic method to answer distance queries, namely BatchHL, has been proposed [18]. The key idea of BatchHL is to combine pre-computed labelling with online searches so as to exploit the merits of both sides—accelerating query response time through a partial distance labelling that is of limited size but can bound online search space. Further, BatchHL can efficiently dynamize a partial distance labelling to reflect large batches of updates on a graph. It has been shown [18] that BatchHL offers significant performance gains in comparison with other state-of-the-art methods for answering distance queries on dynamic graphs and can scale to billion-scale graphs without compromising query and update performance.

In this work, we further explore possible ways to advance the design of batch-dynamic algorithms for distance queries. We first analyse how different types of updates (edge insertion and edge deletion) interact with each other. Then, based on that, we unearth new patterns of update interactions that can be leveraged to design pruning rules for reducing update search space. These interesting findings lead us to developing a new batch-dynamic algorithm, called BatchHL\(^+\), which can dynamize labelling for distance queries much more efficiently than BatchHL on graphs that undergo rapid changes. Figure 1 presents a high-level overview of comparing BatchHL and BatchHL\(^+\) in terms of several typical update interaction patterns. We can see that the search space of BatchHL, the state-of-the-art method, is more exhaustive than BatchHL\(^+\) when updating labelling to reflect batch updates on a graph. This is because BatchHL\(^+\) further prunes search space by exploiting interactions between different types of updates. Concretely, in the lower right-hand corner of Fig. 1, three update interaction patterns are presented, and vertex v cannot be pruned by BatchHL but can be pruned by BatchHL\(^+\) successfully. This enables BatchHL\(^+\) to perform much more efficiently than BatchHL.

Novelty. This article is an extended version of the previous work [18]. The following contributions are novel.

  • We explore update interaction patterns and characterize the minimal set of vertices affected by batch updates, for which a “perfect” batch-dynamic algorithm may wish to identify. We show that algorithms for only edge insertions or only edge deletions can precisely identify such vertices in each separate case. However, combining these two algorithms fails to compute the exact set of vertices affected by batch updates in the general case (Sect. 5).

  • We propose an improved batch-dynamic method, namely BatchHL\(^+\), which is equipped with optimized batch search and batch repair algorithms. In the original work of BatchHL, a vertex is flagged as affected if any shortest path is eliminated. In contrast, BatchHL\(^+\) proposed in this work only flags a vertex as affected if every shortest path is eliminated. As a result, edge deletions can be processed almost as quickly as edge insertions (Sect. 6).

  • We present a comprehensive discussion on distinctive characteristics of different batch-dynamic algorithms and their relationships in terms of their capacity of identifying vertices that are “affected” by batch updates on dynamic graphs. This leads to several different notions of “affected” vertices and the connections between these notions manifest the source of efficiency differences among different batch-dynamic algorithms. For example, BatchHL\(^+\) handles a much smaller subset of “affected” vertices than BatchHL (Sect. 7).

  • We provide detailed proofs for the correctness and minimality of BatchHL\(^+\). We note that the proof of correctness for the batch search algorithm (i.e. Algorithm 6) of BatchHL\(^+\) is non-trivial and requires a delicate analysis of interaction patterns of batch updates. We also conduct the time and space complexity analysis of BatchHL\(^+\) (Sect. 8).

Empirically, we have evaluated our method on 15 real-world networks to verify efficiency and scalability. The results show that our method significantly improves update time compared to the state-of-the-art methods. It can maintain labelling of a very small size, while still answering distance queries in the order of milliseconds, even on large-scale graphs with several billions of edges that undergo large batches of updates. The average update times on all these real-world networks are less than one millisecond, and in general, our method is up to 3 orders of magnitude faster than the recent state-of-the-art method BatchHL.

2 Related work

Traditionally, distance queries can be answered using Dijkstra’s search, breadth-first search (BFS), or a bidirectional scheme combining two such searches: one from the source vertex and the other from the destination vertex [33, 36]. However, these algorithms may traverse an entire network when two query vertices are far apart from each other and become too slow for large networks. To accelerate response time in answering distance queries, labelling-based methods have emerged as an attractive way, which precompute a data structure, called distance labelling [1, 4, 6, 10, 11, 13, 19, 22, 24, 28, 39, 40]. For example, Akiba et al. [4] proposed the pruned landmark labelling (PLL) to pre-compute a 2-hop distance labelling [11] by performing a pruned breadth-first search from every vertex, and Li et al. [27] developed a parallel algorithm for constructing PLL which achieved the state-of-the-art results for answering distance queries on static graphs.

Previously, several attempts have been made to study distance queries over dynamic graphs [5, 12, 16, 20, 23, 31, 35, 42] which only considered the unit-update setting, i.e. to perform updates one at a time. Akiba et al. [5] studied the problem of updating PLL for incremental updates (i.e. edge additions). This work, however, does not remove outdated entries because the authors considered it too costly. Qin et al. [35] and D’angelo et al. [12] studied the problem of updating PLL for decremental updates (i.e. edge deletions). Note that, in the decremental case, outdated distance entries have to be removed; otherwise, distance queries cannot be correctly answered. Their methods suffer from high time complexities and cannot scale to large graphs, e.g. the average update time of an edge deletion on a network with 19 M edges is 135 s in [35] and on a network with 16 M edges is 19 s in [12]. D’angelo et al. [12] combined the algorithm for incremental updates proposed in [5] with their method for decremental updates to form a fully dynamic algorithm, which however does not scale beyond networks with around 20 M of edges. Hayashi et al. [23] proposed a fully dynamic method which combines a distance labelling with online search to answer distance queries. Their method pre-computes bit-parallel shortest-path trees (SPTs) rooted at each \(r \in R\) for a small subset of vertices R and dynamically maintain the correctness of these bit-parallel SPTs for every edge insertion and deletion. Then, an online search is performed under an upper distance bound computed via the bit-parallel SPTs on a sparsified graph. Very recently, several methods have attempted to perform updates in the batch-update setting (i.e. to perform multiple updates in a batch) [17, 18]. For instance, BatchHL [18] has shown to be much faster in performing batch updates as compared to the other state-of-the-art methods. In our present work, we further advance the design of BatchHL by studying patterns that were left unexplored in the design of BatchHL.

Table 1 Summary of notations

Another line of research studied streaming graph algorithms. In the streaming setting, a rapidly changing graph is often modelled using certain compressed data structures due to space constraints. Updates are received as a stream, but may be accumulated into batches through a sliding window and applied to the underlying graph. In this setting, a number of methods [21, 30, 32] have been proposed to address distance queries. However, these methods operate under certain constraints, e.g. limited amount of memory and accuracy of graph structure. Different from these streaming graph methods, our work considers applications which operate on batch-dynamic graphs that are explicitly stored and can be processed in the main memory of a single machine. Nevertheless, the ideas of our algorithm can be easily extended to deal with batch updates in the streaming setting.

3 Preliminaries

Without loss of generality, we focus our discussion on unweighted, undirected graphs in this paper and discuss the extension to directed graphs in Sect. 9. Table 1 summarises the frequently used notations.

Let \(G = (V, E)\) be a graph where V is a set of vertices and \(E \subseteq V \times V\) is a set of edges. The distance between two vertices s and t in G, denoted as \(d_G(s, t)\), is the length of a shortest path between s and t. If there does not exist any path between s and t, \(d_G(s, t) = \infty \). We use \(P_{G}(s,t)\) to denote the set of all shortest paths between s and t in G, and N(v) the set of neighbours of a vertex v, i.e. \(N(v) = \{v' \in V | (v, v') \in E \}\).

There are two types of edge updates on graphs: edge insertion and edge deletion. A batch update is a set of edge insertions and deletions. Node insertion or deletion can be treated as a batch update containing only edge insertions or only edge deletions, respectively. In the case that the same edge is being inserted and deleted within one batch update, we simply eliminate both of them. An update is valid if it makes a change on a graph, i.e. inserting an edge (ab) into G when \((a,b)\notin E\), and deleting an edge (ab) from G when \((a,b) \in E\). Without loss of generality, we ignore invalid updates.

Let \(R\subseteq V\) be a subset of special vertices in G, called landmarks. A label L(v) for a vertex v is a set of distance entries \(\{(r_i, \delta _L(r_i, v))\}^n_{i=1}\) where \(r_i \in R\), \(\delta _L(r_i, v) = d_G(r_i, v)\) and \(n\le |R|\). We call \((r_i, \delta _L(r_i, v))\) the \(r_i\)-label of vertex v. A distance labelling over G is the set of labels for all vertices in V. The size of a distance labelling is defined as \(\sum _{v\in V}|L(v)|\). In the literature, a distance labelling is often constructed following the 2-hop cover property [11] which requires at least one common vertex in L(u) and L(v) to be on a shortest path between any two vertices u and v.

Definition 1

(2-hop cover labelling) A distance labelling L over \(G=(V,E)\) is a 2-hop cover labelling if for any \(s, t\in V\),

$$\begin{aligned} d_G(s, t)&= \min \{\delta _L(r_i, s) + \delta _L(r_i, t) \mid \\&\quad (r_i, \delta _L(r_i, s)) \in L(s), (r_i, \delta _L(r_i,t)) \in L(t)\}. \end{aligned}$$

3.1 Highway cover labelling

Our work considers the highway cover labelling [19].

Definition 2

(Highway) A highway \(H=(R, \delta _H)\) consists of a set R of landmarks and a distance decoding function \(\delta _H: R \times R \rightarrow {\mathbb {N}}^+\) s.t. \(\delta _H(r_1, r_2) = d_G(r_1, r_2)\) for any two landmarks \(r_1, r_2\ \in R\).

Definition 3

(Highway cover labelling) A highway cover labelling \(\Gamma =(H, L)\) consists of a highway H and a distance labelling L satisfying that, for any \(v \in V\backslash R\) and \(r\in R\),

$$\begin{aligned} d_G(r, v) = \min \{&\delta _L(r_i, v) + \delta _H(r, r_i) \mid \\&(r_i, \delta _L(r_i, v)) \in L(v)\}. \end{aligned}$$

A highway cover labelling requires that every label L(v) must contain a distance entry to each landmark \(r\in R\) unless there is another landmark on a shortest path between r and v. We shall refer to this distance entry (or its absence) as the r-label of v. Unlike a 2-hop cover labelling that can answer distance queries for any two vertices in a graph, i.e. a full distance labelling, a highway cover labelling can only answer distance queries between any landmark and any vertex in a graph, i.e. a partial distance labelling.

As discussed in the original paper [19], highway cover labelling enjoys several nice theoretical properties, such as minimality and order independence. A highway cover labelling \(\Gamma =(H, L)\) over G is minimal if, for any highway cover labelling \(\Gamma '=(H, L')\) over G, \(size(L')\ge size(L)\) holds. A highway cover labelling \(\Gamma =(H, L)\) over G is order-independent if \(\Gamma \) remains the same, regardless of the order of applying landmarks in R. Given any fixed set of landmarks, there exists a unique minimal highway cover labelling, which is contained in every highway cover labelling [19].

3.2 Query answering

A distance query can be answered via bounding online searches on a sparsified search space based on the highway cover labelling. Specifically, given a highway cover labelling \(\Gamma =(H, L)\), an upper bound on the distance between any pair of vertices \(s, t \in V\) in a graph G is computed as

$$\begin{aligned} d^{\top }_{st} = \min \{&\delta _L(r_i, s) + \delta _H(r_i, r_j) + \delta _L(r_j, t) \mid \\&(r_i, \delta _L(r_i, s)) \in L(s),(r_j, \delta _L(r_j, t)) \in L(t)\}. \end{aligned}$$

Here, \(d^{\top }_{st}\) is the minimal length amongst all paths between s and t that pass through the highway. Since there may exist a shorter path not passing through the highway, we conduct a distance-bounded shortest-path search over a sparsified graph \(G[V \backslash R]\) (i.e. removing all landmarks in R from G) under the upper bound \(d^{\top }_{st}\) to answer the distance query Q(st) such that

$$\begin{aligned} Q(s,t)= \min (d_{G[V \backslash R]}(s,t),\; d^{\top }_{st} ). \end{aligned}$$

In the implementation, \(d_{G[V \backslash R]}(s,t)\) can be computed by conducting a bidirectional BFS search from both s and t [19] which terminates either after \(d^{\top }_{st}-1\) steps or when the searches from both directions meet.

4 BatchHL: basic algorithm

In this section, we start by presenting the basic algorithm of BatchHL, a batch-dynamic method that can efficiently maintain a highway cover labelling for dynamic graphs [18]. Generally, BatchHL involves two phases: Batch Search and Batch Repair, as described in Algorithm 1. Batch Search identifies vertices for which labels may need to be updated, while Batch Repair updates these vertices. These two phases are done independently for each landmark.

figure a

4.1 Batch search

Let \(G=(V,E)\) be a graph, \(R\subseteq V\) a set of landmarks and B a batch update resulting in the updated graph \(G'=(V',E')\). We denote the unique minimal highway labellings on G and \(G'\) by \(\Gamma \) and \(\Gamma '\), respectively.

Definition 4

(P-affected) A vertex \(v\in V\) is path-affected (P-affected) by a batch update B w.r.t. a landmark \(r\in R\) iff \(P_G(r,v)\ne P_{G'}(r,v)\).

We use \(V_{\textsc {aff}}(r, B)=\{v\in V|P_{G}(v, r)\not =P_{G'}(v,r)\}\) to denote the set of all P-affected vertices by a batch update B w.r.t. a landmark r. An edge insertion or deletion (ab) can create or eliminate shortest paths starting from r and passing through (ab). Any update on an edge (ab) with \(d_{G}(r,a) = d_{G}(r,b)\) is trivial w.r.t. a landmark r, since such an update does not affect any vertices w.r.t. the landmark r.

figure b

For a non-trivial update (ab), we call the vertex \(u\in \{a,b\}\) further away from r in G the anchor of (ab), and the other one its pre-anchor. We denote the anchor distance of (ab) by \(d_{G}(r,u) = d_G(r, u') + 1\), where \(u'\) is the pre-anchor of (ab).

As every newly created or eliminated path must pass through an updated edge, we can use anchors to identify P-affected vertices.

Lemma 1

For every P-affected vertex v we have

$$\begin{aligned} d_G(r,v)\ge (d_G(r, u')+1)+d_{G'}(u, v). \end{aligned}$$
(1)

for some non-trivial update \((a,b)\in B\) with anchor u and pre-anchor \(u'\).

We make use of this characterization in Algorithm 2. Specifically, we perform a search with anchors as starting points, compute the minimal value of the right hand side of Eq. 1, and collect all vertices for which Eq. 1 holds.

Example 1

Consider the graph below, where edges marked by \(+\) are inserted and edges marked by − are deleted in one batch update.

figure c

Insertion of (ad) triggers a search starting from the anchor d with the anchor distance \(d_G(r,a)+1\) computed from \(\Gamma \). The search then follows edges in \(G'\), including the newly inserted edge (df), and finds that dcf and v satisfy (1). Insertion of (df) is trivial since d and f are equidistant from r in G, and does not lead to any search. Deletion of (be) triggers a search starting from the anchor e with the anchor distance \(d_G(r,b)+1\) computed from \(\Gamma \).

Combining the searches for (ad) and (be) into one search means that the edge (fv) is traversed only once.

Algorithm 2 eliminates unnecessary search paths by not following vertices violating Eq. 1 and also avoids traversing vertices affected by multiple updates more than once. However, Algorithm 2 does not precisely compute the set of all P-affected vertices, but the set of all vertices satisfying Eq. 1, which is a superset. The following example illustrates why the characterization in Lemma 1 is not precise, and why precise discovery of P-affected vertices is difficult.

Example 2

Consider the graph below, where the dotted edge between r and u indicates a long path between them, and the dotted edge between r and v indicates another long path.

figure d

When both edge deletion (ru) and edge insertion (uv) occur, the distance between r and u in G is used to compute the anchor distance of v for the update (uv), ignoring that the distance between r and u has changed. It is difficult to identify whether v is P-affected—it hinges on whether the long path between r and v is longer (or equal) than the long path between r and u plus 1, which cannot be ascertained by \(\Gamma \).

The set of vertices returned by Algorithm 2 can be precisely characterized as follows.

Definition 5

(Composite path) A path from r to v in \(G\cup G'\) is a composite path iff it consists of two parts: a part that lies in G followed by a part in \(G'\).

A composite path is significant iff it passes through at least one deleted and at least one inserted edge. Consider Example 1 again. \(r-x\) is an insignificant composite path, \(r-x-y\) is a significant composite path, and \(r-x-y-z\) is not a composite path. To see the latter, consider that any inserted edge must belong to the \(G'\) part, and thus later edges would need to lie in \(G'\) as well, i.e. cannot be deleted.

Definition 6

(CP-affected) A vertex v is composite-path-affected (CP-affected) w.r.t. \(r\in R\) iff

  1. (i)

    v is P-affected w.r.t. r, or

  2. (ii)

    there exists a significant composite path from r to v of length \(d_G(r,v)\) or less.

Algorithm 2 returns the set of all CP-affected vertices which includes all P-affected vertices. Additional vertices due to condition (ii) are undesirable but hard to avoid, as illustrated in Example 2. This happens because the starting distance is calculated w.r.t. G, and we are thus effectively considering paths for which the first part (from r to an anchor) lies in G, and the rest in \(G'\). For example, the vertex y in Example 1 is not P-affected but CP-affected, and Algorithm 2 returns it.

Lemma 2

A vertex v is CP-affected w.r.t. r iff there exists a composite path from r to v of length at most \(d_G(r,v)\) that passes through an updated edge.

Proof

We show each direction separately.

  • (if) Let p be a composite path from r to v of length at most \(d_G(r,v)\) that passes through at least one edge in B. If p lies in G then it lies in \(P_G(r,v)\) but not in \(P_{G'}(r,v)\), so v is P-affected. If p lies in \(G'\) then either it lies in \(P_{G'}(r,v)\), or there exists a strictly shorter path \(p'\) in \(P_{G'}(r,v)\). Neither p not \(p'\) lies in \(P_G(r,v)\), so v is P-affected. If p lies neither in G nor in \(G'\) then it must be significant.

  • (only if) Let v be CP-affected. If \(P_G(r,v)\not \subseteq P_{G'}(r,v)\) then there exists a path p in G of length \(d_G(r,v)\) that passes through a deleted edge. If \(P_{G}(r,v)\subsetneq P_{G'}(r,v)\), then there exists a path p in \(G'\) of length at most \(d_G(r,v)\) that passes through an inserted edge. Otherwise, there exists a significant composite path of length at most \(d_G(r,v)\), which passes through edges in B by definition.

\(\square \)

Lemma 3

Algorithm 2 returns the set of all CP-affected vertices.

Proof

We show that a vertex v is added to \(V_\textsc {aff+}\) iff there exists a composite path p of length at most \(d_G(r,v)\) that passes through an updated edge.

  • (if) Let \(v\in V_\textsc {aff+}\). Vertices are only added to \({\mathcal {Q}}\) and thus to \(V_\textsc {aff+}\) if they pass one of three distance checks. In each case, this compares the length of a composite path passing through an updated edge to the distance from r in G.

  • (only if) Let (ab) be either the last deleted edge that p passes through, or the first inserted edge, with \(d_G(r,a) < d_G(r,b)\). Then, p can be split into \(p_{ra}\) from r to a, (ab) and \(p_{bv}\) from b to v such that \(p_{ra}\) lies in G and \(p_{bv}\) in \(G'\). The search in Algorithm 2 starting at b will use \(|p_{rb}| = d_G(r,a) + 1\) as distance bound for b, and proceed along \(p_{bv}\). Thus, for every vertex \(w\in p_{bv}\), including v, it will obtain \(|p_{rw}| \le d_G(r,w)\) as distance bound for w, and add w to \(V_\textsc {aff+}\).

\(\square \)

In particular, it follows that (1) characterizes CP-affected vertices, rather than P-affected ones.

4.2 Batch repair

Algorithm 3 describes the approach for repairing the labels of affected vertices returned by Algorithm 2. At its core, the algorithm starts with boundary vertices that lie on the boundary of affected and unaffected vertices, and whose distances to r are computed from neighbouring vertices whose distance did not change. Importantly, even though a vertex may be affected by multiple edge updates in a batch, its r-label only needs to be updated once.

Definition 7

(Boundary vertex) A vertex v is a boundary vertex w.r.t. a landmark r if v is an affected vertex (for some version of “affected”, e.g. CP-affected) but has an unaffected neighbour w.

Denote by \(V_\textsc {aff+}\) the set of affected vertices, and let \(v\in V_\textsc {aff+}\). For every neighbour w of v in \(G'\), \(d_{G'}(r,v)\) must be upper-bounded by \(d_{G'}(r,w)+1\). If such a neighbour lies outside of \(V_\textsc {aff+}\), the value of \(d_{G'}(r,w)=d_G(r,w)\) can easily be obtained – while we employ different variants of “affected” throughout the paper, “unaffected” vertices will always retain their distance to r. By taking the minimum of such known upper bounds, we get a readily available distance bound for v.

Definition 8

(Distance bound) Let \(S\subset V\setminus \{r\}\) be a set of vertices. The distance bound of v w.r.t. S is:

$$\begin{aligned} d_\textsc {bou}(v,S):= \min \{ d_{G'}(r,w) + 1 \mid w \in N_{G'}(v)\setminus S \}. \end{aligned}$$

The following lemma allows us to compute the distance of vertices in \(V_\textsc {aff+}\) from r in \(G'\) using their distance bounds.

Lemma 4

Let \(S\subset V{\setminus }\{r\}\) and \(v\in S\) with minimal distance bound. Then \(d_{G'}(r,v)=d_\textsc {bou}(v,S)\).

Proof

For \(d_{G'}(r,v)=\infty \) this is trivial. Otherwise, let p be a shortest-path from r to v in \(G'\), \(v'\) be the first vertex in p that lies in S, and w its predecessor in p. Since \(w\notin S\) we have \(d_\textsc {bou}(v', S) \le d_{G'}(r,w) + 1 = d_{G'}(r,v')\). If \(v'\ne v\) then \(d_{G'}(r,v') < d_{G'}(r,v) \le d_\textsc {bou}(v,S)\), and therefore \(d_\textsc {bou}(v',S) < d_\textsc {bou}(v,S)\). This contradicts the minimality of \(d_\textsc {bou}(v,S)\), so \(v'=v\). It follows that \(d_\textsc {bou}(v, S) = d_{G'}(r,v)\). \(\square \)

Note that \(d_{G'}(r,v)=d_\textsc {bou}(r,S)\) does not generally hold for every boundary vertex v. Based on Lemma 4, we repair labels by starting with boundary vertices that have the smallest distance bound. After each iteration, we treat affected vertices with repaired labels as being unaffected and find boundary vertices that have the smallest distance bound again. This process terminates only when the labels of all affected vertices are repaired.

figure e

Algorithm 3 shows the pseudo-code of the batch repair algorithm used by BatchHL. Given a graph \(G'\) and a set of all affected vertices \(V_\textsc {aff+}\), we first compute the distance bounds of vertices in \(V_\textsc {aff+}\) using their unaffected neighbours. We then find vertices in \(V_\textsc {aff+}\) with the minimal distance bounds and remove them from \(V_\textsc {aff+}\). By Lemma 4, their distances to r in \(G'\) equal their distance bounds. For each \(v \in V_{\min }\), we set the r-label of v to \((r_i, D_\textsc {bou}[v])\) (Line 8) and assign new distances to the affected children of v in \(V_\textsc {aff+}\) (Lines 9-10). We continue this process until \(V_{\textsc {aff}}\) is empty.

Note that Algorithm 3 does not eliminate redundant r-labels. We will address this issue in Sect. 6.2.

Remark 1

We note that the work in [18] has proposed several optimization techniques to reduce the set of CP-affected vertices returned by Algorithm 2 and can thus efficiently repair the affected vertex set. These optimization techniques have been shown to significantly improve the performance of updating labelling. Thus, in our experiments (Sect. 10), we will compare the performance of our proposed method BatchHL\(^+\) against the method of BatchHL that has incorporated these optimization techniques.

5 Separate algorithms—insertion and deletion

We observe that changes to shortest paths between r and v do not always cause a change in distance. Based on this observation, we shall differentiate between new and eliminated paths, and strengthen the pruning condition \(d+1\le d_G(r,w)\) in Line 12 of Algorithm 2 to \(d+1 < d_G(r,w)\) for new paths. For eliminated paths, we examine all neighbouring vertices to ensure that all shortest paths have been eliminated.

However, things get a little trickier as we may need to eliminate redundant labels, or restore previously eliminated labels when they become non-redundant. Even if the distance between r and v does not change, the highway labelling may need to be updated.

Example 3

Consider the following graphs and updates, where the landmarks are circled. In all cases, vertex v is affected, but the distance between r and v does not change. For case (a) adding the edge (bv) does not cause a label change for v. It does however for case (b) where b is a landmark, causing the r-label of v to be deleted. Deletion of (bv) does not cause a change on the label of v in case (c), but causes a change in case (d) where an r-label needs to be inserted.

figure f

A core difficulty in identifying whether affected vertices have changes on their labels is that label changes can happen far away from updates, and computing the changed labels of such vertices may require the consideration of vertices whose labels do not change due to being redundant, as illustrated by the example below.

Example 4

Consider the graph below, where r and b are landmarks and the edge (rb) is deleted.

figure g

The distance between r and c changes, but the label of c does not change. That is because the shortest path between r and c goes through the landmark b without changes. At the same time the label of v does change, as the edge (rb) eliminates a shortest path between r and v that passes through the landmark b, similar to case (d) in Example 3. Although the label of c does not change, the changed distance between r and c is needed for computing the changed label of v. Therefore, c needs to be captured as well.

We thus want to return (at least) all those vertices for which either label or distance changes.

Definition 9

(LD-affected) A vertex v is landmark-distance-affected (LD-affected) by a batch update B w.r.t. a landmark \(r\in R\) iff there is a

  1. (i)

    label-change: the r-label of v changes, or

  2. (ii)

    distance-change: distance between r and v changes.

As seen in Example 3, changes to r-label without changes to distance happen whenever a new shortest path passing through another landmark is created where none existed previously, or when the last such path is deleted. To identify such cases, we track whether a shortest path to r passes through another landmark.

Definition 10

(Landmark length) The landmark length of a path p starting from \(r\in R\) is a tuple \((d,l)\) \(\in {\mathbb {N}}\times {\mathbb {B}}\) where

  • d is the length of p (number of edges), and

  • \(l\) is the landmark flag, with \(l=\text { True}\) iff p passes through a landmark other than r.

We denoted this landmark length as \(|{p}|_\textsc {l}\). The landmark distance between r and v in G is the minimal landmark length of paths between them, denoted as

$$\begin{aligned} d^L_G(r,v):=\min \big \{|{p}|_\textsc {l} \mid p \text {is a path between} r \text {and} v \text {in} G\big \} \end{aligned}$$

The ordering used to compare landmark length tuples is the lexicographical one, with \(\text { True}< \text { False}\). The latter ensures that the landmark flag of \(d^L_G(r,v)\) is set iff any of the shortest paths between r and v passes through another landmark.

Lemma 5

Let \(d^L_{G'}(r,v)=(d,l)\). If \(d=\infty \) or \(l=\text { True}\), then v has no r-label in \(\Gamma '\). Otherwise, v has the r-label (rd).

Proof

If v has any r-label in \(\Gamma '\) it must be (rd). As \(\Gamma '\) is minimal, this r-label exists iff it is not redundant. For \(d=\infty \) redundancy of \((\infty , r)\) is obvious. Otherwise (dr) is redundant iff the correct distance could also be computed using the highway. This happens iff a shortest path between r and v passes through another landmark, which is indicated by the landmark flag. \(\square \)

Lemma 6

A vertex v is LD-affected iff it satisfies:

$$\begin{aligned} d^L_G(r,v)\ne d^L_{G'}(r,v). \end{aligned}$$

Proof

Let \(l_G\) and \(l_{G'}\) denote the landmark flags of \(d^L_G(r,v)\) and \(d^L_{G'}(r,v)\), respectively. Condition (ii) of Definition 9 states \(d_G(r,v)\ne d_{G'}(r,v)\). It suffices to show that for \(d_G(r,v)=d_{G'}(r,v)\) condition (i) holds iff \(l_G\ne l_{G'}\). This is trivial for \(d_G(r,v)=d_{G'}(r,v)=\infty \). For finite distances, it follows from Lemma 5. \(\square \)

5.1 Insertion-only batch search

To identify LD-affected vertices in the case of edge insertions, for an inserted edge (ab), we need to compute the minimal landmark length of paths from r to b passing through a as an upper bound for the landmark distance of b. Since this may lead to frequently updating the landmark length of a path when appending another vertex, we define an operator for this:

$$\begin{aligned} (d,l) \oplus w:= {\left\{ \begin{array}{ll} (d + 1, \text { True}) &{} \text {if}\, w\, \text {is a landmark,} \\ (d + 1, l) &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
figure h

Batch search for insertions is described in Algorithm 4. Its correctness is straightforward.

Lemma 7

Algorithm 4 returns the set of LD-affected vertices.

Proof

Any vertex added to \(V_\textsc {ld-aff}\) must be LD-affected, as its landmark distance in \(G'\) is lower than in G.

Conversely, if v is LD-affected, there must exist a strictly shorter path p in \(G'\). Let p be the shortest such path, b be the first LD-affected vertex in p and a its predecessor. Then the edge (ab) must be inserted, and all vertices from b onwards must be LD-affected. As vertices are processed in order of distance, the computed distance bound will be minimal, causing b and all later vertices in p to be added to \(V_\textsc {ld-aff}\). \(\square \)

5.2 Deletion-only batch search

The key idea for identifying an LD-affected vertex v in the case of edge deletions is to examine its neighbourhood. While this will not always give us its landmark distance in \(G'\) as neighbours may not have been updated yet, it does provide a lower bound that is sufficient for determining LD-affectedness. In case v turns out to be LD-affected, we will need to examine its neighbours anyways to identify other LD-affected vertices, and some of the required computations can be reused.

To identify a vertex v as LD-affected, we use two criteria:

  1. 1.

    There must exist a path between r and v in G of length \(d^L_G(r,v)\) that passes through a deleted edge.

  2. 2.

    There must not exist a path between r and v in \(G'\) of length \(d^L_G(r,v)\).

The first criterion provides us with candidates to examine further, while the second criterion is used to confirm or reject candidates. However, simply storing whether a vertex is LD-affected is not enough for this purpose.

Example 5

Consider the graph below, where r, b and v are landmarks and edge (bc) is getting deleted.

figure i

Here, c is LD-affected as its landmark distance reduces from (2, L) to \((2,-)\). However, v is not LD-affected as its landmark distance remains as (3, L). Deciding the latter is impossible if all we know is that its only neighbour is LD-affected.

While the issue could be solved by storing the new landmark distance for LD-affected vertices, we do not have this information available during batch search (e.g. for vertex a in Example 6). However, we can decide whether a path of equal length—but not passing through another landmark—still exists in \(G'\), and storing this information is sufficient for deciding LD-affectedness of other landmarks.

Another key observation concerns the neighbours of v through which a path violating the second criterion (path in \(G'\) of equal length) might pass. By processing vertices in order of their distance from r in G, we ensure that these will already have been processed.

Example 6

Consider the graph below, where edges (ra) and (bv) are getting deleted.

figure j

To confirm that v is LD-affected, we examine its neighbours a and c in \(G'\). Here a has already been processed and marked as LD-affected, as it is closer to r than v. Vertex c may not have been processed yet, and thus may wrongly suggest a path between r and v of length 3 in \(G'\). However, this is irrelevant for deciding whether a path of length \(d_G(r,v)\) or \(d^L_G(r,v)\) exists in \(G'\).

figure k

The resulting algorithm for batch search in case of deletion is given as Algorithm 5. Here \(V_\textsc {d-aff}\) contains distance-affected vertices. We show its correctness next.

Lemma 8

Algorithm 5 returns the set of LD-affected vertices.

Proof

Let v be added to \(V_\textsc {ld-aff}\). This only happens if \((d_v,l_v,v)\) was previously added to \({\mathcal {Q}}\), at which point it was checked that \((d_v,l_v)\) equals the landmark distance \(d^L_G(r,v)\). Additionally, line 16 must not have been reached, meaning the checks for a path in \(G'\) of landmark length that equal to \((d_v,l_v)\) must have failed. For the correctness of this check, consider that any such path would need to pass through some neighbour of v in \(G'\), and this neighbour would need to be closer to r and not distance-affected. It follows that \(G'\) contains no path from r to v of landmark length \(d^L_G(r,v)\), meaning that v is LD-affected.

Conversely, let v be LD-affected and p be some shortest path from r to v in G w.r.t. landmark length. As p does not lie in \(G'\), it must pass through one or more deleted edges. Let (ab) be the deleted edge in p closest to v, with b closer to v than a. Then, all vertices in p between b and v must be LD-affected. By induction we may assume that all of them except v will be added to \(V_\textsc {ld-aff}\). As all neighbours of vertices in \(V_\textsc {ld-aff}\) are checked for being LD-affected, v will be checked as well and added to \(V_\textsc {ld-aff}\). \(\square \)

Remark 2

Both insertion-only and deletion-only batch search algorithms can compute LD-affected vertices when they handle only edge insertions or only edge deletions. However, when applying these two algorithms together on batch updates with mixed inserted and deleted edges, e.g. applying insertion-only algorithm on edge insertions first and then deletion-only algorithm on edge deletions, or the reversed order, they fail to compute the exact set of LD-affected vertices. The next section will discuss the reasons in detail, which motivates our development of an improved algorithm BatchHL\(^{+}\).

6 BatchHL\(^{+}\): improved algorithm

In principle, one could handle any batch update consisting of both insertions and deletions by processing insertions first and deletions after, or vice versa. However, the issue with such an approach is that vertices may be affected by both insertions and deletion, causing them to be processed twice. It is even possible that insertions and deletions cancel each other out so that vertices are processed twice even though they would not need to be processed at all.

Example 7

Consider the following graphs and updates. When considering insertions and deletions together, v is not LD-affected in either of the three cases. However, for cases (a) and (b), processing insertions first and deletions after will cause v to be LD-affected at both times. Similarly, for case (c), processing deletions first and insertions after will cause v to be LD-affected at both times.

figure l

While it is challenging to identify all cases where insertions and deletions cancel each other out, we will be able to avoid some of them. For example, we can avoid returning v for cases (b) and (c) in Example 7, but not for case (a). For case (a), the difficulty lies in not knowing the distance between r and a in the changed graph \(G'\). As illustrated in Example 6, computing this during batch search is tricky.

Even though we will present an improved algorithm for processing mixed batches in the next section, it may be worth pointing out a small trick when splitting a batch B into insertions \(B^+\) and deletions \(B^-\). Instead of running Algorithm 4 on the graph obtained by applying \(B^+\) to G, we can run it on the graph obtained by apply the whole batch B. This prevents us from following paths that pass through a deleted edge after first passing through an inserted edge, as depicted in case (b) of Example 7.

Clearly, there are still cases where this trick cannot stop an unaffected vertex from getting flagged as affected. Case (a) of Example 6 was one such case. There are other cases as shown in the example below.

Example 8

Consider the following graph and updates.

figure m

Here, v is not LD-affected but still returned by Algorithm 4, even with all of B used in constructing \(G'\).

The improved algorithm presented next will avoid returning v, as the insertion of (av) will only be processed after the deletion of (ra) has already been considered. Here the existence of the alternate path \(r-b-v\), which has the same length as \(r-a-v\), is crucial in avoiding the difficulties illustrated in Example 6. By the time v is being processed, we can be sure that b is unaffected and can thus conclude that v is unaffected.

6.1 Improved batch search

To combine Algorithms 4 and 5, consider their search process. Both algorithms track the lengths of paths \(r\dots a-b\dots v\) passing through an updated edge (ab). As the distance between r and a is computed using the highway labelling \(\Gamma \), and edges used to extend the paths lie in \(G'\), we are effectively considering paths that lie partially in G and partially in \(G'\). Therefore, we define the notion of composite distance.

Definition 11

(Composite distance) The composite distance between r and v over \((G,G')\) is the length of the shortest composite paths between them. We denote it as

$$\begin{aligned} d_\textsc {c}(r,v):=\min \big \{|p| \mid \text { { p} is composite path from { r} to { v}}\big \} \end{aligned}$$

Example 9

Consider again the cases from Example 7. Here \(r-a-v\) is a composite path for case (a) but not for case (b). The composite distance between r and v is 2 for case (a) and \(\infty \) for case (b).

figure n
figure o

Our improved algorithm is presented as Algorithm 6. A detailed discussion is deferred until Sect. 8, where we prove that the vertex set identified by it can be characterized as follows.

Definition 12

(LCD-affected) A vertex v is landmark-composite-distance-affected (LCD-affected) w.r.t. a landmark r iff one of the following conditions is satisfied:

  • Landmark distance changes: \(d^L_{G'}(r,v) \ne d^L_G(r,v)\);

  • Composite distance changes: \(d_\textsc {c}(r,v) \ne d_G(r,v)\).

Clearly, this includes all LD-affected vertices, but may also include others, e.g. case (a) in Example 7, which is undesirable. The reason for those additional vertices is that we cannot easily compute \(d^L_{G'}(r,v)\) during batch search. However, we can compute \(d_\textsc {c}(r,v)\) and decide whether a path of length \(d_\textsc {c}(r,v)\) exists in \(G'\), essentially the same way as this was done in Algorithm 5. We call vertices where such a path exists stable, and use this to resolve situations similar to Example 8.

We note that whenever B contains only inserted or only deleted edges, vertices are LCD-affected iff they are LD-affected. That is because edge deletions do not change the composite distance, while the absence of edge deletions means that a change in composite distance implies a change in landmark distance.

6.2 Improved batch repair

Now, we introduce improved batch repair algorithm to repair LCD-affected vertices \(V_\textsc {aff+}\) returned by Algorithm 6. To eliminate redundant r-labels, we need to track landmark distance, which requires the following notion of landmark distance bound.

Definition 13

(Landmark distance bound) Let \(S\subset V\setminus \{r\}\) be a set of vertices. The landmark distance bound of vertex v w.r.t. S is:

$$\begin{aligned} d^L_\textsc {bou}(v,S):= \min \{ d^L_{G'}(r,w)\oplus v \mid w \in N_{G'}(v)\setminus S \}. \end{aligned}$$

The following lemma allows us to compute the landmark distances of vertices in \(V_\textsc {aff+}\) from a landmark r in the changed graph \(G'\) using their landmark distance bounds.

Lemma 9

Let \(S\subset V{\setminus }\{r\}\) and \(v\in S\) with minimal landmark distance bound. Then \(d^L_{G'}(r,v)=d^L_\textsc {bou}(v,S)\).

Proof

For \(d_{G'}(r,v)=\infty \) this is trivial. Otherwise, let p be a shortest-path from r to v in \(G'\) w.r.t. landmark length, \(v'\) the first vertex in p that lies in S, and w its predecessor in p. Since \(w\notin S\), we have \(d^L_\textsc {bou}(v',S) \le d^L_{G'}(r,w) \oplus v = d^L_{G'}(r,v')\). If \(v'\ne v\) then \(d_{G'}(r,v') < d_{G'}(r,v) \le d_\textsc {bou}(v,S)\), and thus \(d_\textsc {bou}(v',S) < d_\textsc {bou}(v,S)\). This contradicts the minimality of \(d_\textsc {bou}(v,S)\), so \(v'=v\). It follows that \(d^L_\textsc {bou}(v,S) = d^L_{G'}(r,v)\). \(\square \)

Note that again \(d^L_{G'}(r,v)=d^L_\textsc {bou}(r,S)\) does not generally hold for every boundary vertex v.

figure p

Algorithm 7 shows the pseudo-code of our improved batch repair algorithm. For a graph \(G'\) and a set of all affected vertices \(V_\textsc {aff+}\), we first compute the landmark distance bounds of vertices in \(V_\textsc {aff+}\) using their unaffected neighbours. We then find vertices in \(V_\textsc {aff+}\) with minimal distance bounds and remove them from \(V_\textsc {aff+}\). By Lemma 9 their landmark distances to r in \(G'\) equal their landmark distance bounds. We use these landmark distances to update their r-labels, as well as their highway distances in the case of landmarks. Finally, we update the landmark distance bounds of neighbouring vertices in \(V_\textsc {aff+}\). We continue this process until \(V_\textsc {aff+}\) is empty.

6.3 An illustrative example

The following example illustrates the individual steps which our improved algorithm BatchHL\(^+\) runs through.

Example 10

Consider the following graph and updates:

figure q

The initial highway labelling \(\Gamma =(H,L)\) looks like this:

figure r

BatchHL\(^+\) initializes \(\Gamma '\) as \(\Gamma \) and then runs BatchSearch (Algorithm 6) and BatchRepair (Algorithm 7) for both landmarks \(r_1\) and \(r_2\).

For landmark \(r_1\), Algorithm 6 returns the following set of affected vertices:

$$\begin{aligned} V_\textsc {aff+}=\{ f, g, h \} \end{aligned}$$

Due to the inserted edge \((a,r_2)\), the new paths from a to \(r_2\), d and i have the same landmark length as existing ones and are thus pruned. The eliminated path \(r_1-f-g-h-i\) has strictly greater landmark length than the existing path through \(r_2\) and is also pruned. For vertex e, the stable path \(r_1-b-e\) ensures us that e is not LCD-affected.

For comparison, the BatchSearch of BatchHL described by Algorithm 2 would return the following larger set of vertices:

$$\begin{aligned} V_\textsc {aff+}=\{ r_2, d, e, f, g, h, i \} \end{aligned}$$

Note that, here, vertex e is not P-affected, but still returned due to the composite path \(r_1-f-e\).

Now, we run Algorithm 7 on \(V_\textsc {aff+}=\{f,g,h\}\). The initial landmark distance bounds for this set are

$$\begin{aligned} d^L_\textsc {bou}(r_1,\ldots ) = \begin{array}{c|c|c} f &{} g &{} h \\ \hline (3,\text { False}) &{} (3,\text { True}) &{} (5,\text { True}) \end{array} \end{aligned}$$

In the first iteration f and g have minimal distance bounds. Thus, the \(r_1\)-label in \(L'(f)\) is updated to \((r_1,3)\) and the \(r_1\)-label in \(L'(g)\) is removed. After that, \(d^L_\textsc {bou}(r_1,h)\) is updated to \((4,\text { True})\) and the \(r_1\)-label in \(L'(h)\) is removed. This leaves \(L'\) as

figure s

For landmark \(r_2\), Algorithm 6 and Algorithm 2 return \(V_\textsc {aff+}= \{ a, e \}\) and \(V_\textsc {aff+}= \{ r_1, a, b, e \}\), respectively.

Then, running Algorithm 7 on \(V_\textsc {aff+}= \{ a, e \}\) inserts \((r_2,1)\) into \(L'(a)\) and \((r_2,2)\) into \(L'(e)\) for the updated highway labelling as

figure t

7 Comparison of batch search algorithms

In this section, we provide a brief theoretical comparison between different batch search algorithms, including the algorithms used in BatchHL (Sect. 4), the algorithms of handling insertions and deletions separately (Sect. 5), and the improved algorithm used in BatchHL\(^{+}\) (Sect. 6). Recall that, ideally, we would only want to return vertices that are LD-affected w.r.t. an entire batch update, but all these algorithms fail at that and return additional vertices.

Figure 2 summarizes the relationships between different concepts defined and vertex sets returned by these algorithms. Here, BatchHL was originally introduced in [18], \(\text {BHL}^\text {I+D}\) and \(\text {BHL}^\text {D+I}\) denote the sequential applications of insertions before deletions or deletions before insertions as described by Algorithms 4 and 5, and \(\text {BHL}^\text {I+D}_\text {trick}\) sequential application with the trick described in Sect. 5, and BatchHL\(^+\) refers to our improved algorithm as described by Algorithms 6 and 7 in Sect. 6.

Fig. 2
figure 4

Relationships between “affected” vertex sets defined and/or returned by different batch search algorithms. Each arrow indicates a subset containment

Figure 3 illustrates cases that differentiate the vertex set definitions and algorithm outputs. For each of these cases, Table 2 lists whether vertex v can be identified by different “affected” notions and algorithms.

Fig. 3
figure 5

Graphs illustrating differences between “affected” vertex sets defined and/or returned by batch search algorithms

Table 2 Containment of vertex v in Fig. 3 in the vertex sets of different batch search algorithms

We note that case (g) in Fig. 3 is designed to trigger the corner case of Algorithm BatchHL, described in the proof of Lemma 5.18 in [18], where the search follows a path whose extended landmark length is not minimal. This corner case prevents \(\text {BHL}^\text {D+I}\) from always returning a subset of BatchHL.

In the following, we denote by \(V_{I+D}\) the set of vertices returned by either Algorithm 4 or 5 if insertions are processed first, and by \(V_{D+I}\) if deletions are processed first. We note that \(V_{D+I}\) does not need to contain all LCD-affected vertices, as evidenced by case (a) of Example 7, where v is LCD-affected but not returned. However, it may also contain vertices that are not LCD-affected, as seen in case (c) of Example 7, where v is returned but it is not LCD-affected. Thus we can only compare it against Algorithm 6 experimentally. However, we can easily establish a relationship between \(V_{D+I}\) and the set of P-affected vertices as in Definition 4.

Lemma 10

\(V_{D+I}\) contains only P-affected vertices.

Proof

If v is returned by Algorithm 5, then by Lemma 8v is LD-affected w.r.t. deletions. Thus, a shortest path in G is eliminated, and v is P-affected.

If v is not returned by Algorithm 5 but returned by Algorithm 4, then by Lemma 7v is LD-affected w.r.t. insertions into the intermediate graph \(G^-\), the result of applying edge deletions to G. Thus the landmark distance between r and v is strictly smaller in \(G'\) than it is in \(G^-\). Since v was not returned by Algorithm 5, the landmark distance between r and v in G is the same as in \(G^-\). Thus, v is LD-affected and P-affected w.r.t. the combined batch update. \(\square \)

For \(V_{I+D}\), the situation is nonetheless different. As seen in case (b) of Example 7, \(V_{I+D}\) may contain vertices that are not LCD-affected. However, as we will show below, it will never return fewer.

Lemma 11

\(V_{I+D}\) contains all LCD-affected vertices.

Proof

Let v be LCD-affected. Definition 12 requires

$$\begin{aligned} d_\textsc {c}(r,v) < d_G(r,v) \quad \text {or}\quad d^L_{G'}(r,v) \ne d^L_G(r,v) \end{aligned}$$

If \(d_\textsc {c}(r,v) < d_G(r,v)\) then v is LD-affected w.r.t. insertions only, and thus returned by Algorithm 4. If \(d^L_{G'}(r,v) \ne d^L_G(r,v)\) then v is LD-affected w.r.t. the combined batch update, and thus returned. \(\square \)

We note that Lemma 11 still holds if the final graph \(G'\) is passed to Algorithm 4 (the “trick” mentioned in Sect. 5). This can be seen by considering that for \(d_\textsc {c}(r,v) < d_G(r,v)\) the algorithm will use the last inserted edge in a shortest composite path as a starting point, and by Definition 5 all subsequent edges must lie in \(G'\). Example 8 shows that non-LCD-affected vertices may still be returned.

8 Theoretical discussion

8.1 Proof of correctness

Now we prove that Algorithm 6 returns precisely the set of LCD-affected vertices. As illustrated in Example 8, we are particularly interested in vertices for which the distance to r in \(G'\) is precisely the composite distance. In these cases we can compute the landmark distance to r in \(G'\), which enables us to determine with certainty whether or not its landmark distance changes.

Definition 14

(Stable path and stable vertex) We call a path from r to v stable iff it lies in \(G'\) and has length \(d_\textsc {c}(r,v)\). We call v stable w.r.t. r iff such a stable path exists, i.e. iff \(d_{G'}(r,v) = d_\textsc {c}(r,v)\).

As mentioned before, we can determine the updated landmark distance of stable vertices during batch search. For others we can only determine the composite distance. This motivates the following definition.

Definition 15

(Composite landmark distance) Given a composite path p, the composite landmark length of p is its landmark length if it lies in \(G'\), and \((|p|,\text { False})\) otherwise. Accordingly, the composite landmark distance of a vertex is defined as

$$\begin{aligned} d^L_\textsc {c}(r,v):= {\left\{ \begin{array}{ll} d^L_{G'}(r,v) &{} \text {if}\,v\,\text {is stable;} \\ (d_\textsc {c}(r,v), \text { False}) &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

Definition 16

(Stable landmark distance) The stable landmark distance additionally stores whether v is stable:

$$\begin{aligned} d^S_\textsc {c}(r,v):= {\left\{ \begin{array}{ll} (d^L_{G'}(r,v), \text { True}) &{} \text {if}\, v\, \text {is stable;} \\ (d_\textsc {c}(r,v), \text { False}, \text { False}) &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

The distance component of \(d^L_\textsc {c}(r,v)\) is \(d_\textsc {c}(r,v)\) in both cases, while the landmark flag indicates the existence of a stable path that passes through another landmark. When comparing stable landmark length values, we use lexicographical ordering with \(\text { True}< \text { False}\), as for landmark length. This is needed to ensure that stable paths are processed first, so that the stable variable is set correctly if the check in line 12 fails in Algorithm 6.

We first prove the following lemma.

Lemma 12

A vertex v is LCD-affected iff

  1. (i)

    v is unstable, or

  2. (ii)

    \(d^L_\textsc {c}(r,v)\ne d^L_G(r,v)\)

Proof

We prove the “if” and “only if” below.

  • (if) Let v be unstable, meaning \(d_\textsc {c}(r,v)\ne d_{G'}(r,v)\). If \(d_\textsc {c}(r,v) \ne d_G(r,v)\) then its composite distance changes. If \(d_\textsc {c}(r,v)=d_G(r,v)\) then \(d_{G'}(r,v)\ne \) \( d_G(r,v)\) and its landmark distance changes. Otherwise v must be stable and \(d^L_\textsc {c}(r,v)\ne d^L_G(r,v)\). This means \(d^L_{G'}(r,v)\) \(\ne d^L_G(r,v)\) and its landmark distance changes.

  • (only if) Let v be LCD-affected but stable so that \(d^L_\textsc {c}(r,v)=d^L_{G'}(r,v)\). If \(d^L_{G'}(r,v)\ne d^L_G(r,v)\) then \(d^L_\textsc {c}(r,v)\ne d^L_G(r,v)\) follows. Otherwise \(d_\textsc {c}(r,v)\ne d_G(r,v)\) and \(d^L_\textsc {c}(r,v)\ne d^L_G(r,v)\) follows.

\(\square \)

Note that together with the landmark distance, the stable landmark distance provides precisely the information needed to check the conditions of Lemma 12.

The following terminology will be useful for identifying paths through which vertices are visited by Algorithm 6. In particular, this is restricted by the checks in lines 26 and 31 of Algorithm 6.

Definition 17

(LCD-parent) We call v an LCD-parent of w, and w an LCD-child of v, iff

  1. (a)

    v and w are LCD-affected, and

  2. (b)

    w is a neighbour of v in \(G'\), and

  3. (c)

    \(d^L_\textsc {c}(r,w)= {\left\{ \begin{array}{ll} d^L_\textsc {c}(r,v)\oplus w &{} \text {if}\, v\, \text {is stable,} \\ (d_\textsc {c}(r,v)+1,\text { False}) &{} \text {otherwise, and} \end{array}\right. }\)

  4. (d)

    \(d_\textsc {c}(r,v) + 1 < d_G(r,w)\), or \(d^L_{G'}(r,v)\oplus w < d^L_G(r,w)\), or \(d^L_G(r,v)\oplus w = d^L_G(r,w) < d^L_{G'}(r,w)\).

We use the terms LCD-ancestor and LCD-descendant to denote the reflexive and transitive closures of the LCD-parent and LCD-child relationships.

Note that conditions (2) and (3) mean that v is the predecessor of w on a path from r to w of minimal composite landmark length. The conditions of (4) correspond precisely to those of Definition 12: either a path through v reduces the composite distance to w, or it reduces the landmark distance to w, or the landmark distance of w increases due to elimination of a shortest path passing through v.

Example 11

Consider the following graphs and updates.

figure u

For the graph on the left, the LCD-ancestors of v are ac and v itself. Here, b is not an LCD-parent of c as the conditions of (4) are violated: \(d^L_G(r,c) < d^L_{G'}(r,c)\) but \(d^L_G(r,b)\oplus c \ne d^L_G(r,c)\). For the graph on the right, the LCD-ancestors of v are again ac and v itself. b is not an LCD-parent of c as the conditions of (3) are violated.

The following lemma shows that all LCD-affected vertices can be traced back to a vertex initially enqueued in Algorithm 6.

Lemma 13

Let b be LCD-affected with no LCD-parent. Then, there exists a vertex a such that either

  1. (a)

    edge (ab) is inserted, \(d^L_G(r,a)\oplus b = d^L_\textsc {c}(r,b) < d^L_G(r,b)\), and a is not LCD-affected, or

  2. (b)

    edge (ab) is deleted, \(d^L_G(r,a)\oplus b = d^L_G(r,b)\), and \((d_G(r,b),\text { False}) = d^L_\textsc {c}(r,b)\).

Proof

Since b is LCD-affected, Definition 12 requires

  • \(d_\textsc {c}(r,b) < d_G(r,b)\) or

  • \(d^L_{G'}(r,b) < d^L_G(r,b)\) or

  • \(d^L_{G'}(r,b) > d^L_G(r,b)\).

As b has no LCD-parents, every vertex a must violate one of the conditions (1)-(4) of Definition 17.

(1) Assume \(d_\textsc {c}(r,b) < d_G(r,b)\). Let p be a composite path from r to b of minimal composite landmark length, and a the predecessor of b in p. Then, \(d_\textsc {c}(r,a)+1 = d_\textsc {c}(r,b)\), so condition (4) holds. If a is stable, then \(d^L_\textsc {c}(r,a)\oplus b = d^L_\textsc {c}(r,b)\); otherwise, the landmark flag of \(d^L_\textsc {c}(r,b)\) is \(\text { False}\) by Definition 15. Thus, condition (3) holds in both cases. Condition (2) must hold as well—otherwise the last edge (ab) in the composite path p is a deleted edge, which implies that p lies in G, contradicting \(d_\textsc {c}(r,b) < d_G(r,b)\). This only leaves condition (1) to be violated, so a cannot be LCD-affected. Thus, \(d^L_G(r,a)\oplus b = d^L_\textsc {c}(r,a)\oplus b = d^L_\textsc {c}(r,b)\), and therefore \(d_G(r,a)+1 < d_G(r,b)\). It follows that (ab) must be inserted and \(d^L_G(r,a)\oplus b = d^L_\textsc {c}(r,b) < d^L_G(r,b)\) holds.

(2) Assume \(d_\textsc {c}(r,b) = d_G(r,b)\) and \(d^L_{G'}(r,b) < d^L_G(r,b)\). Then, \(d_\textsc {c}(r,b)\le d_{G'}(r,b)\le d_G(r,b)=d_\textsc {c}(r,b)\) so b is stable. Let p be a stable path from r to b of minimal landmark length, and a the predecessor of b in p. Then, conditions (2) and (3) clearly hold. Since \(d^L_{G'}(r,a)\oplus b = d^L_{G'}(r,b) < d^L_G(r,b)\) condition (4) holds as well, so condition (1) must be violated and a cannot be LCD-affected. Thus, \(d^L_G(r,a)\oplus b =d^L_\textsc {c}(r,a) \oplus b = d^L_\textsc {c}(r,b) < d^L_G(r,b)\), implying that (ab) is inserted.

(3) Assume \(d_\textsc {c}(r,b) = d_G(r,b)\) and \(d^L_{G'}(r,b) > d^L_G(r,b)\). If b is unstable we have \(d^L_\textsc {c}(r,b) = (d_\textsc {c}(r,b),\text { False}) = (d_G(r,b),\text { False})\) by Definition 15. Otherwise \(d_G(r,b) = d_\textsc {c}(r,b) = d_{G'}(r,b)\), so \(d^L_{G'}(r,b) > d^L_G(r,b)\) is only possible if \(d^L_{G'}(r,b)=(d_G(r,b),\text { False})\). Thus \(d^L_\textsc {c}(r,b) = d^L_{G'}(r,b) = (d_G(r,b),\text { False})\), so the last condition in case (b) of Lemma 13 holds in both cases.

Let p be a path from r to b in G of minimal landmark length, and a the predecessor of b in p. This implies \(d^L_G(r,a)\oplus b = d^L_G(r,b)\), so it only remains to show that (ab) is deleted. Assume to the contrary that (ab) lies in \(G'\), so condition (2) holds immediately. Since p has length \(d_\textsc {c}(r,b)\) we have \(d_\textsc {c}(r,b)=d_\textsc {c}(r,a)+1\). Together with earlier results this gives us \(d^L_\textsc {c}(r,b) = (d_\textsc {c}(r,b),\text { False}) = (d_\textsc {c}(r,a)+1,\text { False})\). Thus condition (3) holds if a is unstable. If a is stable, then \(d^L_\textsc {c}(r,b) \le d^L_\textsc {c}(r,a)\oplus b\). This is only possible if the landmark flag of \(d^L_\textsc {c}(r,a)\oplus b\) is \(\text { False}\), so condition (3) holds again. Finally \(d^L_G(r,a)\oplus b = d^L_G(r,b) < d^L_{G'}(r,b)\) so condition (4) holds as well. Thus, condition (1) must be violated, so a cannot be LCD-affected. It follows that \(d^L_{G'}(r,a)=d^L_G(r,a)\), and thus

$$\begin{aligned} d^L_{G'}(r,b) \le d^L_{G'}(r,a)\oplus b = d^L_G(r,a)\oplus b = d^L_G(r,b) \end{aligned}$$

violating \(d^L_{G'}(r,b) > d^L_G(r,b)\), a contradiction. \(\square \)

The next lemma shows that every LCD-child meets the pruning conditions in lines 20, 26 and 31 of Algorithm 6.

Lemma 14

Let w be an LCD-child of v. Then,

  1. (i)

    \(d_\textsc {c}(r,v) + 1 \le d_G(r,w)\)

  2. (ii)

    if v is stable, then \(d^L_\textsc {c}(r,v)\oplus w < d^L_G(r,w)\) or \(d^L_G(r,v)\oplus w = d^L_G(r,w) < d^L_\textsc {c}(r,v)\oplus w\)

  3. (iii)

    if v is unstable, then \(d_\textsc {c}(r,v) + 1 < d_G(r,w)\) or \(d^L_G(r,v)\oplus w = d^L_G(r,w)\)

Proof

Consider the cases in condition (4) of Definition 17. If \(d_\textsc {c}(r,v) + 1 < d_G(r,w)\) then conditions (1), (2) and (3) follow immediately.

If \(d^L_{G'}(r,v)\oplus w < d^L_G(r,w)\) then \(d_{G'}(r,v) + 1 \le d_G(r,w)\), and condition (1) follows from \(d_\textsc {c}(r,v)\le d_{G'}(r,v)\). If v is stable, then \(d^L_\textsc {c}(r,v)=d^L_{G'}(r,v)\) and condition (2) follows. If v is unstable, then \(d_\textsc {c}(r,v) < d_{G'}(r,v)\) and thus \(d_\textsc {c}(r,v) + 1 < d_{G'}(r,v) + 1 \le d_G(r,w)\), showing condition (3).

If \(d^L_G(r,v)\oplus w = d^L_G(r,w) < d^L_{G'}(r,w)\) then condition (3) follows immediately, and condition (1) from \(d_\textsc {c}(r,v) + 1 \le d_G(r,v) + 1 = d_G(r,w)\). For condition (2) stability of v gives us

$$\begin{aligned} d^L_G(r,v)\oplus w&= d^L_G(r,w) < d^L_{G'}(r,w) \\&\le d^L_{G'}(r,v)\oplus w = d^L_\textsc {c}(r,v)\oplus w \end{aligned}$$

showing condition (2). \(\square \)

We are now ready to prove our main result.

Theorem 1

Algorithm 6 returns the set of all LCD-affected vertices.

Proof

We prove it by showing the following:

  1. (a)

    For each \((d,l,s,v)\in {\mathcal {Q}}\), we have

    $$\begin{aligned} d^L_\textsc {c}(r,v)\le (d,l) \quad \text {and}\quad d\le d_G(r,v). \end{aligned}$$

    If \(l=\text { True}\), then \(s=\text { True}\) and \((d,l) < d^L_G(r,v)\).

  2. (b)

    From line 10 onwards, we have

    $$\begin{aligned} d^L_\textsc {c}(r,v)\le (d_v,l_v) \quad \text {and}\quad d_v\le d_G(r,v). \end{aligned}$$

    If \(l_v=\text { True}\), \(\text {stable}=\text { True}\) and \((d_v,l_v) < d^L_G(r,v)\).

  3. (c)

    From line 11, we have \(d_v=d_\textsc {c}(r,v)\).

  4. (d)

    In line 14 either \(d_w \ge d_\textsc {c}(r,w) \ge d_v\), or \((d_w,l_w)=\) \(d^L_\textsc {c}(r,w)\) and w is stable.

  5. (e)

    If line 18 is reached, then v is not LCD-affected.

  6. (f)

    If line 19 is reached, then v is LCD-affected.

  7. (g)

    From line 19, we have \((d_v,l_v,stable)=d^S_\textsc {c}(r,v)\).

  8. (h)

    The while loop has the following invariant: If v is LCD-affected with \(v\notin V_\textsc {aff+}\), then \({\mathcal {Q}}\cup \mathcal {Q^+}\) contains a tuple \((d,l,\ldots )\) with \((d,l)\le d^L_\textsc {c}(r,v)\).

Once shown, (h) guarantees that every LCD-affected vertex is included in \(V_\textsc {aff+}\), while (f) ensures that these are the only ones. By considering only the first time any condition would be violated, we may assume they all hold for earlier steps.

Case (a): For insertions prior to the while loop, the conditions clearly hold. Let (dlsw) be the first tuple inserted during the while loop that violates the condition, and consider the iteration during which this insertion occurred. By (g) we have \((d_v,l_v)=d^L_\textsc {c}(r,v)\), and the check in line 23 succeeds iff v is stable.

If v is stable, then insertion occurs in line 27. Thus,

$$\begin{aligned} (d,l)&= (d_v,l_v)\oplus w = d^L_\textsc {c}(r,v)\oplus w \\&= d_{G'}(r,v)\oplus w \ge d_{G'}(r,w) \ge d^L_\textsc {c}(r,w). \end{aligned}$$

Otherwise, insertion occurs in line 32, and we have

$$\begin{aligned} (d,l)&= (d_v + 1, \text { False}) = (d_\textsc {c}(r,v) + 1, \text { False}) \\&\ge (d_\textsc {c}(r,w), \text { False}) \ge d^L_\textsc {c}(r,w). \end{aligned}$$

This shows the first inequality. The check in line 20 ensures the second inequality \(d\le d_G(r,w)\).

Finally, let \(l=\text { True}\). Then, (dlsw) must have been inserted in line 27. Thus, we have \((d,l)=(d_v,l_v)\oplus w\), \(s=\text { True}\) and either \((d_v,l_v)\oplus w < d^L_G(r,w)\) or \(d^L_G(r,w) < (d_v,l_v)\oplus w\). In the first case \((d,l) < d^L_G(r,w)\) follows immediately. Otherwise, we have \(d^L_G(r,w) < (d,l)\) and \(d_G(r,w) \le d\), which together with the second inequality implies \(d = d_G(r,w)\). Thus, \(d^L_G(r,w) < (d,l)\) is only possible for \(l=\text { False}\), a contradiction.

Case (b): If \((d_v,l_v,\ldots )\) comes from \({\mathcal {Q}}\), the claim follows from (a). Note that variable stable will never be updated to \(\text { False}\). Otherwise, \((d_v,l_v,\text {stable},v,a)\) comes from \(\mathcal {Q^+}\). Then, stable is \(\text { True}\), (av) lies in \(G'\), and \((d_v,l_v) = d^L_G(r,a)\oplus v < d^L_G(r,v)\). Thus, \(d_v\le d_G(r,v)\) holds. Since the check in line 8 succeeded \(a\notin V_\textsc {aff+}\). By (h) and minimality of \(d_v > d_G(r,a) \ge d_\textsc {c}(r,a)\), it follows that a is not LCD-affected. Thus \(d^L_\textsc {c}(r,v)\le d^L_\textsc {c}(r,a)\oplus v = (d_v,lv)\).

Case (c): The check in line 10 must have succeeded, so \(v\notin V_\textsc {aff+}\). If v is LCD-affected, then \((d_v,l_v) \le d^L_\textsc {c}(r,v)\) by (h) and the claim follows from (b). If v is not LCD-affected, then \(d^L_\textsc {c}(r,v)=d^L_G(r,v)\) by Lemma 12 and the claim follows from (b).

Case (d): If \(w\in V_\textsc {aff+}\), then \((d_w,l_w) = LD[w] = d^L_\textsc {c}(r,w)\) by (g), and w is stable by (g); thus, (d) holds. Otherwise, \((d_w,l_w)=d^L_G(r,w)\); thus, \(d_w = d_G(r,w) \ge d_\textsc {c}(r,w)\). If \(d_\textsc {c}(r,w) \ge d_v\), then (d) holds. Let \(d_\textsc {c}(r,w) < d_v\). Since \(d_v\) was minimal in \({\mathcal {Q}}\cup \mathcal {Q^+}\) and \(w\notin V_\textsc {aff+}\), w cannot be LCD-affected by (h). By Lemma 12w is stable and \(d_\textsc {c}(r,w)=d^L_G(r,w)=(d_w,l_w)\), so (d) holds.

Case (e): Assume to the contrary that v is LCD-affected. Then, (h) implies \((d_v,l_v) \le d^L_\textsc {c}(r,v)\) as \((d_v,l_v)\) is minimal in \({\mathcal {Q}}\cup \mathcal {Q^+}\), and thus \((d_v,l_v)=d^L_\textsc {c}(r,v)\) by (b). Denote by w the neighbour of v for which line 18 is reached. The checks in lines 12 and 15 ensure that \(d_G(r,v) = d_w + 1\). Thus, w is stable and \((d_w,l_w)=d^L_\textsc {c}(r,w)\) by (d). This implies \(d_\textsc {c}(r,v) = d_\textsc {c}(r,w) + 1\), so v is stable. The check in line 17 ensures that

$$\begin{aligned} d^L_G(r,v) = d^L_\textsc {c}(r,w)\oplus v \ge d^L_\textsc {c}(r,v). \end{aligned}$$

Hence, either \(d^L_\textsc {c}(r,v) < d^L_G(r,v)\) or v is not LCD-affected by Lemma 12. Assume the former. Since \(d_G(r,v) = d_v = d_\textsc {c}(r,v)\), this requires \(d_\textsc {c}(r,v)=(d_v,\text { True})\). But that means \(l_v=\text { True}\) and the check in line 12 would have failed.

Case (f): Assume to the contrary that v is not LCD-affected. We will show that execution would have reached line 18 first.

If \(l_v=\text { True}\), we would have \(d^L_\textsc {c}(r,v)\le (d_v,l_v) < d^L_G(r,v)\) by (b) which contradicts Lemma 12, and thus \(l_v=\text { False}\). By (c) and Lemma 12, we have \(d_v=d_\textsc {c}(r,v)=d_G(r,v)\), and the check in line 12 succeeds.

By Lemma 12, v is stable. Hence, there must exist a stable path p from r to v of landmark length \(d^L_\textsc {c}(r,v)\). Let w be the predecessor of v in p so that w is stable and \(d^L_\textsc {c}(r,w)\oplus v = d^L_\textsc {c}(r,v)\). From (g) it follows that \(V_\textsc {unstable}\) contains precisely the unstable vertices in \(V_\textsc {aff+}\). Hence, w lies in \(N_{G'}(v)\setminus V_\textsc {unstable}\).

By (d) and \(d_\textsc {c}(r,w) < d_\textsc {c}(r,v) = d_v\), we have \((d_w,l_w)\) \(=d^L_\textsc {c}(r,w)\). It follows that \(d_v=d_\textsc {c}(r,v)=d_\textsc {c}(r,w)+1=d_w+1\) and \(d^L_G(r,v)=d^L_\textsc {c}(r,v)=d^L_\textsc {c}(r,w)\oplus v=(d_w,l_w)\oplus v\). Thus, w passes all checks required to reach line 18, and line 19 is not reached.

Case (g): By (f), v is LCD-affected. Hence, (h) implies \((d_v,l_v)\le d^L_\textsc {c}(r,v)\), and \((d_v,l_v)=d^L_\textsc {c}(r,v)\) by (b). It remains to show that variable stable is \(\text { True}\) iff v is stable.

If \(t_v\) came from \(\mathcal {Q^+}\) with anchor a, then stable is \(\text { True}\) and \(d_\textsc {c}(r,a) \le d_G(r,a) < d_G(r,a) + 1 = d_v\). Since the check in line 8 must have succeeded, we have \(a\notin V_\textsc {aff+}\). By (h) it follows that a is not LCD-affected. Since \(d_\textsc {c}(r,v) = d_v = d_G(r,a) + 1 = d_\textsc {c}(r,a) + 1\) we may conclude that v is stable.

Let \(t_v\) have come from \({\mathcal {Q}}\). If stable was initially \(\text { True}\), then the tuple must have been inserted in line 27 while examining a parent vertex u. As (g) holds for all earlier iterations, u must have been stable with \(d_\textsc {c}(r,u) + 1 = d_v = d_\textsc {c}(r,v)\), and v is stable. So let stable have been initially \(\text { False}\). \(l_v=\text { False}\) by (a). We distinguish two cases. (i) If \(d_v=d_G(r,v)\) then the check in line 12 succeeds. By Definition 15v is stable iff v has a stable neighbour w in \(G'\) with \(d_\textsc {c}(r,w)+1=d_\textsc {c}(r,v)\). By (d) line 16 is reached iff such a neighbour exists, so the claim holds. (ii) If \(d_v\ne d_G(r,v)\), then the check in line 12 fails. Thus, stable will still be false by the time when line 19 is reached. Assume to the contrary that v is stable. Then, it must have a stable neighbour a in \(G'\) with \(d_\textsc {c}(r,a)+1 = d_\textsc {c}(r,v) = d_v < d_G(r,v)\).

If a is not LCD-affected, then (av) must have been inserted. Hence, a tuple \(t_a=(d_\textsc {c}(r,v),l,\text { True},v,a)\) must have been inserted into \(\mathcal {Q^+}\). Minimality of \(t_v\) in \({\mathcal {Q}}\cup \mathcal {Q^+}\) means that \(t_a\) must have been removed from \(\mathcal {Q^+}\) earlier. At this point the checks in lines 8 and 12 would have failed, causing v to be added to \(V_\textsc {aff+}\), a contradiction.

If a is LCD-affected, then \(a\in V_\textsc {aff+}\) by (h) and minimality of \(d_v\) in \({\mathcal {Q}}\cup \mathcal {Q^+}\). Since a is stable, a tuple \(t_a=(d_\textsc {c}(r,v),l,\text { True},v)\) must have been inserted in \({\mathcal {Q}}\) in line 27 when processing a. Minimality of \((d_v,\text { False},\text { False})\) in \({\mathcal {Q}}\cup \mathcal {Q^+}\) means that \(t_a\) must have been removed from \({\mathcal {Q}}\) earlier. At this point the check in line 12 would have failed, causing v to be added to \(V_\textsc {aff+}\), a contradiction.

Case (h): We will show a stronger invariant, namely that if v is LCD-affected with \(v\notin V_\textsc {aff+}\) then either

  • (\({\mathcal {Q}}\)) \({\mathcal {Q}}\) contains a tuple (dlsb) with \((d,l)\le d^L_\textsc {c}(r,v)\) where \(b\notin V_\textsc {aff+}\) is an LCD-ancestor of v, or

  • (\(\mathcal {Q^+}\)) \(\mathcal {Q^+}\) contains a tuple (dlsba) with \((d,l)\le d^L_\textsc {c}(r,v)\) where \(b\notin V_\textsc {aff+}\) is an LCD-ancestor of v and a is not LCD-affected.

For every LCD-affected vertex v, there exists an LCD-ancestor of v that has no LCD-parent, possibly v itself. Thus InitQueues will add a tuple satisfying either (\({\mathcal {Q}}\)) or (\(\mathcal {Q^+}\)) by Lemma 13, so the invariant holds initially.

Consider now an iteration of the while loop where v is LCD-affected with \(v\notin V_\textsc {aff+}\). We need to show that the invariant is maintained for its LCD-descendants. Denote by \(t_v\) the tuple removed from \({\mathcal {Q}}\cup \mathcal {Q^+}\). If \(t_v\in \mathcal {Q^+}\), then by (f) \(a\notin V_\textsc {aff+}\), and the check in line 8 fails. Thus, execution reaches line 10 in both cases, and the check here passes by assumption. By (e) line 18 is never reached, and execution reaches line 22 where v is added to \(V_\textsc {aff+}\). Hence, the invariant is maintained for v.

Finally, let w be an LCD-child of v. By Lemma 14 and (g), the checks in lines 20 and 26 (if v is stable) or 31 (otherwise) succeed. Thus, a tuple \((d_w,l_w,s,w)\) is added to \({\mathcal {Q}}\) in line 27 or 32. By condition (3) of Definition 17, we have \((d_w,l_w)=d^L_\textsc {c}(r,w)\), and the invariant is maintained for w and its LCD-descendants. \(\square \)

8.2 Proof of minimality

Theorem 2

The highway labelling \(\Gamma '\) that BatchHL\(^+\) returns is the minimal highway labelling for \(G'\).

Proof

By Lemma 6 and Theorem 1, the vertex set \(V_\textsc {aff+}\) returned by Algorithm 6 contains all LD-affected vertices. By Lemma 6, for vertices outside of \(V_\textsc {aff+}\) the landmark distance to \(r_i\) does not change. In line 3 of Algorithm 7 the value of \(d^L_\textsc {bou}(v,V_\textsc {aff+})\) can be computed from \(\Gamma \). From Lemma 9, it follows that \(D_\textsc {bou}[v]=d^L_{G'}(r_i,v)\) whenever vertex v lies in \(V_{\min }\).

For each landmark r and each LD-affected vertex v w.r.t. r, we update the r-label of v in \(\Gamma '\) based on its landmark distance to r in \(G'\). By Lemma 5, these updates are correct. As the r-labels of vertices outside of \(V_\textsc {aff+}\) do not change, and we initialized \(\Gamma '\) using \(\Gamma \), this leave all vertices with correct r-labels, for all \(r\in R\), so the distance labelling of \(\Gamma '\) is correct and minimal. Highway is updated for vertices in \(V_\textsc {aff+}\) for all \(r\in R\), and do not change for others by Definition 9. \(\square \)

8.3 Complexity analysis

We analyse the time and space complexity of BatchHL\(^+\). Let a be the total number of affected vertices, l be the maximum label size, and d be the maximum degree. We perform |R| BFSs to construct highway labelling in \(O(|R| \cdot |V|)\) time and space. The time complexity of Algorithm 6 is \(O(a \cdot d \cdot l)\), where it visits O(a) vertices and for each affected vertex performs d queries to check its affected neighbours in \(O(d \cdot l)\) time. Note that a in Algorithm 6 refers to the total number of LCD-affected vertices which is much smaller than the total number of CP-affected vertices being referred to as a by Algorithm 2 of BatchHL. In practice, l and d are closer to the average values, and a is usually orders of magnitudes smaller than the total number of vertices in a graph. Further, in contrast to Algorithm 3 of BatchHL which repairs CP-affected vertices returned by Algorithm 2, Algorithm 7 of BatchHL\(^+\) repairs LCD-affected vertices returned by Algorithm 6. In the worse case, both Algorithm 3 and Algorithm 7 could repair the labels of all affected vertices. When repairing the label of an affected vertex, we check its neighbours in O(d). Thus, the time complexity of Algorithm 7 is \((a \cdot d)\), and the overall time complexity of BatchHL\(^+\) is \(O(|R| \cdot a \cdot d \cdot l)\) using O(V) space. We omit l from the time complexity of Algorithms 3 and 7 because we store distances for all unaffected neighbours of affected vertices in Algorithms 2 and 6.

9 Implementation optimizations

When implementing Algorithm 6, landmark length and stable landmark length values can be encoded as single integers to save storage space and speed up computation. Further, storing computed landmark distances in G for later reuse can also speed up processing.

For undirected graphs, the for loop in line 19 can be integrated into the for loop in line 13 if the latter is reached. For directed graphs, it can be eliminated altogether, in which case the for loops in lines 25 and 30 iterate over the outgoing neighbours.

Table 3 Datasets, where size(G) denotes the size of a graph G with each edge appearing in the forward and reverse adjacency lists and being represented by 8 bytes

Case (g) in Theorem 1 ensures that \(\texttt {LD}[v]=d^L_{G'}(r,v)\) for all stable LCD-affected vertices. Hence the highway labels for these can be updated directly, and batch repair can be restricted to unstable vertices. Eliminating duplicate values in \({\mathcal {Q}}\) or checking whether a value has already been processed in line 10 stops vertices from getting processed multiple times (whereas checking containment in \(V_\textsc {aff+}\) only prevents this for LCD-affected vertices). Case (e) shows that this will not cause any LCD-affected vertices to be missed. However, the overhead of tracking visited vertices may not be worth it in situations where repeated visits are rare.

10 Experiments

We have implemented our proposed method BatchHL\(^+\) to answer the following questions:

  1. (1)

    How efficiently can BatchHL\(^+\) perform in comparison with the state-of-the-art dynamic algorithms for answering distance queries?

  2. (2)

    How does the number of landmarks affect the performance of BatchHL\(^+\)?

  3. (3)

    How does the number of affected vertices effect the performance of BatchHL\(^+\) in comparison with the state-of-the-art dynamic algorithms?

  4. (4)

    What is the effect of the size of batch updates on the performance of BatchHL\(^+\)?

Experimental setup. In our experiments, all algorithms are implemented in C++11 and compiled using g++ 5.5.0 with the -O3 option. All the experiments are performed using a single thread on a Linux server (Intel Xeon W-2175 (2.50GHz CPU) with 28 cores and 512GB of main memory).

Baseline methods. We compare our proposed method BatchHL\(^{+}\) with the following state-of-the-art methods.

  • BatchHL [18]: A batch-dynamic method which performs graph changes in the batch-update setting to efficiently maintain a highway cover distance labelling. Then, it combines the updated labelling with graph traversal algorithm to answer distance queries. We compare our proposed method with the optimised version of BatchHL in our experiments.

  • FulHL [20]: A fully dynamic method which presents two algorithms IncHL and DecHL to maintain a highway cover distance labelling as a result of graph changes in the single-update setting. Then, it combines the updated labelling with graph traversal algorithm to answer distance queries.

  • FulFD [23]: A fully dynamic method that incorporates two algorithms IncFD and DecFD to maintain a distance labelling in the single-update setting as a result of graph changes. Then, it combines the updated distance labelling with a graph traversal algorithm to answer distance queries.

  • PSL* [27]: A parallel algorithm which constructs pruned landmark labelling for static graphs to answer distance queries.

  • BiBFS [23]: An online bidirectional BFS algorithm which answers distance queries using an optimized strategy to expand searches from the direction with fewer vertices.

The code for FulFD, FulPLL and PSL* was provided by their authors and implemented in C++. We use the same parameter settings as suggested by their authors unless otherwise stated. For a fair comparison, we also select high degree landmarks and set them to 20 in the same way as FulFD for our methods. We set the number of threads to 20 for PSL*.

Table 4 Comparing update time of our methods BatchHL\(^{+}\) and BHL\(^D\) with the state-of-the-art dynamic methods
Fig. 4
figure 6

Distance distribution of batch updates

Datasets. We use 15 large real-world networks from a variety of domains to verify the efficiency, scalability and robustness of our algorithm. Among them, Italianwiki and Frenchwiki are two real dynamic networks whose topology evolves over time. We treat these networks as undirected and unweighted graphs, and their statistics are summarized in Table 3. These datasets are accessible at Stanford Network Analysis Project [26], Laboratory for Web Algorithmics [9], and the Koblenz Network Collection [25].

Test data generation. For batch-dynamic algorithms, we generate 10 batches of updates for all the 13 datasets, where each batch update contains 1000 edges that are randomly selected from E. We consider three batch update settings for testing:

  1. (1)

    Decremental—delete these batches of updates and measure the average deletion time;

  2. (2)

    Incremental—add these batches of updates after decremental updates and measure the average insertion time;

  3. (3)

    Fully dynamic—randomly select 50% updates from each of 10 batches of decremental updates as insertions and the remaining 50% as deletion. We report the average update time.

For the two datasets that are real-world dynamic networks, we select 10 batches in the order of their timestamps, each containing 1000 real-world inserted/deleted edges and measure the average update time after applying them in a streaming fashion.

For the methods FulHL and FulFD, we randomly sample 1000 edges and follow the same update settings as above to measure the update time of performing updates one by one. These settings enable us to explore the impacts of edge insertions and edge deletions, respectively, in addition to their combined effect.

Figure 4 shows the distance distribution of edges in these batches after deleting. The distances in all datasets are small ranging from 1 to 6. This shows that the updates are mostly from densely connected components of these networks which may cause fewer vertices to be affected in the incremental setting. Only a small number of updates are disconnected in most of these datasets.

For queries, we randomly sample 100,000 pairs of vertices in each dataset to evaluate the average querying time on graphs being changed as a result of fully dynamic batch updates. We also report the average labelling size produced by our batch-dynamic method \(\textsc {BHL}^{++}\) after performing fully dynamic batch updates.

11 Experimental results

11.1 Performance comparison

We compare our methods against the baseline methods in terms of update time, labelling size and query time.

11.1.1 Update time

Table 4 presents the average update time in the fully dynamic, incremental, and decremental settings of our proposed method and the baseline methods.

Fully dynamic setting. Table 4 shows that BatchHL\(^{+}\) significantly outperforms BatchHL. We observe that BatchHL\(^{+}\) is up to 3 orders of magnitude faster than BatchHL on large datasets. BatchHL\(^{+}\) also significantly outperforms FulHL and FulFD on all datasets. In particular, BatchHL\(^{+}\) is over 15 times faster than FulHL on most of the datasets and several orders of magnitude faster than FulFD. Further, the performance difference of BatchHL\(^{+}\) and BatchHL is due to the fact that our improved batch search in BatchHL\(^{+}\) further prunes affected vertices that do not need a repair in their labels, and in practice they are significant in amount as can be seen in Table 6. Our method BatchHL\(^{+}\) is also much faster than BatchHL, FuLHL and FuLFD on the real-world dynamic networks: Italianwiki and Frenchwiki. We can observe that the average update time of BatchHL\(^{+}\) is always by far smaller than recomputing labelling from scratch, i.e. construction time of BatchHL\(^{+}\) in Table 5. Further, the construction time of BatchHL\(^{+}\) is significantly smaller than the construction of the baseline methods on all datasets. We can also see that the parallel variant of PLL (PSL*) still failed to construct labelling for the largest three datasets.

Table 5 Comparing performance of our method BatchHL\(^{+}\) with the baseline methods FulFD, FulPLL and PSL\(^*\) in terms of construction time (CT), query time (QT) and labelling size (LS)

Incremental setting. Table 4 also shows that BatchHL\(^{+}\) is comparable with BatchHL because of no technical difference in the batch search approach of both methods. On the other hand, it significantly outperforms IncHL and IncFD. This is because IncHL and IncFD involve in repeated and unnecessary computations while performing updates and require extra usage of resource for each single update.

Decremental setting. In Table 4, we can also see that BatchHL\(^{+}\) is much faster than BatchHL in this setting. This is because it further categorises and prunes out affected vertices whose labels do not need to be changed. In contrary, BatchHL failed to recognise such vertices. The difference in the amount of affected vertices repaired by BatchHL\(^{+}\) and BatchHL is shown in Table 6. It is clear that BatchHL\(^{+}\) has to repair much smaller fraction of affected vertices in the decremental setting. Furthermore, BatchHL\(^{+}\) significantly outperforms DecHL and DecFD on all the datasets. Especially, BatchHL\(^{+}\) can achieve outstanding performance on networks which have a high average degree such as Twitter, Flickr and Hollywood. Due to inherent complexity of edge deletion on graphs (i.e. increasing distances), DecHL and DecFD take very long in identifying and updating labels of affected vertices. BatchHL\(^{+}\) outperforms these methods because it leverages the benefit of handling updates in a batch and significantly reduce repeated computations during identifying and repairing the labels of affected vertices.

11.1.2 Labelling size

Table 5 shows that BatchHL\(^{+}\) yields significantly smaller labelling sizes than FulFD and PSL* on all the datasets. Note that BatchHL\(^{+}\), BatchHL and FulHL produce labelling of equal sizes after performing update operations because they are designed using highway cover property. When an update occurs, the labelling size of FulFD remains unchanged because they store complete shortest-path trees at all times. In contrast, BatchHL\(^{+}\) similar to state-of-the-art BatchHL and FulHL stores pruned shortest-path trees based on highway cover property. Nonetheless, the labelling size of BatchHL\(^{+}\) remains stable in practice because the average label size is bounded by a constant, i.e. the number of landmarks. The parallel variant of PLL (PSL*) which exploit PLL properties to reduce labelling size still produces labelling of very large size as compared to BatchHL\(^{+}\).

Table 6 Average number of vertices affected by BatchHL and BatchHL\(^{+}\) after performing batch updates on all the datasets

11.1.3 Query time

Table 5 shows that the average query time of BatchHL\(^{+}\) is comparable with FulFD. It was reported [12] that the average query time is largely dependent on labelling size. Since the dynamic operations do not considerably affect the labelling size for BatchHL\(^{+}\) and FulFD, their query times remain stable. On Twitter, the query time of BatchHL\(^{+}\) underperforms FulFD. This is because FulFD also maintains the shortest-path information for the neighbourhood of landmarks and the maximum degree of Twitter is very high which might cause many pairs to be covered by landmarks.

Although the query time of PSL* in Table 5 is better than BatchHL\(^{+}\) on some datasets, it only handles static graphs. For dynamic graphs, it has several limitations including: (1) the cost of re-constructing labelling from scratch after each batch update is too high to afford, particularly when batch updates are frequent or when underlying dynamic graph is large which is evident from Table 5, (2) the labelling size is much larger than BatchHL\(^{+}\). As we can see in Table 5, PSL* produces the labelling of size almost 99% larger than the labelling of BatchHL\(^{+}\) for Orkut thus possess a high query cost as well. Considering the overall performance w.r.t. three main factors, i.e. query time, labelling size and construction time, BatchHL\(^{+}\) stands out in claiming the best trade-off between these factors.

11.2 Performance under varying landmarks

Figure 8 and Fig. 9 show how the update time of our method BatchHL\(^{+}\) in the fully dynamic setting behaves when increasing the number of landmarks. We can see that the update time in Fig. 8 for almost all datasets grows to 30 landmarks and then either decreases or grows linearly. This is because selecting a larger number of landmarks can better leverage the pruning power of our method. Figure 9 shows that the query time decreases or remains the same for almost all datasets with the increased number of landmarks. Particularly, the query time of Twitter decreases because of a very high average degree and selecting a larger number of high degree landmarks contributes greatly towards shortest-path coverage and makes querying process faster.

11.3 Performance under varying batch sizes

We compare the total time of querying and updating on dynamic graphs. We randomly select 10,000 updates from E and process them in batches of varying size. For each batch, we first delete 50% updates and then perform a batch in the fully dynamic setting. To make a fair comparison, the total time of our method BatchHL\(^{+}\), and the baseline methods BatchHL, FulHL and FulFD is the total time to perform a batch update plus the query time to perform 1000 queries after performing a batch update and then averaged over 1000 queries, denoted as BatchHL\(^{+}\)+QT, BatchHL+QT, FulHL and FulFD+QT, respectively.

Figure 5 presents the results. For the baseline method BiBFS, we take only the query time averaged over 1000 queries after applying a batch update. The overall performance of BatchHL\(^{+}\)+QT is significantly better than the baseline methods on all datasets. It is worth noticing that BatchHL\(^{+}\)+QT is not only more efficient than BatchHL, but also their efficiency gap remain significant with the increased size of batch updates on majority of datasets. This shows that the novel batch search strategy used in BatchHL\(^{+}\)+QT is effective in reducing the size of affected vertex set. We also observe that the update time along with the query time of our methods grows fast for batches of smaller sizes (with up to 1000 updates) and then grows very slowly when batch sizes become very large which shows that our methods are robust w.r.t the increased batch size.

Fig. 5
figure 7

Total time of querying and updating by the proposed and baseline methods under varying batches of mixed updates ranging from 500 to 10,000

Fig. 6
figure 8

Update time of the proposed and baseline methods under varying batches of insertions ranging from 500 to 10,000

Fig. 7
figure 9

Update time of the proposed and baseline methods under varying batches of deletions ranging from 500 to 10,000

11.4 Scalability test

We analyse the update time performance of our method BatchHL\(^{+}\) under varying batches of insertions and deletions separately. We randomly select 10,000 updates from E and process them in batches of varying size. We start with a batch of 500 updates and then iteratively increase the batch by 500 updates to 10,000 updates. We process these batches in the incremental setting followed by the decremental setting. Figures 6 and 7 show the average update times after constructing the labelling from scratch, and updating the labelling after each batch increase. We observe from Fig. 6 that the update time of BatchHL\(^{+}\) for batches of insertions is always below the construction time of labelling and significantly faster than the baseline methods. As compared with BatchHL\(^{+}\), it does not show any performance difference because both algorithms repair roughly the same set of affected vertices (as can be seen in Table 6). Figure 7 shows that BatchHL\(^{+}\) performs well on all the datasets. The performance gap of the proposed method BatchHL\(^{+}\) compared to state-of-the-art methods BatchHL in the batch-update setting as well as DecHL and DecFD in the single-update setting is huge which can be verified by the amount of repairs each algorithm has to perform in Table 6. Overall, the update time performance is dependent on the fraction of affected vertices to be repaired and our method can scale to perform large batches of updates efficiently.

Table 7 Comparing update time, construction time (CT), query time (QT) and labelling size (LS) on directed graphs
Table 8 Average number of affected vertices after performing batch updates on directed graphs

11.5 Performance on directed graphs

Table 7 shows our experimental results on directed graphs in the fully dynamic setting (as described under “Test data generation”). The update time of our proposed method BatchHL\(^+\) significantly outperforms the state-of-the-art method BatchHL. The difference in the amount of affected vertices repaired by BatchHL\(^{+}\) and BatchHL is shown in Table 8. It is clear that BatchHL\(^{+}\) has to repair a much smaller fraction of affected vertices. Note that the performance of BatchHL\(^+\) and BatchHL in terms of CT, QT and LS is the same because both of them dynamically maintain the highway labelling for fast distance querying. We also see that BatchHL\(^+\) is several orders of magnitude faster in updating labels w.r.t. 1000 updates than constructing labelling from scratch for most of the datasets (Figs. 8, 9).

12 Extension to weighted graphs

For non-negative weighted graphs, we use the Dijkstra’s algorithm in place of BFSs in our batch search and batch repair algorithms in order to compute and maintain a highway cover distance labelling for fast querying. We consider graph updates in the form of edge weight increase or decrease instead of edge insertion or deletion. Our methods can then handle weight increases in a similar way to edge deletions and weight decreases in a similar way to edge insertions.

Fig. 8
figure 10

Update time of the proposed method BatchHL\(^{+}\) under varying landmarks ranging from 10 to 50

Fig. 9
figure 11

Query time of the proposed method BatchHL\(^{+}\) under varying landmarks ranging from 10 to 50

13 Conclusion

In this article, we have proposed an efficient method to answer distance queries on dynamic graphs undergoing batch updates. We have explored combined and separate effects of different types of updates in a batch to further reduce search space for maintaining distance labelling against graph changes. We have analysed the correctness and complexity of our method and showed that it preserves the labelling minimality. We have empirically verified the efficiency and scalability of our method on 15 real-world networks. For future work, we plan to explore the applicability and extension of the proposed method to road networks.