Edinburgh Research Explorer On the exponential generating function for non-backtracking walks

We derive an explicit formula for the exponential generating function associated with non-backtracking walks around a graph. We study both undirected and directed graphs. Our results allow us to derive computable expressions for non-backtracking versions of network centrality measures based on the matrix exponential. We ﬁnd that eliminating backtracking walks in this context does not signiﬁcantly increase the computational expense. We show how the new measures may be interpreted in terms of standard exponential centrality computation on a certain multilayer network. Insights from this block matrix interpretation also allow us to characterize centrality measures arising from general matrix functions. Rigorous analysis on the star graph illustrates the eﬀect of non-backtracking and shows that localization can be eliminated when we restrict to non-backtracking walks. We also investigate the localization issue on synthetic networks.

We derive an explicit formula for the exponential generating function associated with non-backtracking walks around a graph. We study both undirected and directed graphs. Our results allow us to derive computable expressions for nonbacktracking versions of network centrality measures based on the matrix exponential. We find that eliminating backtracking walks in this context does not significantly increase the computational expense. We show how the new measures may be interpreted in terms of standard exponential centrality computation on a certain multilayer network. Insights from this block matrix interpretation also allow us to characterize centrality measures arising from general matrix functions. Rigorous analysis on the star graph illustrates the effect of non-backtracking and shows that localization can be eliminated when we restrict to non-backtracking walks. We also investigate the localization issue on synthetic networks.

Motivation
The concept of a walk on a graph is very natural-on arriving at a node, the walker may continue by traversing any edge pointing out of that node. If, at any stage, the edge along which the walker continues is the reverse of the edge on which they arrived, then the walk is said to be backtracking. Non-backtracking walks have been analyzed in a number of fields. They play a key role in the study of zeta functions on graphs [24], with applications in spectral graph theory [4,23], number theory [44], discrete mathematics [12,41], quantum chaos [39], random matrix theory [40], stochastic analysis [3], applied linear algebra [42] and computer science [38,45]. Within network science, constraining walks to be non-backtracking has been shown to offer benefits in community detection [26,28], centrality measurement [6,20,29,36], network comparison [30] and in the study of related issues concerning optimal percolation [31,32].
A natural task across all these fields is to count the number of distinct nonbacktracking walks of a given length between pairs of nodes, and to form a compact expression for the associated generating function. For network centrality, in the case of standard walk counts it has been argued that an exponential-style power series gives a useful alternative to the standard resolvent-style version [15]. For example, based on an analogy with a physical oscillator, it may be argued that moving from the resolvent to the exponential takes us from classical to quantum physics [16]. Further, there are effective and reliable tools for computing the action of the matrix exponential [2,21,22]. This provides the initial motivation for our article, where we study the exponential generating function associated with non-backtracking walks. We also note that more general matrix functions have been proposed in this walk-counting context [17]. Hence, we then extend the analysis to cover generating functions based on arbitrary power series.

Background
We let A = (a ij ) ∈ R n×n denote the adjacency matrix for an unweighted graph with n nodes; that is, a ij = 1 if there is an edge from node i to node j, and a ij = 0 otherwise. We will assume the graph to be loopless, so that a ii = 0 for all i, and to have no multiple edges. If we further assume the graph represented by A to be undirected, i.e., if A = A T , then the graph will be said to be simple. We use D to denote the diagonal matrix whose diagonal entries are D ii = (A 2 ) ii . So D ii counts the number of reciprocal neighbours of node i, that is, the number of nodes j such that a ij = a ji = 1. If A = A T , so that all edges are reciprocated, then D ii reduces to the vector of degrees and any node i for which D ii = 1 will be called a leaf.
If A = A T the network is said to be directed and we will denote by S = (s ij ) ∈ R n×n the matrix associated with the graph obtained by removing all edges that are not reciprocated, so that s ij = a ij a ji for all i, j = 1, 2, . . . , n.
A walk of length r from node i to node j is characterized by a sequence of edges Note that edges and nodes may be revisited in the course of a walk. It follows from the definition of matrix multiplication that A r has (i, j)th element that counts the number of distinct walks of length r from i to j; see, for example, [13,Theorem 2.2.1]. Given a parameter t, the ordinary generating function [46] associated with this sequence of walk counts may be represented by a resolvent function: where I is the identity matrix. If t is interpreted as a formal algebraic variable, this expression holds within the algebra of formal power series. Analytically, if t is interpreted as a real (or complex) valued parameter, it is valid for |t| < 1/ρ(A), where ρ(·) denotes the spectral radius. Similarly, the associated exponential generating function [46] takes the form of a matrix exponential: analytically valid for all t ∈ R (or in C).
Power series expansions that combine walk counts of different lengths have become widely used in the field of network science [17]. Here, the idea is to assign a weight to each node that summarizes its "importance" in the network. If we interpret importance as the ability of a node to initiate messages along the edges in the graph, then we see that (I − tA) −1 1, for 0 < t < 1/ρ(A), and exp(tA)1, for t > 0, are candidates for measuring and comparing nodes, where 1 ∈ R n is the vector of ones. Here, the ith component arises by summing the number of distinct walks from node i to all nodes in the graph, with walks of length r discounted by t r in the resolvent case and by t r /r! in the exponential case. The resolvent version dates back to the work of Katz [25] and the alternative based on the exponential function, called the total (node) communicability, was introduced and studied in [8]. We also note that both versions are related to eigenvector centrality [10,11], which uses the Perron-Frobenius eigenvector of A when the underlying graph is strongly connected [9].
In recent work [29], Martin et al. argued that traditional centrality measures may suffer from unwanted localization effects when the network under study has a scale-free degree distribution (see, e.g., [7,14]). In this type of network, there are a few nodes with a high number of neighbours, that are thus assigned most of the centrality, and many nodes with a very small number of connections, that hence share the remainder of the mass of the centrality vector. This unbalanced partitioning makes it difficult to distinguish between the importance of the nodes with few connections. The authors therefore proposed a variation of eigenvector centrality motivated by the concept of non-backtracking walks. This idea was pursued in [6,20] in the context of resolvent-based centrality. Our aim here is to develop and study the analogous exponential version of non-backtracking walk centrality, and also to consider general matrix functions.
Formally, we say that a walk is backtracking if it contains at least one pair of successive edges of the form i → j, j → i; that is, having left some node i it immediately returns to that node. We say the walk is non-backtracking otherwise. We use the abbreviation NBTW to mean non-backtracking walk. We will denote by p r (A) ∈ R n×n the matrix whose (i, j)th entry counts the number of NBTWs of length r from node i to node j in the graph represented by the matrix A. Our first task is therefore to obtain useful expressions for the corresponding exponential generating function and centrality measure b := F (t)1.
We do this in Sections 2 and 3 for the case of undirected and directed graphs, respectively. Although, of course, an undirected graph is a special case of a directed graph, we treat the two cases separately for two reasons. First, undirected graphs are common in network science, and hence it is useful to have results tailored to this case. Second, in our context it seems more straightforward to study undirected graphs directly than to simplify the more general expression for directed graphs. In Section 4, we briefly describe how to compute the proposed centrality measures and discuss computational costs. In Section 5, we interpret the new expressions from the viewpoint of traversals around multilayer networks with possibly negative interlayer weights. This block matrix framework also allows us to derive expressions for the case where the exponential is replaced by a general matrix function. In Section 6 we give a concrete example of the benefit of nonbacktracking, and further illustrations on realistic model networks appear in Section 7. We end with a short summary in Section 8.
The key analytical steps used here are to derive and solve an ODE-this is a widely used approach in the study of exponential generating functions [46]. To this end, we note from (1) that We also require a lemma on matrix-valued ODEs. The following result arises from columnwise application of a standard result on vector-valued linear systems; see, for example, [21, Section 2.1]. The statement of the result makes use of the shifted expo-nential function ψ 1 (z) = (e z − 1)/z, which may be defined formally via the power series expansion Then is the unique solution to the linear, constant coefficient, inhomogeneous, autonomous, initial value ODE system Moreover, in the homogeneous case P = 0, the expression (5) simplifies to Remark 1.2. We note that Lemma 1.1 does not require the Jacobian matrix C to be nonsingular.

Exponential generating function for undirected graphs
We assume throughout this section that the graph is undirected. We also assume for convenience that the graph is connected-in the disconnected case, each component may be studied separately. We note that each node is therefore assumed to have at least one neighbour.
As we mentioned in Section 1.2, the matrix A r counts walks of length r. Hence the trivial relationship A r+1 = AA r gives a recurrence between walk counts for length r and r + 1. In the case of NBTWs on undirected graphs, the following two term recurrence relates walk counts of successive lengths. We have p 0 (A) = I, p 1 This recursion was proved in [12] for general directed graphs. For the undirected case considered here, the recurrence was re-derived independently in [41]. We now give the main statement of this section: an explicit formula for the generating function (1) and the corresponding centrality measure (2). Theorem 2.1. For a simple graph with associated adjacency matrix A, the exponential generating function F (t) in (1) and total communicability vector b in (2) satisfy In the case where the graph has no leaves, these expressions may be written as Proof. To obtain the exponential generating function F (t) in (1), we multiply by t r /r! in (7) and then sum from r = 1 to ∞, to obtain From (3), this shows which simplifies to To convert to a first order system, we introduce so that (9) becomes where Y is defined in (8). It then follows from Lemma 1.1 that Finally, we note that the Jacobian matrix Y in (8) is invertible if and only if (D −I) −1 exists, that is, if the graph does not contain leaves, i.e., nodes of degree one. In this case and (10) simplifies to Noting that F (t) = I 0 G(t), we see that (10) and (11) yield the statement. 2 We discuss these results further in Sections 4 and 5.

Exponential generating function for directed graphs
For directed graphs the matrices p r (A) that count NBTWs satisfy a three term recurrence. This was shown in [12], with a more linear algebra oriented derivation given in [42]. With p 0 (A) = I, p 1 (A) = A, p 2 (A) = A 2 − D, the recurrence takes the form Recall that S ∈ R n×n is the adjacency matrix of the graph obtained by only considering reciprocated links in the original network; so s ij = a ij a ji . We emphasize that unlike the undirected case (7), this recurrence is valid from r = 0.
Here is the analogue for directed graphs of Theorem 2.1.
Theorem 3.1. Let A be the adjacency matrix of an unweighted directed graph with no self-loops nor multiple edges. The exponential generating function F (t) in (1) and total communicability vector b in (2) satisfy Proof. Multiplying (12) by t r /r! and summing from r = 0 to ∞ we arrive at We note that, in contrast to (9), this ODE does not have an inhomogeneous term. To convert to a first order system, introduce so that where Z is defined in (13).
Using (6) in Lemma 1.1 and noting that F (t) = I 0 0 H(t), we arrive at the statement. 2 It is of interest to note that the removal of backtracking walks may be interpreted as the introduction of damping terms −(D − I)F (t) and −(A − S)F (t) in (14). Further, Theorem 3.1 shows that, in general, as t increases the total communicability vector b aligns with the first n components of the dominant eigenvector of Z.

Computing the centrality vectors
In this section we consider computational issues. First, we note that the standard expression for the total (node) communicability, e tA 1, requires the action of the exponential of a sparse n by n matrix on a vector in R n . It is reasonable to assume that a suitable approximation may be obtained iteratively, using a small number of sparse matrix-vector multiplications [2]. Assuming that A has O(n) nonzeros (edges), then each sparse matrix-vector multiplication will cost O(n). Hence, using k to denote the number of iterations, the overall cost will be O(nk). In practice, it is reasonable to regard k as finite and independent from n, so that this expression becomes O(n).
In Theorem 2.1, we see that for an undirected graph with no leaves, the cost of computing the non-backtracking total communicability vector b is dominated by the action of e tY on a vector in R 2n , where the 2n by 2n sparse matrix Y is defined in (8). We note that Y has only 2n more nonzeros than the underlying adjacency matrix A, and hence, under the assumptions above, if the same number of iterations is required, then the same O(n) cost applies.
In the directed case, the expression for b in Theorem 3.1 requires the action of e tZ on a vector in R 3n , where the 3n by 3n sparse matrix Z is defined in (13). Since Z also has O(n) nonzeros, we again recover the O(n) cost under our assumptions.
It remains to consider the case of an undirected graph with leaves. In Theorem 2.1, the expression for b involves the shifted exponential, ψ 1 , from (4). We may proceed by reducing the key computation to the approximation of the action of a matrix exponential in a slightly higher dimension. Let where v ∈ R 2n and 0 ∈ R 2n is a vector of all zeros. It was shown in [37, Section 2.3] that then b in Theorem 2.1 may be computed by first approximating the vector exp( Y )e 2n+1 , where e 2n+1 is the (2n + 1)th vector of the standard basis of R 2n+1 . We then extract the top n entries of the resulting vector, scale by t and shift by 1. This again will have an O(n) cost under our assumptions.
Remark 4.1. It is worth emphasizing that the final steps of scaling and shifting can be avoided if we are only interested, as is customary in network science, in the rankings of the nodes rather than their actual centrality scores.

Block matrix interpretations
Theorems 2.1 and 3.1 show that, loosely, for exponential-based centrality the operation of eliminating backtracking walks is equivalent to replacing the original adjacency matrix A with a suitable block matrix. In the undirected case, the 2 by 2 version Y in (8) is used, whereas the directed case has the 3 by 3 version Z in (13). These block matrices were constructed indirectly, as a consequence of the generating function approach, [46].
In this section, we add insight by showing how these block matrices may be interpreted directly. We also show that they remain relevant when the exponential is replaced by a general matrix function.
We begin with an algebraic interpretation. The matrices Z and Y are the first companion matrix linearization [18] of the matrix polynomials (A −S) +(D −I)λ −Aλ 2 +Iλ 3 and, respectively, (D − I) − Aλ + Iλ 2 . These are in turn the reversals of, respectively, the directed and undirected version of the deformed graph Laplacian, a matrix polynomial that plays a crucial role in the theory of NBTW combinatorics, see [6,20]. Algebraically, the companion matrix is the multiplication-by-λI operator in the module obtained by forming quotients in the ring of matrix polynomials using the left ideal generated by the linearized matrix polynomial [34]. Linear-algebraic functions of the companion matrix also have an analogous interpretation; for example, exp(tZ) represents multiplication by exp(tλ)I modulo (A − S) + (D − I)λ − Aλ 2 + Iλ 3 . For more details on this algebraic view of companion matrices see, e.g., [33,Sec. 2], [34,Sec. 5], [35,Sec. 9], and the references therein.
There is also a graph-theoretical interpretation that we now describe in detail. We focus first on the directed case, involving Z in (13). A useful connection is that Z represents the adjacency matrix for a three layer weighted network. This network is node-aligned [27], so that the general ith nodes from each layer may be regarded as copies of the same node. Looking at the diagonal blocks of Z, we see that the within-layer connections correspond to the empty graph for layers one and two, and the original graph for layer three. Next, we consider the off-diagonal blocks. The identity matrix in the (1, 2) block and zero matrix in the (1, 3) block show that the only edge out of a node in layer one points to the corresponding node in layer two. Similarly, we may only leave layer two by moving to the corresponding node in layer three. Within layer three, in addition to moving within the layer using edges from A, traversals are possible to layers two and one. The (3,2) block shows that for D ii > 1 a traversal may move from node i in layer three to node i in layer two, with the negative weight 1 − D ii assigned to that edge. The (3, 1) block shows that a move from node i in layer three to node j in layer one is allowed if and only if the original graph has a nonreciprocal edge from node i to node j, in which case the inter-layer edge carries weight −1.
The expression for b in Theorem 3.1 shows that exponential NBTW centrality arises from a post-processed version of the "standard" exponential centrality on this multilayer network, where negative inter-layer weights have the effect of negating the contributions of backtracking walks in the underlying graph.
The two lemmas and the theorem below show that Z is also relevant for more general matrix functions.
Proof. For r < 2, the identity may be verified directly. For r ≥ 2 the result follows by straightforward induction using the three term recurrence (12). 2 Proof. By Lemma 5.1 it follows that For the second part, note first that ∞ r=0 α r+2 Z r also converges under the assumptions in the statement. Then Theorem 5.2 shows that we can accumulate combinations of NBTW counts in the original network by looking at the (3, 3) block of an appropriate function of the matrix Z. From a multilayer perspective, the (3, 3) block of Z is associated with the layer containing the adjacency matrix A of the original network. Theorem 5.2 therefore concerns walks of any length that start and end at that layer. This includes not only walks that take place within layer three, but also those that make one or more intermediate visits to the other layers.
The contribution coming from f 0 (Z) counts walks in the multilayer network using the original weights. The correction term involving f 2 (Z) removes walks from the count by weighting walks of length r − 2 as if they were of length r.
In more detail, the term f 2 (Z) is used to correct the appearance of an identity matrix in Z 2 that is then propagated at higher powers of Z. This identity matrix at level r = 2 originates from the one appearing in the (3,2) block. This latter is used in the recursion (12) defining p r (A) when r > 2 to correct the penalization of walks of the type which are backtracking at length (r − 1) 1 and thus need not be removed by the factor p r−2 (A)D. Walks of this type take place entirely in the third layer and are balanced by the existence of walks of the form node: i → · · · → k → j → j → j layer: provided that r > 2. When r = 2, walks of the form node: j are improperly weighted, as the first hop contains a +1 that is used to balance the removal of walks that have never even existed. This introduces a "delay" that is then propagated when longer walks are counted. To counteract it, the matrix function f 2 (Z) removes walks of length r − 2 that are however weighted as if they were of length r; this type of weighting precisely targets those walks obtained from the propagation of the identity matrix introduced at r = 2 in f 0 (Z). Theorem 5.2 can be used to recover the known characterisation of the ordinary generating function. To see this, we let α r = t r for all r, with |t| < ρ(Z) −1 . Then we have Similarly, setting α r = t r /r! for some t > 0, Theorem 5.2 gives where ψ 2 (x) = ∞ r=0 x r /(r + 2)!. This description of F (t) from (1) is equivalent to that given in Theorem 3.1. However, (15) casts the result entirely in terms of the (3, 3) block, which, as we have argued above, has a natural multilayer interpretation.
We finish this section by briefly discussing the undirected case. Here, the block matrix Y in (8) may be associated with a two-layer network. In layer one, the only possible transition is to the equivalent node in layer two. We may move within layer two using any edge that exists in the original graph, or, for D ii > 1, we may move from node i in layer two to node i in layer one along an edge with negative weight 1 − D ii . The analogues of Lemma 5.1 and Theorem 5.2 are given below, with proofs following in a similar manner.
Theorem 5.4 (Undirected graph). Let {α r } ∞ r=0 be a sequence of nonnegative weights such Theorem 5.4 quantifies in the undirected case how NBTWs on the original network may also be counted by considering walks on the associated multilayer network that start and end in the layer containing A.

Star graph analysis
In this section, we give further insights into the effect of restricting to NBTWs in a total communicability centrality measure. For eigenvector centrality, the authors in [29] argued that non-backtracking is a means to avoid localization; that is, the concentration of weight on a small number of nodes. Following [19], they characterized localization through an asymptotic limit. Given v = (v i ) ∈ R n the inverse participation ratio is defined as This quantity is of order one in the case where a small number of elements in v are responsible for most of the weight. More formally, we say that the measure is localized (1), in the asymptotic limit n → ∞.
We will analyze this property on the undirected star graph with n nodes. Here the adjacency matrix takes the form where 1 s ∈ R s is the vector of all ones. Thus node 1 is a hub with n − 1 connections to nodes 2, 3, . . . , n, which are only connected to node 1. Intuitively, the effect of backtracking should be significant in this example. The hub node has n − 1 neighbours, and hence n − 1 walks of length 1, but cannot initiate walks of length two or more without backtracking. Each peripheral node, in contrast, has a single neighbour, but n − 2 NBTWs of length two. Hence, in the non-backtracking regime, if t is taken sufficiently large that degree does not dominate, then having one high quality link (to a hub) may be comparable in importance to having many low quality links (to leaves). It is straightforward to show for (17) that (see, for example, [5,21]) Hence, the standard, backtracking walk version of total communicability, x = e tA 1, satisfies So the values x 1 for the hub node and x 2 for a typical leaf node satisfy It follows immediately that for any t > 0 we have x 1 > x 2 when the graph has at least three nodes. Hence, the standard total communicability measure always ranks the hub node above the leaves. We also find that, for fixed t > 0, and hence the measure is localized. We may evaluate the non-backtracking version, b = F (t)1, from first principles. The hub node has n − 1 NBTWs of length one and no others, and each leaf node has one NBTW of length one, n − 2 NBTWs of length two and no others. So Straightforward computation shows that b 1 > b 2 for 0 < t < 2 and b 1 < b 2 for t > 2. Therefore, we observe a transition that is not present with standard exponential centrality, albeit in a parameter regime where t is greater than unity. More tellingly, for fixed t > 0 so the measure is nonlocalized.

Numerical tests
In this section we study the localization effect in a realistic setting. We used a random graph model rather than a fixed data set so that we can control the dimension, n, and study the asymptotics. We used an implementation of the Barabási-Albert preferential attachment model for undirected networks in the MATLAB toolbox CONTEST [43]. In this model, nodes are added sequentially until the desired size is reached. Each node is given, upon arrival, d links to current nodes. These target nodes are chosen independently with a probability proportional to their current degree. The resulting network will have a scale-free degree distribution, where a few nodes (hubs) have a high degree and many nodes have a very small degree. In this case, the vector of degrees is localized by construction. We have ρ(A) ≤ A ∞ , and when this inequality is sharp the presence of a high degree node forces Katz to use a small parameter α, producing results close to degree centrality [9].
We used the default setting of the toolbox, giving each node d = 2 links upon arrival to the network, and we built networks of increasing size from n = 10 to n = 1000. For each network, we computed the inverse participation ratio (16) for the normalized total communicability vector e tA 1 and its nonbactracking analogue b = F (t)1 for t = 1 fixed. In order to compute both centrality vectors, we used the MATLAB toolbox funm_kryl 2 [1], which computes the action of a matrix function over a vector using a restarted Arnoldi algorithm. We repeated our tests 1000 times.
The results are displayed in Fig. 1, where the sample average of the inverse participation ratio is plotted against n in a semilogarithmic scale. The error bar is used to display the standard error around the average values. The results are in line with the analytical results presented in Section 6-the inverse participation ratio decays towards zero for the non-backtracking version, but appears to be bounded away from zero when backtracking walks are included.

Summary
The main contributions of this work were 1. to derive analytical expressions for the non-backtracking walk exponential generating function for both undirected and directed graphs, 2. to show that it is feasible to compute the resulting network centrality measures: the key matrix involved in the computations has twice (three times) the dimension of the analogous backtracking version for undirected (directed) graphs, and has a comparable level of sparsity, 3. to interpret the new measures in the context of traditional centrality on a multilayer network, 4. to use insights from the multilayer/block matrix setting in order to derive expressions for general matrix function-based centrality measures, 5. to study how non-backtracking within exponential centrality reduces localization effects on a star graph, 6. to give further illustrative results on a controllable test network.
Our overall message is that it is both analytically and computationally attractive to study non-backtracking walk counts in a general matrix function setting that includes the matrix exponential. We therefore hope to have paved the way for future work that, for example, considers • physical interpretations of the new measures, • applications to specific fields of network science, • further links with areas of discrete mathematics, stochastic processes and quantum physics, • the development and analysis of customized numerical methods for approximating the action of a matrix function on the type of large, sparse unstructured matrix arising in network science.