COMPUTING OF B-SERIES BY AUTOMATIC DIFFERENTIATION

We present an algorithm based on Automatic Differentiation for computing general B-series of vector fields $f\colon \mathbb{R}^n\rightarrow \mathbb{R}^n$. The algorithm has a computational complexity depending linearly on $n$, and provides a practical way of computing B-series up to a moderately high order $d$. Compared to Automatic Differentiation for computing Taylor series solutions of differential equations, the proposed algorithm is more general, since it can compute any B-series. However, the computational cost of the proposed algorithm grows much faster in $d$ than that of a Taylor series method, so very high order B-series are not tractable by this approach.


Introduction
B-series, named after John C. Butcher, are a fundamental tool for the study of numerical integration methods [4,14]. B-series are mainly used as a theoretical tool systematising the analysis of numerical integration schemes. In recent years B-series methods have also been used more directly as a computational tool. The method of modified vector fields [6] is based on the idea that the results of a given numerical integration scheme can be improved by modifying the vector field prior to feeding it into the numerical integrator. As an example, given a differential equation $y' = f(y)$ and the second order implicit midpoint rule, one may compute a modified vector field $\tilde{f}$ such that when $\tilde{f}$ is fed into the midpoint rule, the result is a higher order approximation to the original differential equation $y' = f(y)$. The modified vector field $\tilde{f}$ can be expressed as a B-series in $f$, and the computation of $\tilde{f}$ involves the computation of derivatives of $f$. A very special example is the method of modified vector fields applied to Euler's explicit method. In this case $\tilde{f}$ is given as a Taylor series expansion of $f$ up to the desired order. More generally, Runge-Kutta like methods based on computing terms in the B-series are called elementary differential Runge-Kutta methods [17].
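To make the Euler example concrete, here is a short sketch of the first terms (our own expansion, obtained from the standard Taylor expansion of the exact flow; $f'f$ is shorthand for $f'(y_0)f(y_0)$, all derivatives evaluated at $y_0$). Matching one explicit Euler step $y_0 + h\tilde{f}(y_0)$ with the exact solution gives

```latex
y(t_0+h) = y_0 + h f + \frac{h^2}{2}\, f'f
  + \frac{h^3}{6}\bigl(f''(f,f) + f'f'f\bigr) + O(h^4),
\qquad\text{so}\qquad
\tilde f = f + \frac{h}{2}\, f'f
  + \frac{h^2}{6}\bigl(f''(f,f) + f'f'f\bigr) + \cdots .
```

Each term on the right is an elementary differential of $f$, so $\tilde f$ is indeed a B-series in $f$.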
Previous work on modified vector fields has mainly been applied to special dynamical systems where the derivatives of $f$ are analytically computable up to a certain order. For more general systems the computation of higher order derivatives is intractable by analytical means, due to a combinatorial explosion in the number of terms. An important alternative to analytical differentiation is the use of Automatic Differentiation (AD) techniques [12]. Such techniques are applicable whenever there is a need to compute higher order derivatives at given points, and analytical expressions are not needed. In the special case of Taylor series integration methods, AD techniques have successfully been applied to compute Taylor series up to order 50 or more [21].
The main purpose of the present paper is to investigate AD techniques and software for the computation of general B-series, thereby opening up the applicability of modified vector field methods and elementary differential Runge-Kutta methods for general classes of dynamical systems.
1.1. Pre-Lie algebras, trees and B-series. B-series can generally be defined in terms of a pre-Lie algebra. A pre-Lie algebra is a vector space $A$ with a bilinear product $\triangleright\colon A \times A \to A$. The product is neither commutative nor associative, but satisfies the pre-Lie relation

(1.1) $a(x, y, z) = a(y, x, z)$,

where $a(x, y, z) = (x \triangleright y) \triangleright z - x \triangleright (y \triangleright z)$ is the associator of the product. Our prime example of a pre-Lie algebra is the set of vector fields on $\mathbb{R}^n$, denoted $\Xi(\mathbb{R}^n)$. This consists of all smooth functions $f\colon \mathbb{R}^n \to \mathbb{R}^n$, with the pre-Lie product given for $f, g \in \Xi(\mathbb{R}^n)$ as

$(f \triangleright g)^i(x) = \sum_{j=1}^n f^j(x)\, \frac{\partial g^i}{\partial x_j}(x)$.
Another important example of a pre-Lie algebra is the set $T$ of all rooted trees, endowed with the product given by grafting: $t_1 \triangleright t_2$ is the sum of all trees obtained by attaching the root of $t_1$ by a new edge to a node of $t_2$. The pre-Lie algebra $\{T, \triangleright\}$ is called the free pre-Lie algebra, since it has the following universality property [5]: for any pre-Lie algebra $A$ and any $f \in A$, the map sending the single-node tree $\bullet$ to $f$ extends to a unique pre-Lie morphism $F_f \colon T \to A$, i.e. $F_f(\bullet) = f$ and $F_f(t_1 \triangleright t_2) = F_f(t_1) \triangleright F_f(t_2)$. For $A = \Xi(\mathbb{R}^n)$ the vector fields $F_f(t)$ are called the elementary differentials of $f$. The evaluation of all elementary differentials can be done in different ways. Based on the homomorphism property of $F_f$, we can build all the elementary differentials from the monomial basis for the free pre-Lie algebra, whose first elements are $f$, $f \triangleright f$, $(f \triangleright f) \triangleright f$, $f \triangleright (f \triangleright f)$, $(f \triangleright f) \triangleright (f \triangleright f)$, etc. The monomial basis is indexed by binary trees; rooted trees are obtained from binary trees by a recursion known as Knuth's rotation correspondence [9], from which the first elementary differentials can be read off. A more conventional expression for the elementary differentials of rooted trees is directly in terms of higher order derivatives of $f$, as given in (2.5) below. A B-series is traditionally defined as a formal finite or infinite sum

$B_f(\beta) = \sum_{t \in T} \frac{h^{|t|}}{\sigma(t)}\, \beta(t)\, F_f(t)$,

where $|t|$ denotes the number of nodes in $t$, $\sigma(t) \in \mathbb{Z}$ is the tree symmetry function, $h \in \mathbb{R}$ is a real parameter (e.g. the step size of a numerical method) and $\beta \colon T \to \mathbb{R}$ is a given function. The parameter $h$ can be absorbed by a re-scaling of the vector field $f \to hf$, and $\sigma$ can be absorbed into $\beta$. Therefore we adopt the simplified definition

$B_f(\beta) = \sum_{t \in T} \beta(t)\, F_f(t)$.

Our goal is the efficient computation of finite B-series, i.e. series where $\beta(t) = 0$ for all $|t| > N$, using automatic differentiation.
2. An algorithm for computing B-series In this section we present a way to calculate B-series. The problem consists of multiple parts. The first is the generation of the rooted trees, for which we use the algorithm of Li and Ruskey [16]. Then, in order to evaluate $F_f$ over a tree, we take advantage of Automatic Differentiation (AD; see Griewank [12]). One of the most important characteristics of AD is that we do not work with formulas for the derivatives, but with their values at a given point. We compute higher order Taylor coefficients of univariate functions with automatic differentiation, and from these we obtain the necessary derivatives through interpolation, using the formula of Griewank, Utke and Walther [13]. Finally, one has to take care of the redundancy when evaluating $F_f$ over all the trees up to a given degree. We handle this by basing our calculation on weighted directed acyclic graphs instead of rooted trees.
2.1. Generating rooted trees. We need an efficient way to generate all the rooted trees up to degree $d$, accounting for isomorphic equivalence. We use the algorithm of Li and Ruskey [16], which we review here.
There are multiple ways to represent a rooted tree; we use the parent-array concept. Suppose that we have a rooted tree $\tau$ whose vertices are labeled $1, 2, \ldots, d$. Then the $i$-th entry of the parent array, $\mathrm{par}_\tau[i]$, is $0$ if the vertex with label $i$ is the root; otherwise it is the label of the parent of the vertex labeled $i$. In order to obtain a unique parent array associated with a given rooted tree $\tau$, we label the vertices in depth-first order starting from the root. See Figure 1 for a labeling of a rooted tree with 9 vertices. We need a representative element from each isomorphic equivalence class, and we obtain these with the use of the parent array: the representative element of an equivalence class of rooted trees is the rooted tree $\tau$ that has the lexicographically maximal parent array. This rooted tree is called canonic, and we also say that its parent array is canonic. It is straightforward that removing the vertex with the highest label from a canonic tree with $d$ vertices results in a canonic tree with $d - 1$ vertices. In Figure 2 we compare the parent arrays of two isomorphic rooted trees $\tau_1$ and $\tau_2$; since $\tau_2$ has the lexicographically larger parent array, it is the canonic tree in the class consisting of these two trees. Figure 2. Two isomorphic trees with $\mathrm{par}_{\tau_1} = (0, 1, 1, 3) < \mathrm{par}_{\tau_2} = (0, 1, 2, 1)$. The algorithm from [16] generates canonic parent arrays up to a given size. Moreover, it does so in constant time amortized over all trees; this is the so-called CAT property (Constant Amortized Time). The recursive procedure is based on extending a canonic tree $\tau$ with $d - 1$ vertices into a canonic tree with $d$ vertices.
The vertex with label $d$ is a leaf by construction, and comes after the vertex labeled $d - 1$ in the depth-first order. Therefore the parent $\eta$ of $d$ is an ancestor of $d - 1$ or the vertex $d - 1$ itself. Proof. Suppose that the parent array $\mathrm{par}_\tau[1], \ldots, \mathrm{par}_\tau[d-1], \eta$, and the respective tree $\tau$, are not canonic. Let the canonic tree in the class of $\tau$ be $\tau'$. Consider the following operation on the tree $\tau'$: move the vertex that corresponds to $d$ with respect to the labeling of $\tau$ to be a child of the vertex that corresponds to $\eta$ in the labeling of $\tau$. We obtain a lexicographically larger parent array than $\mathrm{par}_\tau[1], \ldots, \mathrm{par}_\tau[d-1], \eta$, and this is a contradiction. For a rooted tree $\tau$ and its vertex labeled $q$, let $T_\tau(q)$ be the rooted subtree of $\tau$ rooted at $q$. In Algorithm 1 we repeatedly copy the subtree with root $\rho$; we store the degree of this subtree in $S$. When called, Algorithm 1 produces all the canonic parent arrays of the form $\mathrm{par}[1], \ldots, \mathrm{par}[p-1], \cdot$. We start the generation with $\mathrm{Gen}(1, 0, 0; \mathrm{par} = (), T = \emptyset)$.
Assume now that $\tau$ is a rooted tree with $d - 1$ vertices, and let $\mu$ be the lowest ancestor of $d - 1$ such that the parent array of $T_\tau(\mu)$ is a prefix of the parent array of $T_\tau(\nu)$, where $\nu$ is the rightmost proper sibling of $\mu$. We call the leftmost sibling $\rho$ of $\nu$ that satisfies $T_\tau(\rho) = T_\tau(\nu)$ the critical node. In Figure 3 we have a tree with 16 vertices, for which $\mu = 15$, $\nu = 12$ and $\rho = 6$. In order to see that Algorithm 1 is correct, let $\tau$ be a rooted tree with $d - 1$ vertices and consider the lexicographically largest infinite canonic rooted tree agreeing with $\tau$ in the first $d - 1$ positions of its parent array. This infinite tree may be formed by deleting $T_\tau(\mu)$ and replacing it with an infinite sequence of rooted subtrees, all isomorphic to $T_\tau(\rho)$, with their roots connected to $\mathrm{par}_\tau(\rho)$ ($= \mathrm{par}_\tau(\nu) = \mathrm{par}_\tau(\mu)$). Truncating this infinite tree at any size $D \ge d$ gives the lexicographically largest canonic extension with $D$ nodes. We start a new copy of $T_\tau(\rho)$ in line 8 of Algorithm 1, while in line 10 we make a new isomorphic copy with offset $S$.

2.2. Computing higher order derivatives. Let $f \colon \mathbb{R}^n \to \mathbb{R}^n$ be a sufficiently smooth function, such that all derivatives appearing below exist and are continuous. Our goal is to evaluate higher order derivatives of $f$. For this purpose one may use higher order automatic differentiation techniques for multivariate functions; the reader is referred to Berz [3] and Danis [8]. We present here a different method, described in Griewank, Utke and Walther [13].
Fix the integer $p \ge 1$ for the time being. We do not denote explicitly that the dimensions of the quantities below depend on $p$; it is straightforward to use the formulas with different values of $p$ later on.
Let $i = (i_1, \ldots, i_p) \in \mathbb{N}_0^p$ be a multi-index with the norm $|i| = \sum_{r=1}^p i_r$. The multi-indices $i$ and $j$ satisfy $j \le i$ if the relation holds componentwise; consequently $j < i$ holds if $j \le i$ and $j \ne i$. We denote by $\mathbf{0}$ and $\mathbf{1}$ the multi-indices that contain only zeros or only ones, respectively. Naturally, a multi-index is also a real vector in $\mathbb{R}^p$, so we may use multi-indices in the standard algebraic operations.
Let $s_r \in \mathbb{R}^n$ be real vectors for $r = 1, \ldots, p$. The matrix $S \in \mathbb{R}^{n \times p}$ with column vectors $s_r$, written $S = [s_1; \ldots; s_p]$, is called the seed matrix.
Our goal is to evaluate $\nabla_S^d f(x)$, that is, the $d$-th derivative tensor of $f(x + Sz)$ with respect to $z$ at $z = 0$. This means that we have to obtain the partial derivatives of the form $\left.\frac{\partial^{|i|}}{\partial z^i} f(x + Sz)\right|_{z=0}$ for multi-indices $i$ with $0 < |i| \le d$. Following [13], these are recovered by interpolation from the Taylor coefficients at $t = 0$ of the univariate curves $t \mapsto f(x + tSj)$, which are propagated by forward AD; the interpolation coefficients $\gamma(i, j)$, where $i$ and $j$ are multi-indices, are defined in [13]. The main formula for higher derivative tensors from [12] and [13] that we use in our implementation, referred to as (2.3) below, expresses, for $d \ge |i| > 0$, the above partial derivative as a linear combination, with coefficients $\gamma(i, j)$, of Taylor coefficients of these univariate curves. According to the analysis done in [13], the operation count of this propagation of multiple univariate Taylor series is essentially the same as that of propagating multivariate derivatives directly for moderate degrees, and considerably smaller for higher degrees.

2.3. Computing $F_f$ over a rooted tree $\tau$ at $x$. The quantity $F_f(\tau)$ is a vector field; we want to obtain its value $F_f(\tau)(x)$ at the point $x \in \mathbb{R}^n$. For the tree $\bullet$ with one vertex we have $F_f(\bullet)(x) = f(x)$. Assume that the rooted tree $\tau$ consists of a root connected to the subtrees $\tau_1, \ldots, \tau_p$. In this situation the recursive formula for $F_f(\tau)(x)$ is written as

(2.5) $F_f(\tau)(x) = f^{(p)}(x)\bigl(F_f(\tau_1)(x), \ldots, F_f(\tau_p)(x)\bigr)$,

where $f^{(p)}(x)$ denotes the $p$-th derivative tensor of $f$ at $x$. We may consider formula (2.5) as a higher derivative tensor and obtain the closed expression

(2.6) $F_f(\tau)(x) = \left.\frac{\partial^p}{\partial z_1 \cdots \partial z_p} f\Bigl(x + \sum_{r=1}^p z_r F_f(\tau_r)(x)\Bigr)\right|_{z=0}$.
We can convert this into the form of (2.3) by introducing $i = (1, \ldots, 1) \in \mathbb{N}_+^p$ and $s_r = F_f(\tau_r)(x)$ for $r \in \{1, \ldots, p\}$. However, if $\tau$ has symmetries on the first level, that is, if there are identical trees, up to isomorphism, among $\tau_1, \ldots, \tau_p$, exploiting this will result in a more efficient computation.
As the recursive definition shows, in order to evaluate $F_f$ over the rooted tree $\tau$, we first have to evaluate the children of the root, and for that their children, etc. Assume that $\tau$ has $d$ vertices $V(\tau) = \{v_1, \ldots, v_d\}$, and let $i$ be a permutation of $\{1, \ldots, d\}$. The ordering $\{v_{i_1}, \ldots, v_{i_d}\}$ of the vertices is called a post-ordering if for every vertex $v_{i_j}$: whenever $v_{i_k}$ is a child of $v_{i_j}$, then $k < j$. To simplify notation, we assume that $\{v_1, \ldots, v_d\}$ is already in post-order. Note that this implies that $v_1$ is a leaf and $v_d$ is the root.
Recall that $T_\tau(q)$ denotes the rooted subtree of $\tau$ rooted at $q$. Consider the sequence of rooted trees $T_\tau(v_1), \ldots, T_\tau(v_d)$. When computing $F_f(T_\tau(v_j))(x)$, because of the post-ordering of the vertices, the columns of the seed matrix are among the vectors $F_f(T_\tau(v_k))(x)$ with $k < j$. Therefore we may evaluate the values in this order, using equations (2.3) and (2.7). Observe that $T_\tau(v_d) = \tau$, thus we have a way to compute $F_f(\tau)(x)$.

2.4. Computing $F_f$ over multiple rooted trees. Our goal now is to evaluate $F_f$ over all the rooted trees up to degree $d$. Consider the list $T$ of these rooted trees, ordered as follows: the rooted tree $\tau_1 \in T$ precedes the rooted tree $\tau_2 \in T$ if the degree of $\tau_1$ is smaller, or, if they have the same degree, if the parent array of $\tau_1$ is lexicographically smaller. It is obvious that if $\tau'$ is a rooted subtree of $\tau$, then $\tau'$ comes before $\tau$ in the ordering.
We wish to evaluate $F_f$ over all the trees in the ordered list $T$. We progress from smaller order to higher order, thus we start with the one-vertex tree $\bullet$, obtaining $F_f(\bullet)(x) = f(x)$. In the rest of this section, when we consider a rooted tree $\tau \in T$, we always assume that it has the following structure: the subtrees connected to the root are $\tau_{1,1}, \ldots, \tau_{1,i_1}; \ldots; \tau_{p,1}, \ldots, \tau_{p,i_p}$, where $\tau_{r,1}, \ldots, \tau_{r,i_r}$ are in the same equivalence class as the canonic rooted tree $\tau_r$, for all $r \in \{1, \ldots, p\}$. As we have seen with the post-ordering, we may evaluate $F_f(\tau)(x)$ by applying equation (2.3) once, if $F_f$ is already known for the first-level subtrees. It follows that we do not have to repeat the evaluation recursively vertex by vertex; instead we may proceed through the ordered list $T$: if we have evaluated $F_f$ over the first $k$ trees already, then the computation for the $(k+1)$-th tree is one application of formula (2.3).
We may represent every rooted tree in $T$ as a vertex in a weighted Directed Acyclic Graph (wDAG) in the following way. Let $G$ be a wDAG with vertices $V$ and edges $E \subseteq V \times V$. We denote the weight of an edge $e$ by $w(e)$. The weight of a vertex $v \in V$ is the sum of the weights of the edges leaving $v$, that is

(2.8) $w(v) = \sum_{u \,:\, (v, u) \in E} w((v, u))$.
We say that $G$ represents the ordered list $T$ of rooted trees if there exists a bijective function $\iota \colon T \to V$ such that if $\tau \in T$, then

(2.9) $\{v \in V : (\iota(\tau), v) \in E\} = \{\iota(\tau_r) : r \in \{1, \ldots, p\}\}$

and $w((\iota(\tau), \iota(\tau_r))) = i_r$ for all $r \in \{1, \ldots, p\}$. In Figure 4 we show the wDAG generated by our program representing the rooted trees of degree less than 6. It is evident that $G$ has only one sink, that is, only one vertex with no outgoing edge, and this is $\iota(\bullet)$. Define the function $\Delta \colon V \to \mathbb{R}^n$ as follows: $\Delta(\iota(\bullet)) = f(x)$, and for every other vertex $v$, $\Delta(v)$ is obtained by one application of formula (2.3), using as seed vectors the values $\Delta(u)$ at the successors $u$ of $v$, with multiplicities $w((v, u))$. We may progress with the evaluation of $\Delta$ over the vertices of $G$ in a post-ordering of $V$. It is then clear that $\Delta$ is computable, and for any $\tau \in T$ we have

(2.11) $F_f(\tau)(x) = \Delta(\iota(\tau))$.
2.5. Computational cost. An important aspect of our algorithm is that the dimension $n$ of the vector field $f$ enters only linearly into the cost of the evaluation of the B-series; thus the algorithm is tractable also for high dimensional vector fields. The operation count of evaluating the coefficients of a $d$-th order univariate Taylor series is of the order $O(d^2)$ according to Griewank [12]; thus the $d$-th order Taylor coefficient of $z \mapsto f(x + zs)$ is found in only $O(nd^2)$ operations. This compares favorably with the exponential growth observed for the symbolic differentiation of a general function. Griewank, Utke and Walther [13] analysed formula (2.3) and bounded the number of nonvanishing coefficients $\gamma(i, j)$; given a seed matrix, this number bounds the number of operations needed to calculate all partial derivatives from the univariate Taylor series. As seen in Plotkin and Rosenthal [19] and Finch [11], if $T_d$ is the number of nonisomorphic rooted trees with $d$ vertices, then $T_d \sim b\, d^{-3/2} r^{-d}$ as $d \to \infty$, where $r = 0.3383218568\ldots$ is the radius of convergence of the generating function of the $T_d$, characterised in [11] as the unique positive root of an equation $F(x, 1) = 0$, and $b > 0$ is a constant. Recall that a rooted tree is represented in the wDAG as a vertex whose weight equals the number of children of the root, and with as many out-edges as the number of non-isomorphic children of the root.
We do not have a precise expression for the total cost of evaluating B-series up to order $d$ with the present algorithm. However, assuming that we need at most $T_d$ different seed matrices, we believe the cost is asymptotically bounded by $O(nd^2 T_d) = O(n d^{1/2} r^{-d})$ operations.
2.6. Implementation. We have written our program in C++ using the automatic differentiation library FADBAD++ by Bendtsen and Stauning [2] and the CAPD library [7]. For the representation of the graphs we have used the Boost Graph library [20] and stored the graphs in Graphviz format [10]. The source code is available at [1].

Conclusion and future directions
We have presented a method to efficiently compute B-series using automatic differentiation. When repeatedly applying formula (2.3), it becomes apparent that during the evaluation of the wDAG described in Section 2.4 the same seed matrix, or a submatrix of it, occurs in several places. It is also advisable to precompute the values $\gamma(i, j)$ in advance, possibly in higher precision. This leaves room for further improvements, which we plan to address in the future.
Whereas the computation of an order-$d$ B-series for an $n$-dimensional vector field is in general impossible for high $d$ or high $n$ using symbolic differentiation, the present algorithm has a complexity which is only linear in $n$ (and polynomial in $d$ for each individual elementary differential), and it is thus applicable in a range of practical situations, at least for moderately high orders $d$.
However, an order-$d$ B-series is still substantially more expensive to compute than an order-$d$ Taylor expansion using AD. The reason for this is that the Taylor expansion has a sparse representation in the monomial basis, represented by the B-series with the right-nested monomials $f$, $f \triangleright f$, $f \triangleright (f \triangleright f)$, $f \triangleright (f \triangleright (f \triangleright f))$, $\ldots$.
In other words, the Taylor series is obtained by repeated differentiation with respect to $t$ of the function $f(y(t))$, where $y'(t) = f(y(t))$. Thus, the B-series of Taylor expansions can be computed substantially more efficiently than general B-series. An alternative to our computational approach, which may be more efficient also for other B-series that are sparse in the monomial basis, might be to use (1.1) as a basis for an AD algorithm. The algorithmic details and possible computational advantages of such an approach are subject to future research.