Efficient recurrence relations for univariate and multivariate Taylor series coefficients

The efficient use of Taylor series depends, not on symbolic differentiation, 
but on a standard set of recurrence formulas for each of the elementary 
functions and operations. These relationships are often rediscovered and 
restated, usually in a piecemeal fashion. We seek to provide a fairly thorough 
and unified exposition of efficient recurrence relations in both univariate 
and multivariate settings. Explicit formulas all stem from the fact that 
multiplication of functions corresponds to a Cauchy product of series 
coefficients, which is more efficient than the Leibniz rule for nth-order 
derivatives. This principle is applied to function relationships of the form 
h'=v*u', where the prime indicates a derivative or partial derivative. Each 
standard (calculator button) function corresponds to an equation, or pair of 
equations, of this form. A geometric description of the multivariate operation 
helps clarify and streamline the computation for each desired multi-indexed 
coefficient. Several research communities use such recurrences including the 
Differential Transform Method to solve differential equations with initial conditions.


1.
Introduction. Taylor polynomials for function expressions, and functions given by differential equations, are most efficiently computed by recurrence relations that capture how to combine series. Incomplete glimpses of this idea are encountered in calculus, where a series for e x 2 about 0 is found by substituting into the known series instead of differentiating; in series solutions for simple ordinary differential equations (ODEs) where the goal is a recurrence relation; and in numerical ODE solutions where a higher-order Taylor series method is considered as a theoretical possibility. A complete set of recurrence relations for operations and standard functions can change the symbolic parts of these methods into practical computational algorithms. Several current research communities use series recurrences as a primary tool: Automatic Differentiation (AD) seeks derivatives or series for functions specified by a computer program (see [4], [8]); Polynomial Methods for Differential Equations was a special session of the conference of this proceedings (e.g. [9]); and the Differential Transform Method is a perspective on directly transforming an ODE (and sometimes a partial differential equation) into a recurrence relation found in some publications (e.g. [5] and the 1986 Chinese textbook referenced therein). Many of the works within these communities recreate or cite piecemeal results for operations on series. We seek a fairly thorough and unified treatment of both univariate and multivariate recurrence relations for such operations.
The core ideas of series recurrence developed over centuries. In 1247, a Chinese (root-finding) algorithm converted coefficients of a polynomial about 0 to coefficients about x 0 [10]. Euler (1707 -1783) was a master at manipulating series [3]. Franz Mertens (1840 -1926) proved convergence of the Cauchy product of series [2]. In 1962, the Ph.D. thesis of R.E. Moore used series recurrences (equivalent to many of the univariate rules herein) for error analysis in digital computing [6]. Still, a fairly complete set of univariate rules with derivations is rare [11] and multivariate rules are very hard to find, except possibly as code within specific computing languages [7]. This mathematical presentation is rigorous and self-contained. Although the univariate is a special case of the multivariate, they are treated in succession so that algorithmic ideas for both can be developed simply and then subtle features of the multivariate case can be distinguished.
2. Univariate recurrences. For every function h : R → R analytic in a neighborhood of a, we will associate the Taylor coefficient function H : The lower case and upper case notation will be used consistently. In the other direction, This can be thought of as a transform method, where (2) is the Taylor Coefficient Transform (or Differential Transform) and (1) is the inverse transform. Actually, H(k) values are rarely computed by differentiating as in (2); instead, special cases are tabulated and combinations or recurrences are used on the transform side. Specifically, Taylor coefficients of elementary functions can be built starting from only the two most basic cases: Suppose the transforms U (k) and V (k) are already known (as in the above two cases) or have been computed for is some function or operation on u and/or v, we seek H(k) in terms of U , V and/or preceding values of H. The nature of U , V and H depends on perspective. We will be specifying these by recurrence formulas. In practical application, computers form arrays of floating point doubles U = (U (0), U (1), U (2), . . . , U (m)), V = (V (0), V (1), V (2), . . . , V (m)) and H = (H(0), H(1), H(2), . . . , H(m)). In object-oriented programming, these can be instances of a class (such as the series class in [8]) and operations or standard functions applied to such objects can implement the recurrence formulas to produce the resulting Taylor coefficients. This ability to custom define the action of usual operator symbols or function names, such as * and sin, is called overloading. We now seek recurrence relations for each of the arithmetic operations and standard (calculator button) functions.
The transform of a product is the Cauchy product of transforms: over all nonnegative integers k. This sum of U and V values has been called a discrete convolution and this combination of series has been called a Cauchy product. Variations of this rule will be the source of all of our recurrence formulas. The Leibniz rule for the kth-order derivative of a product introduces binomial coefficients in each term and is less efficient than the Cauchy product. An alternative approach to this paper would apply the Leibniz rule to arrays of derivative values at a, which are larger by a factorial factor from the U and V coefficient values. Computing with such large derivative values has been seen to produce more cumulative roundoff error than using Taylor coefficient values.
The transform of a quotient follows from the product.
This recurrence relation, for nonnegative integers k, can be used to divide by any v (x) where V (0) = v(a) = 0. It may be generalized to deal with analytic functions such as sin(x) is a quotient of two power series with the coefficient values simply shifted to the left. Unfortunately, if coefficients in U and V are only known up to order m, then coefficients in H are only produced up to order m − 1. Subsequent shifts could be implemented if further terms vanish in both numerator and denominator.

Composition & the DE theorem. Consider a composition
where f : R → R is one of the standard functions found on a typical scientific calculator, including exponential and trigonometric functions and their inverses. Then For a simple v(x) = f (u(x)), we use U and generate V along with H. For example, if h(x) = exp(u(x)), then v = h and V = H. In other cases, we get one, or a system of two, differential equations of the above form although the roles of u, v, and h may be switched. Examples of this appear in Section 2.3. The point is that we will be using the specific form h (x) = v(x) u (x) resulting from a specific f . We avoid the use of a higher-order chain rule for arbitrary (symbolic) function f , the Faà di Bruno formula, which involves significant combinatorial complications. For , consider a Cauchy product involving derivatives of the corresponding series For our recurrence rules, we pull out the j = k term and divide by k: For repeated application to the differential equation and to facilitate a multivariate generalization, we encapsulate (5), without the isolated term, as a computational function ddot and state the resulting principle as a theorem.
Notice that ddot(V, U, k) uses V and U coefficients only from 1 to k − 1, which is important in order to avoid a self-referential definition of H(k), even if H plays the role of V and/or U . Recall that jU (j) comes from the derivative of u, so this "derivative dot product" is a kind of Cauchy product of v and the derivative of u. It will be convenient to use the principle with a scalar multiple: 2.3. Standard functions. Using the DE Theorem, each standard (calculator button) function results in a recurrence relation for the series coefficients when the function is composed with one whose series coefficients are already known.
If h(x) = ln(u(x)), Trigonometric functions and inverse trigonometric functions produce coupled recurrence relations from the DE Theorem. From now on, we will assume that the zero term is initialized as the function value and that the recurrence relation applies for k ∈ N.
If s(x) = sin(u(x)) and c(x) = cos(u(x)), s (x) = c(x) u (x) and c (x) = −s(x) u (x), so S(k) = C(0)U (k) + ddot(C, U, k) and Surprisingly, h(x) = sec(u(x)) seems to require three such operations. We can compute S and C for c(x) = cos(u(x)) and apply (4) to h(x) = 1/c(x). Likewise, many functions, can be built from operations above. Hyperbolic functions can be built from exp operations but sometimes it is more efficient to derive the coupled recurrence relations from the DE Theorem by mimicking the above derivations.

Exponents. For the most general exponentiation
). For each step we have recurrences using three generalized Cauchy products: If U and V are completely known, operations in exp(V * ln(U )) could be overloaded to perform three loops, one on each line the above recurrences; once computed, intermediate results can be discarded. However, for a differential equation like u = u v , once through the sequence of three recurrences would return the next series coefficient and the process would be repeated saving intermediate results.
The above uses three generalized Cauchy products, but many important special exponent cases use only one such product: For h(x) = u(x) r , the important case of constant exponent, we use only two generalized Cauchy products. Since h (x) = r u(x) r−1 u (x), we have u(x) h (x) = r h(x) u (x). Using the DE Theorem (for F (k) where f (x) = both sides of above), we get U over all multi-indices k = (k 1 , k 2 , . . . , k n ) of nonnegative integers. We rarely use the Taylor coefficient formula As in univariate, transforms of elementary functions can be built from operations on the most basic cases, where e i denotes the multi-index with 1 in the ith coordinate and zeros in all others: • If h(x) = α ∈ R, then H(0) = α and H(k) = 0 otherwise.  number of variables, many languages allow working with a multi-dimensional box array, although these can grow huge very quickly. Figure 1(a), with multi-indices ≤ (6, 3, 4) coordinatewise, represents Taylor coefficients that correspond to at most 6 derivatives in x 1 , 3 derivatives in x 2 , and 4 derivatives in x 3 . Computing up to a fixed order, as in Figure 1(b), is probably best implemented by a linear array, where each linear index corresponds to a multi-index. This is the biggest complication in multivariate work. We will focus on the conceptually simple task of finding recurrence relations to compute H(k), assuming H(j) have already been computed for all multi-indices j ≤ k coordinatewise with j = k. Processing array entries in such an order is easy, but linear indexing makes it more difficult to access needed subsets of previous values.

Arithmetic operations. The multivariate Taylor coefficient transform is also
summing over all multi-indices j ≤ k coordinatewise. This is a multivariate generalization of the Cauchy product. If k = (6, 3, 4), the sum is over the whole box of entries, shown in Figure 1(a), with U and V arrays processed in opposite order. For k = (2, 1, 2) (in Figure 1(a) or (b)), the sum would be over the 3 × 2 × 3 sub-box of entries.
summing over all multi-indices j ≤ k coordinatewise with j = k. Thus 3.2. Multivariate DE theorem. For the composition h(x) = f (u(x)) where f : R → R is one of the standard (calculator button) functions that generate the set of elementary functions, which plays the same role as h (x) = v(x) u (x) in univariate derivations. summing over all multi-indices with coordinates j p = 1, 2, . . . , k p and all other j i = 0, 1, . . . , k i except that j = k is omitted.
Proof. For k = 0, we may choose any coordinate p such that k p = 0. We suggest k p = min{k i : k i = 0}, in order to minimize the number of terms in the ddot summation that results. By hypothesis ∂h Collecting the coefficient of (x − a) k−e p on each side yields Geometrically, ddot can be envisioned by considering the sub-box of U for j ≤ k, sliced by fixed p-coordinate. Drop the first such face, then other slices are dotted with the reverse of the opposite slice of V , results are multiplied by j p and summed. For example, if n = 2 and k = (6, 3), the entries needed in V and U are shown in Figure 2. By choosing k p = k 2 = 3, we drop the largest face of this box (which could extend into even more dimensions for larger n). Entries are multiplied and summed as indicated by the arrows. When each slice is finished the result is multiplied by j p and the final total is divided by k p . The ddot operation and Cauchy product can be made very efficient by storing a large global reference list (of matrices of indices) that depends only on the number of variables n and the highest order m of desired Taylor coefficients, as in Figure  1(b). A linear index can progress through entries corresponding to each multi-index k. For each linear index, the global reference can store linear indices corresponding to the sub-box of all j ≤ k, divided into slices; making a 2D matrix of indices for each k, regardless of n. Then ddot simply runs through the matrix of indices, as diagrammed in Figure 2. Assuming the linear ordering of the n-dimensional multiindices is increasing either lexically and/or in |k| order, values on the sub-box will be computed before a value for k. This process is implemented in MATLAB in [1], where the global reference of sub-boxes was much faster to compute by recurrence, than with direct search.
3.3. Standard functions. After details are encapsulated in ddot, the multivariate DE theorem looks exactly like the univariate version in (6); With this understanding, all of the standard (calculator button) functions have the same transform rules when applied to multivariate functions. Consider h(x) = exp(u(x)) where u : R n → R and the multivariate transform U is known. In order to see how this starts, consider u(x, y, z) = xyz about (a, b, c). The basic arrays X, Y , and Z would have just two nonzero entries, like Y ((0, 0, 0)) = b, Y ((0, 1, 0)) = 1. The Cauchy product rule (8) is used twice to find U = X * Y * Z. For any u and h(x) = exp(u(x)), we have ∂h ∂xi (x) = h(x) ∂u ∂xi (x) for all i, playing the same role as h (x) = h(x)u (x) in the univariate case. Then H(0) = exp(U (0)) and all other H(k) values are given by (9), specifically Just as the only difference between (7) and (10) is that the index arguments are multivariate, all of the ddot rules of Sections 2.3 and 2.4 apply to multivariate arguments as well, and will not be repeated here. Of course, the implementation of indexing, arrays, and ddot are more complicated but the standard function rule is just the same. There is a new motivation in the multivariate case: a ddot rule (9) can be more efficient than a Cauchy product rule (8) since ddot omits an entire face from of each sum of products, as described at the end of the last section. While h(x) = u(x) 2 could use either h(x) = u(x) u(x) or h (x) = 2u(x) u (x), the latter would be preferred for multivariate work. The square root function would be done in a similar way.

4.
Application to an ODE. To see how these rules are used to compute a highorder solution to an ordinary differential equation with initial condition, we return to the univariate setting and consider the example problem y = −xy − sin(y), y(x 0 ) = y 0 . Initial values specify the transform (or Taylor coefficients) for X = (x 0 , 1, 0, 0, . . .) and the first two terms Y (0) = y 0 , Y (1) = −x 0 y 0 − sin(y 0 ). To produce further terms of Y , we break the right-hand-side of the differential equation into steps with one variable name for each product (division) or standard function and any trigonometric cofunctions as in Section 2.3. Let p = xy, s = sin(y), c = cos(y), and y = −p−s, which is called a code list in AD. The transform of each of these variables is initialized: P (0) = x 0 y 0 , S(0) = sin(y 0 ), C(0) = cos(y 0 ). The differential equation transforms to (k + 1)Y (k + 1) = −P (k) − S(k), where P (k) and S(k) are computed by Section 2.3 rules. In this case, P (k) = k j=0 X(j)Y (k − j) = x 0 Y (k) + Y (k − 1). For k ≥ 1, compute arbitrarily many terms of Y by iterating through P (k) = x 0 Y (k) + Y (k − 1), S(k) = C(0)Y (k) + ddot(C, Y, k), C(k) = −S(0)Y (k) − ddot(S, Y, k), Y (k + 1) = (−P (k) − S(k))/(k + 1), which could be simplified by substitution for P (k) in the last step. Such a sequence of recurrence relations is the Differential Transform Method solution. The solution function y(x) ≈ m k=0 Y (k)(x − x 0 ) k gives y almost to the radius of convergence for large m. Calculation for y(−1) = 2 shows the 50th order Taylor polynomial visually agreeing with the ode45 MATLAB solution for a radius of 1.5, and the 10th order for over radius 1.0. Numerical or a-priori error estimates could detect such a radius and use it as a large step to a new point, where the recurrence is restarted and another series is found. This algorithmic version of a high-order Taylor series method does not require symbolic differentiation.