KIOPS: A fast adaptive Krylov subspace solver for exponential integrators

This paper presents a new algorithm KIOPS for computing linear combinations of $\varphi$-functions that appear in exponential integrators. This algorithm is suitable for large-scale problems in computational physics where little or no information about the spectrum or norm of the Jacobian matrix is known a priori. We first show that such problems can be solved efficiently by computing a single exponential of a modified matrix. Then our approach is to compute an appropriate basis for the Krylov subspace using the incomplete orthogonalization procedure and project the matrix exponential onto this subspace. We also present a novel adaptive procedure that significantly reduces the computational complexity of exponential integrators. Our numerical experiments demonstrate that KIOPS outperforms the current state-of-the-art adaptive Krylov algorithm phipm.


Introduction
Exponential integration has received increased attention recently as an efficient alternative to standard methods for solving systems of ordinary differential equations (ODEs). Exponential integrators have the advantage of being accurate and, similarly to implicit methods, possess good stability properties, allowing integration with large time steps. These methods involve computation of an exponential or an exponential-like function of a Jacobian matrix or an approximation to it (see e.g. the review articles [1,2]). Approximation of such matrix functions or their products with vectors constitutes the main computational cost of an exponential integrator. Typically it is the latter approximation, i.e. the product of exponential-like matrix functions with vectors, that is required for an implementation of an exponential method.
A number of methods have been proposed to calculate the exponential or the exponential-like functions of a matrix or their product with a given vector (see [3] for a review). Most of them, however, are of little practical use for large-scale stiff matrices due either to high computational cost or numerical stability issues. These challenges are described in the review papers of Moler and Van Loan [4,5]. Other algorithms that are more suitable for large stiff matrices [6,7] require some information about the norm or the spectrum of the matrix. It is common, however, that the matrix in question is not given in explicit form and only matrix-vector products can be calculated. This is the case, for instance, when the matrix is a Jacobian that results from a complicated spatial discretization of a system of partial differential equations (PDEs). From these considerations, we conclude that the Krylov subspace projection-based techniques are among the most promising methods for problems where exponential or exponential-like functions of a large stiff matrix have to be computed and little or no information can be elicited about this matrix a priori.
The basic idea of the Krylov subspace approach is to project the exponential of a large matrix onto a relatively small Krylov subspace where calculating the exponential is significantly less computationally expensive. Recent progress in computational linear algebra has led to efficient Krylov subspace algorithms such as the EXPOKIT software of Sidje [8], restarted Krylov methods [9][10][11][12], block Krylov subspaces [13][14][15][16], time-parallel methods [17,15], the shift-and-invert acceleration [18][19][20] and the adaptive methods [21,22]. The phipm adaptive method of Niesen and Wright [21] has been shown to be the most efficient option for the problems under consideration [23].
For the large scale geophysical fluid dynamics problems that motivate our study, it was found by Clancy and Pudykiewicz [24] that semi-implicit predictor-corrector schemes [25] were still more efficient. Gaudreault and Pudykiewicz [26] further analyzed these results and concluded that the computational cost of the Arnoldi procedure [27] was the origin of the issue. The performance was then improved by using an incomplete orthogonalization instead of the Arnoldi procedure based on full orthogonalization. This technique has been successfully used for the simulation of the shallow water equations on the sphere with second and third order exponential propagation iterative (EPI) schemes [28].
The incomplete orthogonalization procedure (hereafter denoted IOP) was originally proposed by Saad as an eigenvalue algorithm for general non-symmetric matrices [29] and as a method to solve systems of linear equations [30]. The application of IOP to approximating the matrix exponential was studied recently by Koskela [31] and by Vo and Sidje [32].
The aim of this article is to explore this technique further, with a particular focus on the efficient calculation of ϕ-functions within exponential integrators. We call the resulting algorithm the Krylov with Incomplete Orthogonalization Procedure Solver (KIOPS). This new method has been carefully designed to allow for an efficient implementation of single or multi-stage exponential integrators, such as those recently proposed by Rainwater and Tokman [33].
The paper is organized as follows. In Section 2 we describe the main application of the KIOPS method, i.e. the evaluation of linear combinations of ϕ-functions within exponential integrators. Section 3.1 presents a theorem that shows a way to evaluate the linear combination of ϕ-functions used in our KIOPS algorithm. The KIOPS algorithm is presented in Sections 3.2-3.7, where we describe how the method uses Krylov subspace projection with incomplete orthogonalization and a new adaptivity procedure. Our numerical experiments in Section 4 validate the performance of the proposed solver and demonstrate its efficiency relative to phipm. Conclusions are presented in Section 5.

Approximating ϕ-functions within exponential integrators
The main application of KIOPS and other algorithms that approximate products of ϕ-functions and vectors is their use within an exponential integrator designed to solve initial value problems for large scale stiff systems of ordinary differential equations (ODEs) of the form

du/dt = f(u(t)), u(t_0) = u_0, u(t) ∈ R^N. (1)

Differential equations of this form arise in many contexts in the natural and social sciences and engineering disciplines.
In most applications, the independent variable t represents time, N is the number of degrees of freedom, the vector-valued function u(t) represents some unknown dynamical quantities and f is a vector-valued function describing all forces driving the system. Exponential integrators gained much interest in recent years as an efficient alternative to implicit schemes for integrating stiff systems, and a number of exponential methods have been introduced (e.g. [34][35][36][37][38][39][40]28,41], see also the review [2]). When any exponential integrator is applied to a complex nonlinear problem, its main computational cost is the evaluation of products of exponential-like matrix functions with vectors. The new KIOPS technique applies to any exponential integrator that employs approximations of these products. To make the computational savings of KIOPS concrete, however, in this paper we focus on the class of exponential propagation iterative methods of Runge-Kutta type (EPIRK). These schemes are specifically designed to gain efficiency by exploiting the properties of the algorithm used for the evaluation of ϕ-functions [23,33]. In particular, the EPIRK schemes proposed in [23,33] are most efficient when used with an adaptive Krylov method [42,43].
To illustrate the computational costs associated with approximating ϕ-functions within an exponential integrator, we consider a three-stage EPIRK method written in the general form (2), in which the remainder term is the nonlinear remainder of the second-order Taylor expansion of f(u) and the ψ_ij(z) are linear combinations of exponential-like functions defined as

ψ_ij(z) = Σ_{k} p_ijk ϕ_k(z). (3)

The constant coefficients α_ij, β_ij and p_ijk can be derived using either classical or stiff order conditions, and methods of up to order five have been proposed in [44,33]. Note that since we are interested in problems where J_n ∈ R^{N×N} with N ≫ 1, the largest per-time-step computational cost of (2) lies in evaluating the matrix function-vector products ψ_ij(g A)b. Thus the most efficient methods should have coefficients p_ijk, α_ij and β_ij that both reduce the number of ψ_ij(g A)b terms required and make each of these evaluations as computationally cheap as possible. A strategy to achieve this goal was proposed in previous publications [23,33] and involves exploiting the structure of the adaptive Krylov projection algorithm phipm [21].
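As a concrete illustration of the ϕ-functions from which the ψ_ij are built, the following minimal Python sketch evaluates ϕ_k of a small dense matrix with a truncated Taylor series, ϕ_k(z) = Σ_{j≥0} z^j/(j+k)!. This is only a dense reference (the function name phi and the series truncation are ours), not the evaluation strategy used inside an exponential integrator:

```python
import math
import numpy as np

def phi(k, A, n_terms=30):
    """phi_k(A) via the truncated Taylor series sum_{j>=0} A^j / (j+k)!.
    Dense reference only; adequate for small, well-scaled matrices."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    term = np.eye(A.shape[0]) / math.factorial(k)   # j = 0 term: I / k!
    total = term.copy()
    for j in range(n_terms):
        term = term @ A / (j + k + 1)               # A^{j+1} / (j+1+k)!
        total = total + term
    return total

# Scalar sanity checks: phi_0(1) = e, phi_1(1) = e - 1, phi_2(1) = e - 2.
assert abs(phi(0, [[1.0]])[0, 0] - math.e) < 1e-12
assert abs(phi(1, [[1.0]])[0, 0] - (math.e - 1.0)) < 1e-12
assert abs(phi(2, [[1.0]])[0, 0] - (math.e - 2.0)) < 1e-12
```

In this notation, each ψ_ij is simply a fixed linear combination of the ϕ_k evaluated at a scaled matrix argument.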
The phipm algorithm evaluates linear combinations

w(τ) = ϕ_0(τ A) b_0 + τ ϕ_1(τ A) b_1 + τ^2 ϕ_2(τ A) b_2 + ... + τ^p ϕ_p(τ A) b_p, (4)

where A ∈ R^{N×N} and b_j ∈ R^N. The phipm algorithm proceeds by considering (4) as the solution to the initial value problem

w′(τ) = A w(τ) + b_1 + τ b_2 + ... + τ^{p−1}/(p−1)! b_p, w(0) = b_0. (5)

The algorithm evaluates the solution to (5) over k subintervals 0 ≤ τ_1 ≤ (τ_1 + τ_2) ≤ ... ≤ Σ_{l=1}^k τ_l = τ using a substepping procedure. Each substep i is comprised of two main parts. The first part consists of computing the approximation (6), where t_{i+1} = Σ_{l=1}^{i} τ_l and the vectors w_j are calculated recursively from the relation (7). The second part of each substep is the adaptive algorithm that selects the substep size τ_{i+1} and the dimension m_{i+1} of the Krylov projection to be used for the next substep. To approximate (4), the substepping procedure should be performed over the whole interval τ ∈ [0, 1]. Note that if only a single ϕ_q(τ A) b_q is involved in the linear combination (4) (i.e. b_j = 0 for j ≠ q), then the substepping procedure can be modified to compute several terms of the type ϕ_q(T_i A) b_q for several values of T_i simultaneously. This is accomplished by ensuring that stops at each τ = T_i are made and that the result is recorded and scaled by 1/T_i^q. An efficient EPIRK method (2) is obtained when the terms ψ_ij(g A)v can be combined into a minimum number of groups, so that all terms in each group are evaluated using only one execution of the adaptive Krylov projection algorithm. For example, the stiffly accurate fourth-order scheme EPIRK4s3 [45,43], given by (9), requires only two calls to phipm. One call is used to evaluate both terms ϕ_1(h_n J_n/8) h_n f(u_n) and ϕ_1(h_n J_n/9) h_n f(u_n). Note that for this approximation we only need to substep the solution (4) over the interval τ ∈ [0, 1/8]. The phipm approximation allows us to compute the linear combination (4), which involves the coefficients τ^i; the terms we need in (9) do not have these coefficients. Since the terms in (9) involve only a single ϕ_1 with all other b_i = 0 (i ≠ 1), this is easily remedied by multiplying the results of phipm by the factors 1/τ_i. Such a fix would not work if several ϕ_i's were involved, unless the final value of τ is 1. This is precisely the case for the second call to phipm, which evaluates the linear combination in the last stage of (9).

The second and last call to phipm is used to evaluate the linear combination for τ = 1. This linear combination involves all exponential terms involved in approximating u_{n+1}. Clearly the factors 1/τ_i do not pose a difficulty in the evaluation of this linear combination, since we substep phipm over τ ∈ [0, 1].
Thus the structure of this exponential integrator utilizes phipm for two types of tasks: (I) computing several terms of the type ϕ_k(τ A) b_k, involving a single function ϕ_k and a vector b_k, for several values of τ; (II) approximating linear combinations (4) with τ = 1.
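To make the rescaling used in task (I) concrete, the following Python sketch evaluates the linear combination (4) with a dense reference implementation and recovers ϕ_q(T A) b_q from a stop at τ = T by dividing by T^q. The helper names (phi_times_b, lincomb) and the truncated-series evaluation are illustrative only, not part of phipm or KIOPS:

```python
import math
import numpy as np

def phi_times_b(q, M, b, n_terms=40):
    # dense reference for phi_q(M) @ b via a truncated Taylor series
    term = np.eye(M.shape[0]) / math.factorial(q)
    total = term.copy()
    for j in range(n_terms):
        term = term @ M / (j + q + 1)
        total = total + term
    return total @ b

def lincomb(tau, A, bs):
    # reference evaluation of (4): sum_j tau^j phi_j(tau*A) b_j
    return sum(tau**j * phi_times_b(j, tau * A, b) for j, b in enumerate(bs))

rng = np.random.default_rng(0)
N = 5
A = rng.standard_normal((N, N)) / N
b = rng.standard_normal(N)

# Task (I): a stop at tau = T records T^q * phi_q(T*A) b_q; rescaling by
# 1/T^q recovers phi_q(T*A) b_q itself.  T = 1/8 mimics the EPIRK4s3 stop.
q, T = 1, 0.125
w = lincomb(T, A, [np.zeros(N), b])          # only b_1 is nonzero
assert np.allclose(w / T**q, phi_times_b(q, T * A, b))
```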
Both higher-order higher-stage-number EPIRK schemes as well as other exponential integrators like exponential Rosenbrock methods [41] utilize similar structural properties to construct and implement exponential time integrators efficiently.
We have modified the original version of phipm to enable these two options of computing either a single ϕ_q(τ A) b_q or the whole linear combination (4) for multiple values of τ = T_i. In the modified phipm, we use Algorithm 3 described in section 3.3.1 to keep track of intermediate solutions, and we scale the final results by 1/τ_i whenever task I is executed. The rest of the procedure is identical to the one described in [21]. In the subsequent text, every mention of phipm refers to this slightly modified version of the original algorithm.
For general problems where the matrix A is not explicitly available and little is known about its spectrum, phipm has been demonstrated to be the most efficient algorithm to date for the implementation of exponential integrators [23]. This algorithm has, however, significant drawbacks, and the following three considerations were the main driving force behind our search for alternative techniques.
• The convergence of the phipm algorithm is often inconsistent. For example, it may happen that computing a solution with a small error is significantly faster than computing a solution with lower accuracy. We also noticed that this behavior may recur as more and more accurate solutions are being calculated. As we will see later in the section presenting our numerical experiments, this results in zig-zags in a precision diagram.
• The substepping of equation (5) requires several explicit multiplications by the matrix A in Eq. (7). As discussed in [21], [22] and [26], this is not only costly, but the procedure might also become increasingly sensitive to rounding errors as p increases. Since the last stage of a high-order EPIRK scheme often involves multiplications by relatively large coefficients, this sensitivity to rounding errors is worrying when A is large.
• The adaptive procedure of phipm is based on somewhat rough estimates of the floating point operations count on a single processor machine.Important details related to a specific computer architecture are not taken into account.

The KIOPS algorithm
The KIOPS algorithm, outlined in Algorithm 1, builds on the ideas of phipm, but modifies both the substepping procedure and the adaptive algorithm to significantly improve the efficiency and accuracy of approximating (4). Specifically: • Instead of substepping equation (5), we extend the theoretical result of Sidje [8] to linear combinations of ϕ-functions (4) and use the exponential of an augmented matrix to compute all terms τ^i ϕ_i(τ A) b_i simultaneously using one Krylov projection.
• We use incomplete rather than full orthogonalization procedure for Krylov projections.
• We propose a different adaptivity method to select the substep sizes τ_i and Krylov subspace dimensions m_i, which makes the overall approximation of (4) more efficient.
Below we provide a detailed description of the KIOPS algorithm and demonstrate its efficiency and accuracy within exponential integrators compared to phipm using a set of numerical examples.

Computing linear combinations of ϕ-functions
As mentioned before, phipm requires several explicit multiplications by A in Eq. (7). Hence, it becomes increasingly sensitive to round-off errors as p increases. The alternative approach presented in this section allows us to evaluate all terms of the linear combination (4) simultaneously and thus completely replaces the substepping of (5).
Once again consider the task of approximating the linear combination

w(τ) = ϕ_0(τ A) b_0 + τ ϕ_1(τ A) b_1 + ... + τ^p ϕ_p(τ A) b_p, (12)

where A ∈ R^{N×N} and b_0, ..., b_p ∈ R^N. In a typical application, the matrix A is large and sparse. Exact evaluation of the individual products of ϕ-functions and vectors is then prohibitively computationally expensive. We choose a more efficient method and compute the linear combination (12) using a single exponential of a slightly larger matrix [8,6,22]. The following theorem is an extension of the result obtained in [8] for the case b_i = c for i = 0, ..., p. Theorem 1 shows how the problem of computing (12) can be solved by computing a single exponential of a matrix. Hereafter, we denote by I_l the l × l identity matrix and by e_p = (0, ..., 0, 1)^T ∈ R^p the last canonical basis vector in R^p. The colon operator "a : b" generates the indices ranging from a to b; this operator is used to represent a subset of the elements of a vector or a matrix and it has the lowest priority.

Theorem 1. Let A ∈ R^{N×N}, b_0, b_1, ..., b_p ∈ R^N, let B = [b_p, b_{p−1}, ..., b_1] ∈ R^{N×p}, and define the augmented matrix

Ã = [ A  B ; 0  K ] ∈ R^{(N+p)×(N+p)}, where K = [ 0  I_{p−1} ; 0  0 ] ∈ R^{p×p}.

Set v = (b_0, e_p) ∈ R^{N+p} and let w = e^{τ Ã} v. Then the first N elements of the vector w are given by

w(1 : N) = ϕ_0(τ A) b_0 + τ ϕ_1(τ A) b_1 + ... + τ^p ϕ_p(τ A) b_p,

and the rest of the vector is

w(N + 1 : N + p) = ( τ^{p−1}/(p−1)!, ..., τ^2/2!, τ, 1 )^T.

Proof. Since Ã is block upper triangular, its exponential has the form

e^{τ Ã} = [ e^{τ A}  E_12 ; 0  e^{τ K} ]. (16)

From [8], we have

(e^{τ K})_{ij} = τ^{j−i}/(j−i)! for j ≥ i, and 0 otherwise, (17)

and the columns of E_12 are given by Theorem 2.1 of Al-Mohy and Higham [6], with η = 0, as

E_12(:, j) = Σ_{k=1}^{j} τ^k ϕ_k(τ A) b_{p−j+k}. (18)

Inserting Eqs. (17) and (18) into Eq. (16), we obtain the explicit form (19) of e^{τ Ã}. It only remains to multiply (19) by v = (b_0, e_p) to obtain the vector w. □

Now we will use this result inside an iterative procedure to evaluate e^{τ Ã}.
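Theorem 1 can be checked numerically on a small dense example. The sketch below assembles the augmented matrix Ã, applies SciPy's dense expm, and compares the first N entries of w against a direct evaluation of the linear combination (12); the helper phi_times_b is a truncated Taylor series used only as a reference:

```python
import math
import numpy as np
from scipy.linalg import expm

def phi_times_b(q, M, b, n_terms=40):
    # dense reference for phi_q(M) @ b via a truncated Taylor series
    term = np.eye(M.shape[0]) / math.factorial(q)
    total = term.copy()
    for j in range(n_terms):
        term = term @ M / (j + q + 1)
        total = total + term
    return total @ b

rng = np.random.default_rng(1)
N, p, tau = 6, 3, 0.7
A = rng.standard_normal((N, N)) / N
bs = [rng.standard_normal(N) for _ in range(p + 1)]

# Augmented matrix of Theorem 1: B = [b_p, ..., b_1], K has ones on the
# superdiagonal, v = (b_0, e_p).
B = np.column_stack(bs[:0:-1])
K = np.diag(np.ones(p - 1), k=1)
A_tilde = np.block([[A, B], [np.zeros((p, N)), K]])
v = np.concatenate([bs[0], np.eye(p)[-1]])

w = expm(tau * A_tilde) @ v
reference = sum(tau**j * phi_times_b(j, tau * A, bs[j]) for j in range(p + 1))
assert np.allclose(w[:N], reference)
# Trailing p entries: tau^{p-1}/(p-1)!, ..., tau, 1.
tail = [tau**(p - 1 - i) / math.factorial(p - 1 - i) for i in range(p)]
assert np.allclose(w[N:], tail)
```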

Krylov adaptive method
The KIOPS algorithm is structured around the idea of utilizing Theorem 1 to approximate the linear combination (12) using the exponential of an augmented matrix. This matrix exponential can be computed using a polynomial approximation of the form

w(τ) = e^{τ Ã} v ≈ P_{m−1}(τ Ã) v, (21)

where P_{m−1} is a polynomial of degree m − 1. A number of methods employ a polynomial approximation of the form (21), including truncated Taylor series approximation, Leja interpolation, and Chebyshev polynomial-based algorithms. The disadvantage of most of these methods is that they require information about the spectrum or norm of the matrix. As mentioned earlier, obtaining this extra information about the matrix can be impossible or prohibitively expensive computationally, especially for problems where the matrix-vector product is given by an external "matvec" subroutine. In this work, we consider only approaches that do not require any knowledge of the norm or spectrum of the matrix. We also avoid methods that require inversion of Ã because this matrix is singular.
Since the approximation (21) is an element of the m-dimensional Krylov subspace

K_m = span{v, Ã v, Ã^2 v, ..., Ã^{m−1} v}, (22)

the problem can be recast as the search for an element of K_m that approximates w(τ) [46]. The approximation of the vector w(τ) by an element of a Krylov subspace consists of two important steps. The first step is the computation of an appropriate basis for the Krylov subspace and of a smaller matrix that represents the projection of the action of Ã on this subspace. In the second step, the matrix exponential of the smaller matrix is computed using a standard technique and the result is projected back onto the original large space. Notice that the definition (22) does not include the factor τ, since the Krylov subspaces associated with Ã and τ Ã are the same for any τ ∈ R.
We will see in section 3.4 that if we want to obtain good accuracy, then the size of the Krylov space m has to be large when ‖τ Ã‖ is large. This is worrying because it may indicate that an impractical amount of memory and computational cost could be necessary to obtain a sufficiently small error. A more efficient approach is to apply the Krylov subspace method iteratively, as in the work of Sidje [8]. The key idea is to split τ into a sum of smaller intervals, such that

τ = τ_1 + τ_2 + ... + τ_k.

Since computing the action of the matrix exponential is equivalent to solving a linear ODE, the problem of computing the matrix exponential is equivalent to finding the solution of the initial value problem

w′(t) = Ã w(t), w(0) = v.

From this viewpoint, one can iteratively solve the recurrence

w(τ_{i+1}) = e^{τ_i Ã} w(τ_i) (28)

and use an adaptive procedure in order to control the errors of the method.
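The substepping recurrence (28) is easy to verify on a small dense example, since the matrix exponentials over the subintervals commute and compose into the exponential over the full interval. A minimal sketch (SciPy's dense expm is used purely for illustration):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
n = 8
A_tilde = rng.standard_normal((n, n)) / n
v = rng.standard_normal(n)

# Any partition tau = tau_1 + ... + tau_k reproduces exp(tau*A~) v when the
# recurrence w(tau_{i+1}) = exp(tau_i * A~) w(tau_i) of Eq. (28) is iterated.
substeps = [0.3, 0.25, 0.45]
tau = sum(substeps)
w = v.copy()
for dt in substeps:
    w = expm(dt * A_tilde) @ w

assert np.allclose(w, expm(tau * A_tilde) @ v)
```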
Given a function that computes the product Av of a matrix A ∈ R^{N×N} and a vector v ∈ R^N, at each substep τ_l the KIOPS algorithm proceeds according to the following steps: (i) given the current substep size τ_l and the size of the Krylov subspace m_l, an incomplete orthogonalization-based Krylov projection algorithm is executed to approximate the action of e^{τ_l Ã} defined in (19) (this step is described in detail in sections 3.3 and 3.3.1); (ii) the local error estimate is computed and the approximation from step (i) is accepted for the current interval only if the error is within the user-specified tolerance (see sections 3.4 and 3.5); (iii) if the approximation in step (ii) is accepted, the adaptive algorithm then determines whether it is more cost efficient to compute the approximation at the next substep with a larger value of τ_{l+1} or a smaller Krylov subspace size m_{l+1}; if the approximation in step (ii) was rejected, the adaptive algorithm decides whether it is more cost efficient to obtain a better estimate by reducing the substep size τ_l or increasing the Krylov subspace size m_l (see sections 3.6.1 and 3.6.2).
As seen in the previous section, an exponential integrator typically requires evaluation of w at several values of τ. Algorithm 1 (KIOPS: evaluate the linear combination (12)) summarizes the complete procedure: at each substep the Krylov basis is built with Algorithm 2, the local error estimate is computed as in section 3.4, suggested values τ_new and m_new are calculated as in section 3.6, a choice is made between changing m to m_new or τ to τ_new, and, if the acceptance criterion described in section 3.5 is satisfied, the vector w is updated using Algorithm 3.

To simplify the notation, in the subsequent sections we will refer to the substep size τ_l as simply τ and the Krylov subspace size m_l as m. Keep in mind, however, that these values change over each substep l.

Building a basis for the Krylov subspace
The first part of the Krylov subspace method consists of computing a set of basis vectors {v_1, ..., v_m} for the m-dimensional Krylov subspace K_m. This is equivalent to finding a matrix whose column space is K_m. In a typical implementation, the Arnoldi procedure is used to compute this matrix. As discussed in [26], performing the Arnoldi procedure takes O(m^2 · [N + p]) operations and constitutes the primary computational cost of the Krylov subspace projection technique. For this reason we turn to the incomplete orthogonalization procedure of length 2, whose orthogonalization cost is only O(m · [N + p]). Starting with the vector v_1 = v/‖v‖, the incomplete orthogonalization procedure (Algorithm 2) produces the factorization

Ã V_m = V_m H_m + h_{m+1,m} v_{m+1} e_m^T,

where e_m = (0, ..., 0, 1)^T ∈ R^m denotes the last canonical basis vector in R^m. The most important by-products of this factorization are the matrices V_m ∈ R^{(N+p)×m} and H_m ∈ R^{m×m}. The matrix H_m has a banded structure and can be seen as an oblique projection of the action of the matrix Ã on the Krylov subspace. The entry h_{m+1,m} ∈ R can be interpreted as a kind of residual of the projection onto the Krylov subspace, and it will enter into the formulation of an error estimate in section 3.4. Although we want to generate a basis of an m-dimensional space, the algorithm produces m + 1 vectors in general. The last vector v_{m+1} ∈ R^{N+p} will not be used in our approximation scheme. There exists an alternative corrected scheme [46] that makes use of this vector, but our experiments showed that it has slightly slower convergence than the standard scheme when used in this solver.
The main difference between Algorithm 2 and an equivalent algorithm based on the Arnoldi procedure is that each new vector is orthogonalized only against the two previous ones instead of all of them. Hence, the matrix V_m obtained after Algorithm 2 has rank m and its columns span K_m, but they do not form an orthonormal basis in general.
Algorithm 2 Incomplete orthogonalization procedure of length 2.

1: v_1 = v/‖v‖, j = 1
2: happy_breakdown = false
3: while j ≤ m and not happy_breakdown do
4:   z(1 : N) = A v_j(1 : N) + B v_j(N + 1 : N + p)
5:   z(N + 1 : N + p − 1) = v_j(N + 2 : N + p)
6:   z(N + p) = 0
7:   for i = max(1, j − 1) to j do
8:     h_{i,j} = v_i^T z
9:     z = z − h_{i,j} v_i
10:  end for
11:  s = ‖z‖
12:  if s ≈ 0 then
13:    happy_breakdown = true
14:    break
15:  end if
16:  h_{j+1,j} = s
17:  v_{j+1} = z/s, j = j + 1
18: end while
19: return V, H, j

It is worth noting that Algorithm 2 does not require the assembly of the augmented matrix Ã. In effect, the product Ã v_j is formed, at lines 4 to 6, using block multiplication involving only the action of A and the matrix B. This formulation allows the action of the matrix A, or some approximation of it, to be provided by an external subroutine. Typically, such a subroutine tries to exploit the sparsity pattern of A to reduce the computational cost associated with the matrix-vector product.
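For illustration, a minimal NumPy version of the incomplete orthogonalization of length 2 can be written as below. It accepts a generic matvec closure rather than the augmented block product of Algorithm 2, and the function name iop2 is ours; the check at the end verifies the factorization Ã V_m = V_m H_m + h_{m+1,m} v_{m+1} e_m^T produced by the procedure:

```python
import numpy as np

def iop2(matvec, v, m):
    """Incomplete orthogonalization of length 2 (a NumPy sketch of Algorithm 2):
    each new vector is orthogonalized only against the previous two."""
    n = v.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(m):
        z = matvec(V[:, j])
        for i in range(max(0, j - 1), j + 1):
            H[i, j] = V[:, i] @ z
            z = z - H[i, j] * V[:, i]
        s = np.linalg.norm(z)
        if s < 1e-14:                 # happy breakdown
            return V[:, :j + 1], H[:j + 1, :j + 1], j + 1
        H[j + 1, j] = s
        V[:, j + 1] = z / s
    return V, H, m

rng = np.random.default_rng(3)
n, m = 30, 8
A = rng.standard_normal((n, n)) / np.sqrt(n)
v = rng.standard_normal(n)
V, H, m_out = iop2(lambda x: A @ x, v, m)

# The factorization A V_m = V_m H_m + h_{m+1,m} v_{m+1} e_m^T holds exactly
# (up to rounding), even though V_m is not orthonormal.
rhs = V[:, :m] @ H[:m, :m]
rhs[:, m - 1] += H[m, m - 1] * V[:, m]
assert np.allclose(A @ V[:, :m], rhs)
```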

Approximation of the exponential in the Krylov subspace
Since the columns of V_m obtained using IOP do not form an orthonormal basis, V_m^T V_m is not an identity matrix, as is the case for the Arnoldi procedure. We can nevertheless use arguments similar to those presented by Saad [46] and by Higham [3] to obtain a theorem that justifies our approximation scheme.
Theorem 2. Let Ã ∈ R^{(N+p)×(N+p)}, v ∈ R^{N+p} and let V_m ∈ R^{(N+p)×m}, H_m ∈ R^{m×m} be the result of m steps of Algorithm 2. Then for any polynomial P_j of degree j < m − 1 we have

P_j(Ã) v = ‖v‖ V_m P_j(H_m) e_1. (31)

Proof. Without loss of generality, the proof can be done by induction on j for polynomials of the form P_j(Ã) = Ã^j. For j = 0, we have Ã^0 v = v = ‖v‖ V_m e_1, since v_1 = v/‖v‖. Assume that (31) is true for all j ≤ m − 2; then

Ã^{j+1} v = Ã (‖v‖ V_m H_m^j e_1) = (V_m H_m + h_{m+1,m} v_{m+1} e_m^T) H_m^j (‖v‖ e_1),

and using the fact that h_{m+1,m} v_{m+1} e_m^T H_m^j (‖v‖ e_1) = 0 whenever j ≤ m − 2 (the (m, 1) entry of H_m^j vanishes because H_m is upper Hessenberg), we obtain Ã^{j+1} v = ‖v‖ V_m H_m^{j+1} e_1. □

This theorem, combined with the Taylor series definition of the exponential, suggests the following approximation scheme for the matrix exponential:

e^{τ Ã} v ≈ β V_m e^{τ H_m} e_1, β = ‖v‖. (33)

Since in general m ≪ N + p, the matrix exponential e^{τ H_m} can be computed using any standard method with dense output [4,5]. In this work, we use a diagonal Padé approximation combined with a scaling and squaring algorithm [47].
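The projection scheme (33) can be sketched end to end in a few lines. The example below builds the length-2 incomplete orthogonalization basis for a small, mildly non-normal matrix and compares β V_m e^{τ H_m} e_1 with the dense reference e^{τ A} v; SciPy's expm stands in for the Padé scaling-and-squaring routine used in the actual solver, and the test matrix and tolerance are illustrative:

```python
import numpy as np
from scipy.linalg import expm

def iop2(A, v, m):
    # incomplete orthogonalization of length 2, as in Algorithm 2
    V = np.zeros((v.shape[0], m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(m):
        z = A @ V[:, j]
        for i in range(max(0, j - 1), j + 1):
            H[i, j] = V[:, i] @ z
            z = z - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(z)
        V[:, j + 1] = z / H[j + 1, j]
    return V, H

rng = np.random.default_rng(4)
n, m, tau = 40, 25, 0.5
A = -np.diag(np.linspace(0.1, 2.0, n)) + 0.05 * rng.standard_normal((n, n))
v = rng.standard_normal(n)

V, H = iop2(A, v, m)
beta = np.linalg.norm(v)
# Approximation (33): exp(tau*A) v ≈ beta * V_m exp(tau*H_m) e_1
w_kry = beta * (V[:, :m] @ expm(tau * H[:m, :m])[:, 0])
w_ref = expm(tau * A) @ v
assert np.linalg.norm(w_kry - w_ref) < 1e-8 * np.linalg.norm(w_ref)
```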
Algorithm 3 presents the procedure to calculate the solution for multiple stepsizes.

Error estimates
The approximation error for the exponential of a negative semidefinite matrix is bounded as [31] (see also [32] for a comprehensive analysis of the convergence of IOP)

‖e^{τ Ã} v − β V_m e^{τ H_m} e_1‖ ≤ C e^{τ α(Ã)} (τ ‖Ã‖)^m / m! β, (34)

where α(Ã) = max{Re(λ_i)} denotes the spectral abscissa of Ã (i.e. the supremum of the real parts of the eigenvalues of Ã), m is the size of the Krylov subspace, β = ‖v‖, ‖Ã‖ is the matrix 2-norm and C is a constant. This a priori estimate is useful to gain insight into the convergence of the approximation scheme, but it cannot be used in practical calculations because it requires knowledge of the spectrum of τ Ã.
A more practical a posteriori estimate can be derived analogously to Theorem 5.1 of Saad [46]. This error estimate will be used to formulate an acceptance criterion in section 3.5. The basic idea is that the error of the scheme (33) satisfies the series expansion

e^{τ Ã} v − β V_m e^{τ H_m} e_1 = β h_{m+1,m} Σ_{j=1}^{∞} τ^j (e_m^T ϕ_j(τ H_m) e_1) Ã^{j−1} v_{m+1}. (35)

If we assume that the magnitude of the terms of the series decreases rapidly as the solution converges, then the absolute value of the first term,

ε_m = τ h_{m+1,m} |e_m^T ϕ_1(τ H_m) e_1| β, (36)

can be used as an estimate of the error. This estimate can be computed with little extra cost. Invoking Theorem 1 again, the τ e_m^T ϕ_1(τ H_m) e_1 term can be obtained along with e^{τ H_m} in a single matrix exponential. To do that, we define the augmented matrix

H̃_m = [ H_m  e_1 ; 0  0 ] (37)

and evaluate its exponential e^{τ H̃_m}. The term τ e_m^T ϕ_1(τ H_m) e_1 corresponds to the entry (m, m + 1) of the resulting matrix. It remains only to multiply by β h_{m+1,m} to obtain the estimate (36).
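The trick of reading τ e_m^T ϕ_1(τ H_m) e_1 from the entry (m, m + 1) of the exponential of the augmented matrix (37) can be verified directly on a random Hessenberg matrix; the truncated series at the end provides only a reference value for ϕ_1:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(5)
m, tau = 6, 0.8
H = np.triu(rng.standard_normal((m, m)), k=-1)   # upper Hessenberg, like H_m

# Augmented matrix (37): H~ = [[H, e_1], [0, 0]]
H_aug = np.zeros((m + 1, m + 1))
H_aug[:m, :m] = H
H_aug[0, m] = 1.0
E = expm(tau * H_aug)

# Reference value of tau * e_m^T phi_1(tau*H) e_1 via a truncated series.
term = np.eye(m)
total = np.eye(m)
for j in range(40):
    term = term @ (tau * H) / (j + 2)
    total = total + term
assert np.isclose(E[m - 1, m], tau * total[m - 1, 0])
```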
Although no serious problems were found in this study, we note that this error estimate is less accurate when m is large (see also the discussions in [46] and [8]). Using a sharper error estimate or an appropriate residual notion (see e.g. [8,12,48]) could lead to better performance and accuracy. We intend to analyze this in detail in future studies.

Acceptance criterion
For an iteration to be accepted, it must satisfy a user-defined tolerance. To verify this, we compute the scaled error

ω = T_end ε_m / (τ Tol), (39)

as in Niesen and Wright [21]. The step is accepted if ω ≤ δ, where δ = 1.4 is a threshold intended to reduce the risk of rejecting the step. Since the adaptive procedure described in section 3.6.1 seeks a solution with a value of ω less than 0.9, there is little risk in allowing the tolerance to be slightly exceeded occasionally. An iteration is rejected if ω > δ.
The adaptive procedure can then extend the existing Krylov subspace basis with more vectors or use a smaller stepsize to improve the current solution.This flexibility to improve a missed iteration is an important difference with Krylov solvers for linear systems and eigenvalue problems where a bad iteration can wreak havoc.

Selection of parameters
The accuracy and efficiency of our algorithm depend on two important parameters: the size of the Krylov subspace m_i and the substep size τ_i. Their values must therefore be chosen with care.
To start the algorithm, we use a somewhat optimistic first guess of τ = T end and a user provided estimate m = m init .
A default value of m_init = 10 is used if no other value has been specified. However, experiments have indicated that a good estimate of the Krylov subspace size can drastically improve the performance of the method [26]. When the solver is used repeatedly to integrate an ODE over many time steps, a possible strategy is to use the final value of m_l from the previous time step as a first guess for the next one. This heuristic is justified when the nature of the problem to be solved in a particular time step closely resembles that from the previous step. Hence, the algorithm also returns the size of the Krylov subspace m_l used in the last iteration as an output parameter. A solution is then produced with these parameters. If the acceptance criterion is satisfied, then our adaptive procedure determines whether m or τ can be enlarged safely for the next substep. If, on the contrary, the acceptance criterion is not satisfied, then we must choose different parameters m or τ and try again. The challenge is then to find optimal parameters that avoid step rejection and lead to quick convergence.
After each iteration (accepted or rejected), the adaptive procedure suggests a stepsize τ new and a dimension of the Krylov subspace m new as in Algorithm 4. Two scenarios can then be considered and the most economical is selected: (A) keeping m constant and changing τ to τ new , (B) keeping τ constant and changing m to m new .
Starting at a given substep, and assuming that the new choice of parameters leads to acceptance for all of the subsequent steps, the cost of completing the remaining steps of Algorithm 1 is bounded below by

C(τ, m) = ⌈(T_end − T_now)/τ⌉ [cost_iop(m) + cost_exp(N_mult, m)], (40)

where cost_iop(m) is the cost of making m steps of Algorithm 2 (the incomplete orthogonalization procedure), which grows linearly in m since each step requires one matrix-vector product with Ã, at a cost of O(nz(Ã)) operations, and orthogonalization against at most two basis vectors; cost_exp(N_mult, m) is the cost of computing the dense matrix exponential given in [47], which grows as N_mult m^3. Here nz(Ã) denotes the number of nonzero elements in Ã and N_mult is the number of matrix multiplications required by the scaling and squaring Padé algorithm [47] selected for this study.
We notice that the cost of completing the algorithm with scenario (B), C(τ, m_new), is almost always cheaper than C(τ_new, m), the cost of scenario (A). Whenever possible, our basic strategy is therefore to keep τ constant and use the method described in section 3.6.2 to let m vary up to convergence or up to m_max. It is only when m_max has been reached that we consider varying the stepsize, in order to avoid using an excessive amount of memory. In the case where m_max is reached, we check whether the step is going to be accepted. If it does not satisfy the acceptance criterion, then the procedure given in section 3.6.1 suggests a new stepsize that is more likely to satisfy the error tolerance; in this case, we use a safety factor γ = 0.6 to reduce the risk of another rejection. If the tolerance is satisfied, then the stepsize is changed using the default value of γ = 0.9 to give the possibility of using either a smaller or a larger stepsize in the next step.
Our approach to adaptively proposing a new stepsize and dimension is largely inspired by the work of Niesen and Wright [21,22]; it is outlined in the next subsections, and Algorithm 4 returns the suggested values τ_new and m_new (varying m as in section 3.6.2 while keeping τ constant whenever possible).

Variable stepsize
Each interval must be small enough to reduce the norm of the matrix to a level where the problem could be solved with a reasonably large Krylov subspace.On the other hand, it should not be too small because this would necessitate a large number of iterations.The challenge is therefore to find an optimal partition of the interval [0, τ ] into k subintervals τ l (l = 1, ..., k) without any a priori knowledge of the matrix characteristics.
The procedure to determine the stepsize is similar to the adaptive strategy used in many ODE solvers. We assume that the error behaves approximately as C τ^{q+1} for some constants q, C ∈ R. The order is set to q = m/4 − 1 for the first step, as in [21], and if a previously suggested stepsize is rejected, then we use the estimate

q = log(ε_m / ε_old) / log(τ / τ_old) − 1, (43)

where the "old" subscript denotes the rejected estimates, to obtain a new order estimate.
The suggested stepsize is then given by

τ_new = γ τ ω^(−1/(q+1)), (44)

where ω = T_end ε_m/(τ tol) denotes the scaled error estimate and γ is a safety factor. Following [8], we chose a value of γ = 0.9, except when the size of the Krylov subspace m reaches a certain limit m_max, in which case we use γ = 0.6 to avoid step rejections. The suggested stepsize is always clipped to restrict its variation around the current stepsize to between τ/5 and 5τ, and to make sure it does not exceed the final value T_end.
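The stepsize rule, including the order re-estimation after a rejection and the clipping described above, can be sketched as follows. This is a hedged illustration with names of our choosing (`err` plays the role of the error estimate ε_m); it is not the paper's reference implementation.

```python
import math

def suggest_stepsize(tau, err, tol, t_end, q, gamma=0.9,
                     tau_old=None, err_old=None):
    """Suggest a new substep size from the a posteriori error estimate.

    Sketch of Eq. (44) and the clipping rule; helper names are ours.
    """
    # After a rejection, re-estimate the observed order q from the
    # current and previously rejected (tau, err) pairs.
    if tau_old is not None and err_old is not None and tau != tau_old:
        q = math.log(err / err_old) / math.log(tau / tau_old) - 1.0

    # Scaled error: omega > 1 signals that the tolerance is violated.
    omega = (t_end / tau) * (err / tol)
    tau_new = gamma * tau * omega ** (-1.0 / (q + 1.0))

    # Clip to [tau/5, 5*tau] and never step past the end of the interval.
    tau_new = min(max(tau_new, tau / 5.0), 5.0 * tau)
    return min(tau_new, t_end)
```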

Variable dimension of the Krylov subspace
As mentioned before, rapid convergence can be obtained if we choose a large value of m when the norm of τ Ã is large. Since we do not know the norm of τ Ã, we can use the a posteriori error estimate to increase m if the estimated error is too large, or to reduce it if the error can be kept small enough [21]. In practical applications, it may also be judicious to select minimum and maximum sizes m_min and m_max to cap the memory requirements and to make efficient use of the CPU cache.
The default values, used in our numerical experiments, are m min = 10 and m max = 128.
The estimate (34) suggests that the error behaves like C κ^(−m) for some constants C, κ ∈ R. For the first suggestion of a step, a default value κ = 2 is used. When an iteration is rejected, the value of κ is estimated from the current and the previously rejected error estimates (again denoted with an "old" subscript) by

κ = (ε/ε_old)^(1/(m_old − m)).

Given this estimate, we obtain the suggested dimension

m_new = m + ⌈ln(ω)/ln(κ)⌉.

The dimension of the Krylov space is constrained to vary upward or downward within the user-specified interval [m_min, m_max].
Similarly to what we did for the stepsize, the dimension is clipped to avoid variation larger than 25% below the previous value of m or 33% above.
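The dimension rule, including the rate re-estimation and both layers of clipping, can be sketched as follows. This is a sketch under the stated error model err ≈ C κ^(−m); the function and parameter names are ours, not the paper's.

```python
import math

def suggest_dimension(m, err, omega, m_min=10, m_max=128,
                      m_old=None, err_old=None, kappa=2.0):
    """Suggest a new Krylov dimension from the model err ~ C * kappa**(-m).

    omega is the scaled error of Eq. (44); kappa defaults to 2 for the
    first suggestion, as described in the text.
    """
    # After a rejection, re-estimate the convergence rate kappa from the
    # current and previously rejected (m, err) pairs.
    if m_old is not None and err_old is not None and m != m_old:
        kappa = max((err / err_old) ** (1.0 / (m_old - m)), 1.1)

    # Enlarging m by about log(omega)/log(kappa) should bring the
    # scaled error down to roughly 1.
    m_new = m + int(math.ceil(math.log(max(omega, 1e-16)) / math.log(kappa)))

    # Clip the relative change: at most 25% below or 33% above m ...
    m_new = min(max(m_new, int(m * 0.75)), int(math.ceil(m * 4.0 / 3.0)))
    # ... and keep the dimension inside the user-specified bounds.
    return min(max(m_new, m_min), m_max)
```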
As an aside, we note that there is a small inconsistency in the methodology to limit the evolution of m and τ in phipm.
This algorithm uses maximum and minimum thresholds to avoid large variations of the parameters, but the clipping of the values is done at the end of an iteration. The consequence is that an optimal decision taken on the basis of the cost function may be altered later, leading to suboptimal adaptivity. This is in contrast with KIOPS, where the clipping of the suggested values is always done during the adaptive procedure.

Avoiding rounding errors
We close this section with a remark about implementation details that can help reduce the impact of rounding errors due to finite-precision arithmetic. As shown in Theorem 1, the exact value of v for entries N+1 to N+p is given by

v(N+1 : N+p) = [ (T_now + τ)^(p−1)/(p−1)!, . . ., (T_now + τ), 1 ].

We can also follow the suggestion of Al-Mohy and Higham [6] and normalize the B matrix. To do so, we substitute B by ν·B and change entries N+1 to N+p of the starting vector v to μ·v(N+1 : N+p) before applying Algorithm 2. The normalization constants are defined as powers of 2 to avoid introducing rounding errors and have no effect on the solution in exact arithmetic.
Our experiments have shown that the normalization of the B matrix leads to slightly faster convergence on most problems.
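The power-of-two normalization can be sketched as below. This is an illustration under our reading of the text, assuming the compensating factor μ is the reciprocal of ν so that the products ν B · μ v cancel exactly; the function name and the choice of the 1-norm are ours.

```python
import numpy as np

def normalize_augmented(B, v, N, p):
    """Scale B by a power of two and compensate in the augmented part of v.

    Because nu and mu are powers of 2, the scaling introduces no rounding
    error in binary floating point and cancels out in exact arithmetic.
    """
    # Pick nu = 2**(-s) so that the 1-norm of nu * B is at most 1.
    norm = np.linalg.norm(B, 1)
    s = int(np.ceil(np.log2(norm))) if norm > 0 else 0
    nu = 2.0 ** (-s)
    mu = 1.0 / nu  # compensating factor (assumed reciprocal of nu)

    B_scaled = nu * B
    v_scaled = v.copy()
    v_scaled[N:N + p] *= mu
    return B_scaled, v_scaled, nu
```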

Numerical experiments
We evaluate the performance of the KIOPS algorithm in the context of the following fourth- and fifth-order EPIRK exponential methods, since these schemes were shown to provide the most efficiency in previous studies [33,45,43]. The numerical experiments presented below both validate the performance of the KIOPS algorithm and demonstrate how it can improve both the efficiency and the accuracy of EPIRK and other exponential schemes compared to the adaptive Krylov algorithm phipm from [21].
• EXPRB5s3 [49] - a stiffly accurate fifth-order method that was originally derived as an exponential Rosenbrock integrator and can also be written as an EPIRK scheme [33].

All of these methods are implemented as described in section 2, grouping terms to minimize the number of KIOPS calls needed and optimizing each call. A more detailed description of this so-called mixed implementation technique can be found in [33]. With the mixed implementation, the fourth-order methods require only two calls to the KIOPS or phipm routines per time step, while the fifth-order methods require only three calls. This is accomplished by simultaneously evaluating the second terms in the right-hand sides of the equations for U_n2 and U_n3 using a single call to KIOPS or phipm. For the fourth-order methods, the second call to KIOPS or phipm evaluates the full linear combination of ϕ-functions in the evaluation of u_(n+1). The fifth-order methods require a third call to KIOPS or phipm to approximate the third term in the right-hand side of U_n3. All schemes are implemented with constant time step integration. The phipm algorithm used in our experiments allows for simultaneous evaluation of linear combinations of ϕ's at values T = [T_1, ..., T_end]. Further details and information on our phipm implementation can be found in [21,23,33].
As in previous publications [33], we choose the following test problems, routinely used to study the performance of stiff integrators. In all of the problems presented below, the ∇² term is discretized using standard second-order finite differences.
Advection-Diffusion-Reaction (ADR) 2D. Two-dimensional advection-diffusion-reaction equation [51].

Gray-Scott 2D. Two-dimensional Gray-Scott model [54] with periodic boundary conditions.

1D Semilinear Parabolic. One-dimensional semilinear parabolic problem [55] with homogeneous Dirichlet boundary conditions, for x ∈ [0, 1] and t ∈ [0, 1]. The source function is chosen such that U(x, t) = x(1 − x)e^t is the exact solution.

Here N and h correspond to the number of spatial discretization nodes and the time step size, respectively. The tolerance is set to 10^(−14) for KIOPS and phipm. The error is defined as the discrete infinity (maximum) norm of the difference between the approximation to the solution and the reference solution computed using MATLAB's ode15s integrator with absolute and relative tolerances set to 10^(−14).
First, it is important to note that KIOPS outperforms phipm in all of the simulations, delivering both better efficiency and better accuracy. For a given tolerance, the speedup can reach a factor of 5 to 7 (e.g. Fig. 4(e), Fig. 3(e)). It is interesting to observe that while EPIRK4s3 is more accurate than EPIRK4s3A and EXPRB5s3 is more accurate than EPIRK5P1 under the phipm implementation, KIOPS puts these schemes on par with each other. In other words, the KIOPS implementation of EPIRK4s3A is as efficient and accurate as that of EPIRK4s3, and a similar conclusion holds for EXPRB5s3 and EPIRK5P1. It is particularly notable that this phenomenon occurs even between the classically (non-stiffly) accurate EPIRK5P1 and the stiffly accurate method EXPRB5s3. Practice shows that stiff order conditions on exponential integrators can sometimes be unnecessarily strict for some problems [33]. This result raises the question of whether improvements in approximating ϕ-vector products can allow easing strict stiff order conditions and enable the derivation of more efficient methods.
To demonstrate that the computational advantage of KIOPS over phipm is retained as the size of the problem, and consequently its stiffness, increases, we present numerical experiments for each of the problems with different values of the problem size N in Fig. 5. It is expected that as N increases, the performance of both algorithms will suffer, since Krylov subspace projection is at the core of each of these methods and its performance is affected by the more spread-out spectrum of the Jacobian matrix. However, the graphs in Fig. 5 indicate that the computational advantage of KIOPS over phipm is retained and even improved as the problem size N increases.

Conclusions
We presented a new KIOPS algorithm for evaluating products of exponential and exponential-like ϕ-functions of large stiff matrices with vectors. To date, phipm was considered the most efficient algorithm for problems where no additional information about the spectrum or norm of the matrix is available. Our results demonstrated that the new KIOPS algorithm outperforms phipm. The efficiency of the proposed algorithm is attributable to a combination of the incomplete orthogonalization procedure, a better adaptivity procedure, and a heuristic strategy that determines the initial size of the Krylov subspace using information from previous time substeps. The new algorithm offers not only better computational efficiency but also new pathways to further improvements in making exponential and exponential-type integrators more computationally appealing. In particular, the adaptivity algorithms within the KIOPS method can be improved further if better error and cost estimates are derived. We plan to pursue this line of research in future work. We note that alternative techniques, such as restarted or block Krylov subspace methods, could be combined with the adaptive method described in this paper; we intend to study them in future work as well. We also plan to investigate effective ways to improve the performance of the algorithm on parallel architectures.

Code availability
The EPIC package implements the exponential integrators used in our numerical experiments. It is available from http://faculty.ucmerced.edu/mtokman/#software.
1. Minor errors were present in Eqs. (18) and (19) of [1]: the columns of E_12 are given by Theorem 2.1 of Al-Mohy and Higham [6], with l = 0 and noting our reverse ordering of the columns in B. When several values of τ are needed, they are provided as an argument to Algorithm 1 in the array T = [T_1, . . ., T_end], and Algorithm 1 then returns the corresponding linear combinations.