TimeEvolver: A Program for Time Evolution With Improved Error Bound

We present TimeEvolver, a program for computing time evolution in a generic quantum system. It relies on well-known Krylov subspace techniques to tackle the problem of multiplying the exponential of a large sparse matrix $-iHt$, where $H$ is the Hamiltonian and $t$ the time, with an initial vector $v$. The fact that $H$ is Hermitian makes it possible to provide an easily computable bound on the accuracy of the Krylov approximation. Apart from effects of numerical roundoff, the resulting a posteriori error bound is rigorous, which represents a crucial novelty as compared to existing software packages such as Expokit (R. Sidje, ACM Trans. Math. Softw. 24 (1) 1998). On a standard notebook, TimeEvolver allows one to compute time evolution with adjustable precision in Hilbert spaces of dimension greater than $10^6$. Additionally, we provide routines for deriving the matrix $H$ from a more abstract representation of the Hamiltonian operator.


Time Evolution in a Generic Physical Quantum System
We consider a generic physical quantum system. At a given time t, all information about it can be encoded in a state vector v(t), i.e. knowledge of v(t) allows one to predict the outcome of any conceivable measurement. All states that a system can possibly assume are contained in a Hilbert space H. According to our understanding of physics, our Universe is deterministic in the following sense: the state v(0) ∈ H at some initial time uniquely determines the states v(t) ∈ H at all other times. A central task in physics is to infer v(t) from the knowledge of v(0).
In principle, this is very easy. The only additional ingredient that one needs is a Hamiltonian H, which is a self-adjoint and time-independent operator on the Hilbert space H. 1 Given v(0) and H, it follows from the Schrödinger equation that the state at any other point in time t can be computed as

v(t) = e^{-iHt} v(0) .    (1)
As stated, the vector v(t) represents a full solution, i.e. it completely determines the system. Thus, finding techniques for calculating the r.h.s. of eq. (1) bears great relevance for all physical systems. In some situations, analytic methods and approximations are available for performing this computation. For a generic system, however, the only way to compute the r.h.s. of eq. (1) is to employ a numerical approach. Then v(0) is a vector with d complex entries and H is a Hermitian d × d matrix, where d is the dimension of the Hilbert space H. Each column of H encodes the states to which a given basis state can transition. Typically, the number of such states is much smaller than d, i.e. H is a sparse matrix. Thus, one has to compute the product of an exponentiated anti-Hermitian sparse matrix and a vector. Exaggerating slightly, one could say that solving any physical system reduces to this numerical task.

Krylov Subspace Methods
Of course, a plethora of approaches exists to exponentiate a matrix. But the special nature of the problem (1), especially the fact that in the end knowledge of the full matrix e^{-iHt} is not required, allows the use of techniques that are significantly more efficient than generic routines for matrix exponentiation. Particularly important are Krylov subspace methods, which were first used in 1986 for the purpose of numerical time evolution [2]. We shall briefly sketch them now, before we review them in more detail in section 2.1.
The m-dimensional Krylov subspace K_m is defined as

K_m = span{v(0), H v(0), . . . , H^{m-1} v(0)} ,    (2)

where m ≪ d. (Typically, one has m ≲ 100.) The key idea is to project the large sparse matrix H (in our case the Hamiltonian) on the small subspace K_m:

H_m = V_m^T H V_m .    (3)

Since H_m represents a mapping from K_m to K_m, it corresponds to a small m × m matrix. Now the crucial step is to effectively replace

e^{-iHt} → e^{-iH_m t} ,    (4)

i.e. to use instead of the exact result v(t) = e^{-iHt} v(0) the Krylov approximation

ṽ(t) = V_m e^{-iH_m t} V_m^T v(0) ,    (5)

where V_m stands for the embedding of K_m in the full Hilbert space H. In doing so, we introduce the error

err(t) = v(t) − ṽ(t) .    (6)

Once matrix exponentiation is restricted to the small subspace K_m as shown in eq. (5), it can be performed very fast using standard methods. In the full Hilbert space H, only matrix-vector multiplications, as displayed in eq. (2), need to be carried out. Thus, the effectiveness of Krylov subspace methods is due to the fact that no matrix-matrix multiplication has to be performed in the large Hilbert space H. We can estimate how large the advantage of Krylov subspace methods is. Clearly, a necessary requirement for the feasibility of the calculation is that all intermediate results can be stored in (the memory of) a computer. It turns out that this condition, and not runtime, indeed is the limiting factor in many practical applications. If one uses standard matrix exponentiation, the biggest intermediate object is the matrix e^{-iHt}, which unlike H is no longer sparse. So the required storage capacity scales as the number of entries, i.e. d². In contrast, the largest entities that are needed for the Krylov method are the basis elements of the Krylov space (2), so the required memory scales as m · d. In practice, this leads to a gain in Hilbert space size of several orders of magnitude.
The Krylov subspace (2) is specifically adapted to the initial state v(0). Therefore, one can expect that for short enough times t, the error (6) is small, whereas it grows large once the state v(t) (as determined without any approximation) deviates sufficiently from v(0). Thus, it becomes necessary to restart the procedure after a certain time interval, i.e. to take ṽ(t), computed by the Krylov method, as initial state for a new Krylov subspace. With such a time stepping procedure in place, it has become evident since the earliest numerical studies that in many circumstances the Krylov method works surprisingly well already for values of m as small as 10 (see e.g. [2,3,4,5]).
At the same time, these and subsequent investigations have led to significant conceptual progress in understanding how these Krylov subspace techniques work. A possibly incomplete list of relevant studies includes [3,4,5,6,7,8,9,10,11,12,13], where it is important to mention that these publications have a wider scope than the present paper in that they deal with the exponentiation of a generic sparse matrix. In the works cited above, a particularly relevant goal is to provide bounds on the error (6). In general, one can distinguish between a priori and a posteriori error bounds. The former can be evaluated without actually performing the Krylov technique, while the latter rely on the result of the Krylov method. Correspondingly, a priori error bounds tend to be more general (and in particular independent of the initial state v(0)), whereas a posteriori error bounds are often sharper.
However, both types of rigorous error bounds typically suffer from the problem that the function bounding the error is difficult to evaluate in practice. Therefore, a posteriori error estimates have been introduced in [2,4]. They play a twofold role in numerical implementations of Krylov subspace methods. First, they are used to determine the time interval after which restarting of the Krylov procedure becomes necessary. Secondly, they can be used to estimate the error of the final result. Obviously, using error estimates instead of error bounds leads to the problem that one cannot in general be sure whether the outcome of the Krylov method is indeed close to the true result.

Practical Computation of Time Evolution
As we have discussed, Krylov subspace methods represent an efficient approximation scheme for calculating the product of an exponentiated sparse matrix and a vector. Such a problem appears in a wide range of contexts. In particular, it is relevant for computing time evolution in a generic quantum system, as displayed in eq. (1). In this case, the sparse matrix −iHt, which is exponentiated, is anti-Hermitian. From the perspective of Krylov methods, this represents a special case, but in a physical context it encompasses the most general quantum system.
In order to actually compute the result of time evolution in a given physical system, a numerical implementation of Krylov subspace methods is required. This crucial tool was provided by Sidje in 1998 when he published the software package Expokit [1]. 2 Among other routines for matrix exponentiation, this program provides Krylov subspace methods for generic sparse matrices. Since certain simplifications occur for (anti-)Hermitian matrices, Expokit contains functions that are specific to this special case, which is extremely important in the context of physics. In line with our previous discussions, Expokit does not employ a rigorous error bound, but it relies on the error estimate proposed in [4] and subsequent refinements developed in [6,7].
The lack of a rigorous error bound represents a serious drawback for any practical application. If the problem at hand is such that one can compute the result by means of a different method, then one can of course check the validity of the error estimates. However, in such a case there is no need to employ Krylov subspace techniques in the first place. This is to say that Krylov approximation methods are only interesting if there is no other way of solving the problem. But in this case it is impossible to make sure that the error estimates are still viable. Therefore, the numerical results obtained from Krylov subspace methods would become significantly more meaningful if a more precise statement about the error, ideally a rigorous error bound, were available.
Since the release of Expokit, much progress has been made in the study of a posteriori error bounds and corresponding criteria for choosing the size of the time step [11,12,13,14,15,16,17,18,19]. To our knowledge, Lubich was the first one to point out that significant simplifications occur in the physically relevant case in which an anti-Hermitian matrix is exponentiated: The fact that the matrix norm of e^{-iHt} is 1 leads to the existence of an a posteriori error bound, which can be represented as an integral over quantities that are already known within the Krylov approximation [14]. In this way, one obtains a formula for computing the error bound that is both rigorous and easy to compute in practice. Of course, it is important to remark that a rigorous formula for bounding the error of the Krylov approximation still does not lead to a fully rigorous statement about the accuracy of a numerical result. The reason is that a finite machine precision inevitably causes additional roundoff errors. Nevertheless, reducing the uncertainty in the error to effects of finite machine precision certainly represents an improvement.
The goal of the present work is to incorporate this knowledge, and in particular the error bound proposed in [14], in a concrete numerical implementation of Krylov subspace methods. To this end, we have developed the software package TimeEvolver, which is able to compute numerical time evolution while obeying an adjustable error bound. It is rigorous up to effects of roundoff errors and a negligible uncertainty due to a finite step size in the evaluation of the error integral. Moreover, we provide an estimate for the numerical roundoff error. Since we specialize to anti-Hermitian matrices from the outset, our program is considerably less general than Expokit. However, it is sufficient for the most general application of quantum mechanics, namely time evolving a generic physical system. In turn, we have aimed at making our program easy to use and to modify. Correspondingly, TimeEvolver is open-access and based on free software. In this way, we hope to make Krylov subspace methods easily accessible for physicists from a wide range of fields.
The outline of this paper is as follows. Section 2 starts with a more detailed review of known Krylov methods. We pay particular attention to a posteriori error bounds and also present our estimate for the numerical rounding error. Subsequently, we describe our concrete algorithm. Section 3 discusses the practical problem of deriving the Hamiltonian matrix from a more abstract Hamiltonian operator. We specialize to the widely-used situation in which the basis of Hilbert space is given by number eigenstates. Section 4 contains a description of TimeEvolver, i.e. the concrete implementation of our approach in C++. In section 5, we apply our program to an exemplary physical system and study its performance. We summarize in section 6. Throughout, we try to be self-contained and not to assume specific prior knowledge. In this way, we hope to make our presentation understandable also to a physicist without previous experience in the application of Krylov subspace techniques.

Krylov Subspace
Following [3,4,5], we shall review known Krylov subspace methods. The first step is to compute a nested basis of the Krylov spaces (2). This can be achieved using the Arnoldi algorithm (see e.g. [20] for a review). It simplifies for the special situation of Hermitian H, which we consider, and in this case is also known as the Lanczos algorithm. With the initialization

v_1 = v(0)/||v(0)|| ,    (7)

we calculate for j = 1, . . . , m:

w = H v_j ,                                        (8a)
h_{i,j} = v_i^T w ,     i = 1, . . . , j ,          (8b)
w → w − h_{i,j} v_i ,   i = 1, . . . , j ,          (8c)
h_{j+1,j} = ||w|| ,   v_{j+1} = w/h_{j+1,j} ,       (8d)

where we use modified Gram-Schmidt orthogonalization, i.e. the subtractions (8c) are interleaved with the projections (8b), since it is more stable numerically (see e.g. [21,22,20]). Denoting by V_m the matrix whose columns are v_1, . . . , v_m, the above recursion implies the Arnoldi relation

H V_m = V_m H_m + h_{m+1,m} v_{m+1} e_m^T ,    (9)

where e_m is the unit vector with entry 1 in the m-th component. This implies that

H_m = V_m^T H V_m ,    (10)

i.e. H_m is the projection of H on the subspace K_m, in line with eq. (3). So far, all statements have been exact. Now we implement the approximation (4), which amounts to using H_m instead of the full H. Thus, the approximate result of time evolution reads

ṽ(t) = V_m e^{-iH_m t} e_1 .    (11)

This coincides with eq. (5) since V_m^T v(0) = e_1. Here and in the following we assume that the initial state vector v(0) is normalized: ||v(0)|| = 1. The key point is that unlike H, the Hessenberg matrix H_m is small. Therefore, any standard algorithm can be used to exponentiate it without significant computational cost.

Previous Work on Error Bounds
Next we shall discuss bounds on the error, err(t) := v(t) − ṽ(t), i.e. the difference of ṽ(t) and the result v(t) of an exact time evolution (see eq. (6)). A series representation of it was derived in [4]; it involves the functions Φ_1(z) = (e^z − 1)/z and Φ_k(z), which can be defined recursively (see [4]), and we refer to it as eq. (12). It was proposed in [4] to use the first term of the series as error estimate (eq. (13)). 3 Additionally, the suggestion was made to further simplify eq. (13) by replacing Φ_1(−iH_m t) with e^{−iH_m t}, which leads to the estimate of [4] (eq. (14)). The error analysis employed in Expokit is based on the series representation (12). 4 Following the proposals [6,7], both the first term (shown in eq. (13)) and the second term of the series (12) are used to compute an error estimate [1]. An important advantage of this approach is that it can also be applied in the general case when H is not Hermitian. A complementary approach for studying the error of the Krylov method was suggested in [14] and subsequently investigated in [15,16,17,18,19]. One can start from the observation that err(t) fulfills a simple differential equation [11,12,14,15,16,17,18,19]. In order to derive it, we first note that

dṽ(t)/dt = −i V_m H_m e^{−iH_m t} e_1 = −iH ṽ(t) + i h_{m+1,m} (e_m^T e^{−iH_m t} e_1) v_{m+1} ,    (15)

where we used (9) in the second step. Therefore, it follows that

d err(t)/dt = −iH err(t) − i h_{m+1,m} (e_m^T e^{−iH_m t} e_1) v_{m+1} .    (16)

For the initial condition err(0) = 0, this differential equation is solved by

err(t) = −i h_{m+1,m} ∫_0^t e^{−i(t−τ)H} (e_m^T e^{−iH_m τ} e_1) v_{m+1} dτ .    (17)

In order to find an upper bound for the norm of err(t) from (17), one has to evaluate ||e^{−i(t−τ)H} v_{m+1}||. For a generic, i.e. non-Hermitian, matrix H this is very hard since it involves the exponential of a large matrix. Calculating it is as difficult as the initial problem (1) we set out to solve. Thus, formula (17) makes evident the typical difficulty of rigorous error bounds: it is difficult to compute the bounding function numerically. This problem disappears, however, when H is Hermitian. Then ||e^{−i(t−τ)H} v_{m+1}|| = ||v_{m+1}|| = 1 and the bound from eq. (17) can be computed explicitly.
It is very interesting that the case of a Hermitian matrix H, which is of extraordinary relevance in physics, leads to such simplifications. We arrive at the error bound [14,18,19]

||err(t)|| ≤ h_{m+1,m} ∫_0^t |e_m^T e^{−iH_m τ} e_1| dτ .    (18)

The key point of this formula is that it only depends on quantities that are known in the Krylov algorithm or easy to determine: the element h_{m+1,m} is already computed in the Arnoldi procedure, and e_m^T e^{−iH_m τ} e_1 can be calculated without any significant computational cost since H_m is already known and, as said before, the time needed to exponentiate the small matrix H_m is negligible.
The only remaining question is how the integral in eq. (18) can be computed in practice. In [14] the proposal was made to approximate it by evaluating the integrand at a small number of points. For example, the right-endpoint rectangle rule represents an alternative way to arrive at the estimate (14) while the Simpson rule is expected to yield a more accurate approximation [14]. Additionally, it was suggested to consider the limit of a small step size, in which the integral can be performed analytically [19]. Evidently, a major drawback of such approximation methods is that they introduce an unquantifiable uncertainty in formula (18) so that it no longer yields a rigorous error bound.

Improvement of Error Bound
Whereas Expokit relies on the series representation (12) for estimating the accuracy of the Krylov approximation, we shall in the present work use the integral (18) for bounding the error. For computing it numerically, we will employ well-known double-exponential integration methods [23]. In our case of a one-dimensional continuous function on a finite interval, they are known to achieve high accuracy at moderate computational costs (see e.g. [24,25,26]). Dynamically adapting the number of points at which the integrand is evaluated, we can largely avoid uncertainties that would result from replacing the integral (18) by simple approximations, such as the ones considered in [14]. In this way, inaccuracies due to the one-dimensional numerical integration become negligible as compared to other sources of error, which we shall discuss shortly.
In general, we can distinguish two reasons why the Krylov method discussed in the present work does not yield an exact result. The first one, which we have discussed so far, is due to the Krylov approximation itself, i.e. the replacement of the full Hamiltonian H by its projection H_m on the Krylov subspace (see eq. (3)). We can call this error "analytic" since it would arise even if all calculations were performed with infinite machine precision. As shown in eq. (18), one can provide a rigorous formula for computing an a posteriori bound on this analytic error. In any numerical procedure, however, there is an inevitable second source of inaccuracy caused by a finite machine precision; one can label the resulting error as "numerical". We will not bound it rigorously but only provide an estimate.
Generically, the effect of a finite machine precision is most significant when a large number of numerical operations is applied consecutively (see e.g. [21,22]). Correspondingly, we expect the Arnoldi/Lanczos algorithm to be most important in the determination of the numerical error. The rounding error in the Lanczos procedure has been studied intensely and rigorous bounds were derived [27,28,29,30]. It follows from these analyses that the order of magnitude of the numerical error can be bounded in terms of the two quantities

ε ||H||   and   ε || |H| || ,    (19)

where ||H|| is the 2-norm of the matrix H and ε is the machine precision. Moreover, |H| denotes a matrix obtained from H by taking the absolute value of each element. A priori, the two numbers in eq. (19) are independent, but a common bound for both of them is given by

ε ||H||_1 ,    (20)

where ||H||_1 denotes the 1-norm of H and we used Hölder's inequality for a symmetric matrix. We shall use the quantity (20) as an estimate for the numerical error.
In summary, we develop existing approaches to error analysis in Krylov methods further, in particular the implementation in Expokit [1], in two ways. First, we use an exact formula for computing the error resulting from the Krylov approximation. It is displayed in eq. (18) and can be evaluated in practice with only negligible additional computational effort. Apart from an insignificant inaccuracy due to a finite step size in the numerical integration, eq. (18) would lead to a rigorous a posteriori error bound if machine precision were infinite. Secondly, we give an estimate for roundoff errors, which result from finite machine precision; it is shown in eq. (20). We must mention, however, that our approach is only applicable to the special case of a Hermitian matrix H and is therefore significantly less general than the method employed in Expokit.
Finally, we remark that formula (18) does not only give a rigorous formula for computing an upper bound on the error of the Krylov approximation for a given time interval t. It can also be used to find the optimal step size for a given desired accuracy. As explained above, the step size denotes the time after which the Krylov procedure is restarted, i.e. a new Krylov subspace is computed. The input data are the time t, at which we wish to evaluate v(t), and an upper bound err_max on the error. This determines an upper bound on the error rate tol_rate as follows:

tol_rate = err_max / t .    (21)

Now we can evaluate the error formula (18) at different times. The optimal step size t_step is the largest time for which the resulting error rate is still below the desired bound:

h_{m+1,m} ∫_0^{t_step} |e_m^T e^{−iH_m τ} e_1| dτ ≤ tol_rate · t_step .    (22)

This allows us to use a minimal number of steps while obeying a given bound err_max on the error.

Pseudocode
We shall discuss more concretely our implementation of the algorithm sketched so far. It is displayed in pseudocode 1 and consists of repeating three steps:
1. We perform the Arnoldi algorithm as shown in eq. (8). We remark that instead of H, we use the matrix A = −iH for computing the Krylov subspace. Consequently, the Hessenberg matrix is anti-Hermitian. Otherwise, this part is completely standard. The only aspect we have not gone into so far is the special situation of a "lucky breakdown". Namely, it can occur that the Krylov space K_m, as defined in eq. (2), has a dimension that is smaller than m. In this case, an exact analytic orthonormalization, as displayed in line (8d), would yield a vanishing h_{j+1,j} for some value of j. In the numerical implementation, this corresponds to h_{j+1,j} falling below some small threshold set by the machine precision. In such a case, working in the Krylov subspace is no longer an approximation but yields an exact result. This means that our problem is solved and we can stop the algorithm. We shall not discuss this special case further.
2. We find the optimal step size according to eq. (22) by computing the integral in the error formula (18) for different times t_step. We employ a double-exponential method [23] in the form of a tanh-sinh transformation of the integral, which is thereafter evaluated by trapezoidal quadrature. In the concrete implementation, it has proven efficient to proceed as follows. In all iterations but the first one, we can use the previous step size, slightly reduced, as estimate for t_step. Then we compute the resulting error according to the integral (18) and compare it to tol_rate · t_step. If it is larger, we halve t_step and repeat the procedure until the error is small enough. By using a slightly smaller value than the previous step size as initial estimate for t_step, we achieve that usually no halving of t_step is required. Finally, we increase t_step in small substeps as long as the resulting error is still small enough.
3. Using the step size t_step determined above, we compute v(t_step) according to eq. (11). At this point, we restart the Krylov procedure, i.e. we go back to step 1, where v(t_step) now serves as initial vector for the next iteration of the algorithm.
For clarity of presentation, a number of additional functionalities and special cases are not shown in pseudocode 1:
• It is important to point out that the fraction of runtime required for step 2 is generically small, since computations are only performed in the small Krylov subspace of dimension m. Nevertheless, one can gain a slight improvement in performance by diagonalizing the Hessenberg matrix H_m at the beginning of step 2 (see e.g. [14]). If this is done once, no matrix exponential is needed any more to compute the integrand of (18) at different values of its argument. We use this approach in our implementation.
• In the first iteration, we have no previous step size which we could use as estimate for the next one. Therefore, we slightly modify step 2 as follows. With an arbitrary initial guess for t_step, we compute the error. If it is small enough, we repeatedly double t_step until the next doubling would make the error too large. Then we proceed with the displayed procedure, i.e. we first keep halving t_step if the error is too big and subsequently increase t_step in small substeps.
• At the end of step 2, we compare the error computed according to eq. (18) with the estimate (20) of the numerical error. If the latter is bigger, the program triggers a warning to alert the user that the analytic error bound may be spoiled by finite machine precision.
• The above algorithm can trivially be extended not only to compute the final vector but also to sample its values at intermediate points of time: In step 3, one uses the known matrices H_m and V_m to compute the vector at any t_s with t_now ≤ t_s ≤ t_now + t_step.

Creation of Hamiltonian Matrix
    h := no;
  end
end
//Step 2: find optimal step size
f(t) := h · |e_m^T · exp(M · t) · e_1|;
t_step := 0.97 · t_step;
err_step := integrate f(t) from 0 to t_step;
while err_step > tol_rate · t_step do
    t_step := t_step / 2;
    err_step := integrate f(t) from 0 to t_step;
end
s := t_step / N_SUBSTEPS; n_s := 0; ∆_err := 0;
while err_step + ∆_err < tol_rate · (t_step + n_s · s) do
    err_step := err_step + ∆_err;
    n_s := n_s + 1;
    ∆_err := integrate f(t) from t_step + n_s · s to t_step + (n_s + 1) · s;
end
t_step := t_step + (n_s − 1) · s;
ω := exp(M · t_step) · e_1;
//Step 3: perform step
w := V · ω;
t_now := t_now + t_step;
end
Algorithm 1: Krylov method

So far, we have shown how time evolution can be computed once the Hamiltonian matrix H is known. A problem that one encounters in practical applications is that the matrix H is usually not given. Instead, the physical system and the Hamiltonian are defined in more abstract terms. In order to demonstrate how the matrix H can be derived in such a situation, we shall focus on the widely-used description of physical systems in terms of number operators. We shall briefly review it and then discuss how H can be derived.
Every physical system can be described as a collection of interacting oscillator modes. We take their number to be L and assume that L is finite. 5 Each mode l possesses creation and annihilation operators â_l^†, â_l, where l = 1, . . . , L. In the case of bosons, they fulfill the algebra

[â_l, â_{l'}^†] = δ_{l,l'} ,   [â_l, â_{l'}] = [â_l^†, â_{l'}^†] = 0 ,    (23)

and lead to the number operators

n̂_l = â_l^† â_l .    (24)

Now we can use the creation operators to form a basis of states:

|n_1, . . . , n_L⟩ = Π_{l=1}^{L} (â_l^†)^{n_l} / √(n_l!) |0⟩ ,    (25)

where |0⟩ is the vacuum and the numbers (n_1, . . . , n_L) characterize the occupation numbers of the different modes:

n̂_l |n_1, . . . , n_L⟩ = n_l |n_1, . . . , n_L⟩ .    (26)
In the non-relativistic limit, the total occupation number,

N = Σ_{l=1}^{L} n̂_l ,    (27)

is conserved. In this case, not all tuples (n_1, . . . , n_L) are admissible and the set of states (25) is finite. Finally, the Hamiltonian can be expressed as an operator in terms of the creation and annihilation operators. For example, the free Hamiltonian for a system of L modes is given by

Ĥ_free = Σ_{l=1}^{L} ε_l n̂_l ,    (28)

where the ε_l are arbitrary real numbers. In order to be able to apply TimeEvolver, one now needs to derive the Hamiltonian matrix H from a Hamiltonian operator Ĥ, which is expressed in terms of creation and annihilation operators. As a first step, we order the basis elements, i.e. we introduce an arbitrary but fixed bijective function ind, which maps the admissible tuples (n_1, . . . , n_L) onto an index set I = {1, . . . , d}. For each basis element |n_1, . . . , n_L⟩, we now apply the Hamiltonian operator Ĥ, which in general leads to a superposition of basis elements:

Ĥ |n_1, . . . , n_L⟩ = Σ_k c^(k) |n_1^(k), . . . , n_L^(k)⟩ .    (29)

Here the c^(k) are complex numbers. Then we define for each non-zero c^(k):

H_{i,j} = c^(k) ,    (30)

where

i = ind(n_1^(k), . . . , n_L^(k)) ,   j = ind(n_1, . . . , n_L) .    (31)

At last, the H_{i,j} define the elements of the Hamiltonian matrix. Finally, we remark that the above procedure can be straightforwardly generalized to Hamiltonians that are defined in terms of operators other than creation and annihilation operators. To this end, one only needs to adapt the commutation relations (23) and the mapping (25) of basis states and operators. Moreover, superselection rules, such as number conservation in our case (see eq. (27)), need to be taken into account in the enumeration of basis states.

Description of Program
We have implemented our method in C++. Our program uses the following libraries. The core module relies on the Intel Math Kernel Library (MKL) 6 for linear algebra operations. 7 Moreover, Boost 8 is used for numerical integration. These two libraries are the only necessary prerequisites for our program. Outside of this core functionality, we employ Hierarchical Data Format version 5 (HDF5) 9 for convenient output of data. Finally, CMake 10 is used to build our program.
Our implementation of the Krylov subspace method is contained in the class krylovTimeEvolver in the folder "core". Its constructor requires all arguments that are needed for time evolution, in particular the Hermitian Hamiltonian matrix and the complex vector of the initial state, where the corresponding data types are defined in matrixDataTypes. Algorithm 1 is implemented in the method timeEvolve, where Step 1, i.e. the Arnoldi algorithm, is encapsulated in arnoldiAlgorithm and Step 2, i.e. the determination of the maximal step size compatible with the desired error bound, is performed in findMaximalStepSize.
For evaluating the error integral (18) in findMaximalStepSize numerically, we rely on an adaptive implementation of a double-exponential method provided by Boost. It increases the number of points at which the integrand is sampled until an estimate of the relative error in evaluating the integral, which is incorporated in the library, falls below 10^{-3}. If this cannot be achieved with 15 refinements, the program triggers a warning. In certain cases, this sophisticated integration procedure can have a negative impact on runtime. Therefore, we have also implemented a second integration method based on a non-adaptive Gauß scheme (see e.g. [24]), which the user can choose to use. This simpler and faster approach could be particularly useful for explorative investigations. If interesting effects are found, one can establish the validity of results with the more accurate double-exponential method. We remark, however, that the effect of integration on runtime becomes negligible for large Hilbert spaces (d ≳ 10^5), so that in this case there is no disadvantage in always using the more precise approach.
All core functionalities of TimeEvolver are contained in the krylovTimeEvolver class. If users already have at their disposal a Hamiltonian matrix, no parts of our program other than the class krylovTimeEvolver are relevant for them. However, for the convenience of the user who does not yet have a Hamiltonian matrix, we have also added the folder "helper", which contains routines to form it. The class basis provides a representation of basis vectors in terms of number eigenstates, as described in section 3. In particular, its constructor forms all basis elements for a given number K of modes and a given total occupation N. Moreover, it supports the case in which some of the modes have a given fixed maximal occupation number.
The class hamiltonian contains a representation of a Hamiltonian in terms of creation and annihilation operators and provides the important routine createMatrix, which creates a Hamiltonian matrix according to the approach described in section 3. Since the function ind is inverted frequently (see eq. (31)), we represent it by a hashtable, which can perform this operation in constant time. We remark that the routine createMatrix can be easily adapted to other choices of basis. This makes it possible to apply it to Hamiltonians that are not defined in terms of creation and annihilation operators.
Finally, we have included the folder "example", which demonstrates the usage of TimeEvolver for a particular class of Hamiltonians. These Hamiltonians were investigated in the study [31], which relies on TimeEvolver for numerical time evolution. 11 The particular structure of the Hamiltonians is reflected in the class exampleHamiltonian. The routine main contained in the class of the same name shows how all parts can be integrated and how TimeEvolver is used. Results of this routine are shown in section 5.1.

6 https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onemkl.html
7 We choose MKL because it provides LAPACK and sparse BLAS routines in one package, as well as for performance reasons.
8 https://www.boost.org/
9 https://www.hdfgroup.org/solutions/hdf5
10 https://cmake.org/
11 More precisely, the prospect of using TimeEvolver in the study [31] inspired us to create it.

Exemplary System
We will demonstrate the usage of TimeEvolver on the specific Hamiltonian used in [31], which we shall briefly describe. The system consists of two subsystems of quantum oscillators. The first subsystem comprises the two modes â_0 and b̂_0; we label their occupation numbers as n_0 and m_0. The second subsystem contains the modes â_k and â_k′, where k = 1, ..., K and k′ = 1, ..., K′. These oscillators are assumed to be qubits, i.e. their occupation numbers n_k and n_k′ are restricted to the values 0 and 1. In total, the system features L = 2 + K + K′ modes. We can denote a number eigenstate as |n_0, m_0, n_1, ..., n_K, n_1′, ..., n_K′⟩.
As will become evident, the system is constructed in such a way that the occupation numbers in the two sets of oscillators are conserved independently, i.e. the two numbers do not change in the course of time evolution. Apart from this property, the particular structure of the system is not important for the present study. For completeness, we shall nevertheless discuss it in more detail and outline its physical motivation. The concrete Hamiltonian Ĥ is displayed in eq. (34), where F_i(k, l) = (√2 (k + Δk_i)^3 + √7 (l + Δl_i)^5) mod 1. Moreover, we set Δk_1 = Δk_2 = 1, Δk_3 = K + 1, Δl_1 = Δl_3 = K + 1 as well as Δl_2 = 1. The Hamiltonian Ĥ depends on the real constants m, C_0, N_c, ΔN_c, C_m, K as well as K′, which fulfill N_c > 0, 0 < ΔN_c < N_c, K > 0 and K′ > 0. As initial state, we use the state given in eq. (36), in which we introduced N_0 and N_m as additional parameters. They denote the number of particles in the mode â_0 and the number of qubits that are set to 1, respectively. In accordance with eq. (33), they moreover determine the number of particles in each of the two subsectors of the system. Unless specified otherwise, we use the choice of parameters given in eq. (37).

Finally, we shall briefly explain why it is interesting to study the system (34), thereby elucidating the origin of the concrete Hamiltonian. Readers who are only interested in the application of TimeEvolver can skip this part. The original motivation comes from black hole physics. The corresponding line of research was initiated in [32], and a complete list of references can be found in [31]. A generic problem in black hole physics is that exact calculations beyond any semi-classical limit are very hard to perform. Therefore, it is interesting to look for analogue systems which share important properties with black holes, but which are easier to study. One can then try to draw conclusions about a black hole by solving the analogue system.
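For concreteness, the coupling function F_i(k, l) as given in the text can be evaluated as follows; this is an illustrative sketch, and the parameter names dk and dl are ours.

```python
import math

def F(k, l, dk, dl):
    """Coupling function as stated in the text:
    F_i(k, l) = (sqrt(2) (k + Dk_i)^3 + sqrt(7) (l + Dl_i)^5) mod 1.
    The mod 1 keeps every coupling in [0, 1) and makes the values
    effectively quasi-random in a deterministic, reproducible way."""
    return (math.sqrt(2) * (k + dk) ** 3 + math.sqrt(7) * (l + dl) ** 5) % 1.0

# Example: the couplings for i = 1 (Dk_1 = Dl_1 = 1, for a small K) on a 3x3 grid.
values = [F(k, l, dk=1, dl=1) for k in range(1, 4) for l in range(1, 4)]
```
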
An important property of a black hole is its Bekenstein-Hawking entropy [33], because of which it has a high capacity of information storage. This motivates the study of generic quantum systems with an enhanced memory capacity.
Thus, we start from the question of how a generic quantum system can achieve a high capacity of information storage. We can quantify the latter by a microstate entropy K, which is defined as the logarithm of the number of available states. Two ingredients are sufficient. First, one needs K modes: the number of states corresponding to their different occupation numbers (as defined in eq. (25)) scales exponentially with K. For example, if occupation numbers are restricted to 0 and 1, then K qubits lead to 2^K microstates and hence an entropy on the order of K. Secondly, all microstates must be nearly degenerate in energy; otherwise, they cannot be counted as microstates belonging to one and the same macrostate.
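The counting argument can be made concrete in a few lines; this is purely illustrative.

```python
import math
from itertools import product

# K qubit-like modes (occupations restricted to 0 and 1) yield 2^K
# microstates, i.e. a microstate entropy S = ln(2^K) = K ln 2 ~ K.
K = 10
microstates = list(product((0, 1), repeat=K))
entropy = math.log(len(microstates))
```
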
Both of these requirements are fulfilled by the simple system of [34], displayed in eq. (38). It corresponds to a part of the Hamiltonian (34), and all definitions are as there. The qubits â_1, ..., â_K are responsible for information storage, i.e. for generating the microstate entropy K; we can call them memory modes. If the mode â_0 were absent, however, the states corresponding to different occupation numbers of the memory modes would not be degenerate in energy. Indeed, there would be a large difference of K m between the energies of the empty and of the full state. This changes once â_0 is populated. In a highly occupied state with ⟨n̂_0⟩ = n_0, we can use the Bogoliubov approximation, n̂_0 ≈ n_0, to derive the effective energy gap of the â_k-modes, given in eq. (39). We see that the gap is lowered, and for a critical occupation n_0 = N_c all memory modes become gapless. Then the microstates corresponding to their different occupation numbers become degenerate in energy and an entropy on the order of K is achieved. We call this phenomenon, in which a high occupation number of one mode decreases the energy gap of others, assisted gaplessness [35]. This is the reason why we choose N_0 = N_c in the initial state (36). These observations lead to the question of how the memory modes backreact on the time evolution of â_0. In order to study it, we couple â_0 to another mode b̂_0, which results in the first line of the Hamiltonian (34). If the memory modes â_k are empty, the occupation numbers n_0 and m_0 perform oscillations. Once the â_k are occupied, however, the oscillations get suppressed and n_0 stays tied to its initial value n_0 = N_c. This effect of memory burden [36] arises because any deviation of n_0 from N_c would destroy the gaplessness of the memory modes and hence be very costly in energy, as is evident from eq. (39).
In [31], we studied the question of whether memory burden can be overcome. To this end, we introduced a second set of memory modes â_k′, which becomes gapless at a different occupation number n_0 = N_c − ΔN_c. The corresponding operators are displayed in the second line of the Hamiltonian (34). Moreover, we allowed transitions between and among the two sets of memory modes, which accounts for the remaining lines of eq. (34). In the presence of a second set of memory modes, it becomes energetically possible to transit from the initial state (36), in which n_0 = N_c and only the â_k are occupied, to a final state, in which n_0 = N_c − ΔN_c and only the â_k′ are occupied. In [31], we used TimeEvolver to study whether such transitions actually occur dynamically, and our conclusion was that they are generically heavily suppressed. Applied to black holes, this finding has interesting implications for Hawking radiation [37]: it indicates that the evaporation of a black hole slows down significantly at the latest after it has lost half of its mass.

Analysis of Performance
Now we shall use TimeEvolver to compute the time evolution of the initial state (36) with the Hamiltonian (34). The corresponding code is included in the folder "example". The Hamiltonian is represented by the class exampleHamiltonian and the method main shows how TimeEvolver is applied. For the numerical precision on the norm we set err_max = 10^-8. As exemplary observables we use the expectation values of the number operators of all quantum modes.
For the choice (37) of parameters, the occupation numbers as functions of time are displayed in Fig. 1. Moreover, we evaluate the sums N_0 and N_m of occupation numbers in the two sectors of the system, as defined in eq. (33). That N_0 and N_m are indeed constant represents a first consistency check. As a second consistency check we perform a forward-backward evolution. To this end, we first calculate the time evolution of the state (36) up to t = 10, as displayed in Fig. 1, and then use the result as initial state for another time evolution, in which we replace Ĥ → −Ĥ. Fig. 2 shows the result of the second computation. In accordance with quantum mechanical unitarity, we see that Fig. 2 is the mirror image of Fig. 1. Moreover, we arrive again at the initial state with an error of 9.82 × 10^-9, which is within the requested error bound of 2.0 × 10^-8.
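The forward-backward check can be illustrated on a small dense system; the following is a minimal sketch using a random Hermitian matrix and SciPy's expm, not the actual TimeEvolver computation on the full model.

```python
import numpy as np
from scipy.linalg import expm

# Build a random Hermitian "Hamiltonian" and a normalized initial state.
rng = np.random.default_rng(0)
d = 50
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
H = (A + A.conj().T) / 2
v0 = rng.normal(size=d) + 1j * rng.normal(size=d)
v0 /= np.linalg.norm(v0)

t = 10.0
v_fwd = expm(-1j * H * t) @ v0    # evolve forward up to t
v_back = expm(+1j * H * t) @ v_fwd  # evolve again with H -> -H
error = np.linalg.norm(v_back - v0)  # should vanish up to numerical roundoff
```

By unitarity, the backward evolution must return the initial state, so the residual norm directly measures the accumulated numerical error.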
Next, we shall analyze the runtime. All following benchmark results in this section were obtained on an Intel® Core™ i9-9900K processor with 22 GB RAM. First we compare the simulation wall time of TimeEvolver with the original Expokit implementation (in its MATLAB version). 12 As a prototype system we choose (34) with the model parameters set to K = K′ = 10, N_m = 5 and N_0 = N_c = 100, and the remaining ones given by (37). This corresponds to a Hamiltonian matrix of dimension 1,565,904 × 1,565,904. We evolve the initial state (36) up to t = 10, choose for both implementations a Krylov space dimension of m = 40 and request an upper error bound of err_max = 10^-7. Restricted to one thread, we measured the mean simulation times averaged over five runs [table not reproduced here].

[Figure 3 caption: Simulation time as a function of the Hilbert space dimension for the parameters (37). The last point has been obtained by setting K = K′ = 10 and N_0 = N_c = 139 to go to the memory limit of our specific machine. The straight blue line represents a fit to the data, as described in the text; the last point has not been included in the fit.]
Finally, we note that using the fast integration based on Gauß quadrature does not lead to a measurable improvement in runtime, owing to the large Hilbert space dimension.
Our next goal is to study how the runtime depends on the size of the system. Again using the Hamiltonian (34) as a prototype model, we vary the number K of modes in the memory sector. We set K′ = K as well as N_m = K/2 and N_0 = N_c = 100. Note that the number of basis elements depends exponentially on the number of modes. The simulation time as a function of the Hilbert space dimension is illustrated in Fig. 3. The last data point corresponds to a Hilbert space dimension of 2,170,560, which was the largest one fitting into the memory of the benchmark machine. We note that the limitation on the Hilbert space dimension comes from the creation of the Hamiltonian matrix and not from the actual time evolution; by making matrix creation more memory-efficient, one could slightly increase the maximal Hilbert space dimension. Finally, we note that for our specific prototype system the relation between the Hilbert space dimension d and the simulation time t is polynomial. Performing a fit of the form t = a · d^b to the data, we obtain a = 8.1 × 10^-6 and b = 1.23, where we excluded the last point due to overhead effects of being at the memory limit. The result of the fit is shown in Fig. 3 as a straight blue line.
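Such a power-law fit can be performed as a linear least-squares fit in log-log space. The sketch below uses synthetic data generated from the quoted values a = 8.1 × 10^-6 and b = 1.23, purely for illustration; it is not the measured benchmark data.

```python
import numpy as np

# Synthetic (dimension, time) data following an exact power law t = a * d^b.
d = np.array([1e4, 3e4, 1e5, 3e5, 1e6])
t = 8.1e-6 * d ** 1.23

# Taking logarithms turns t = a * d^b into log t = log a + b * log d,
# so a straight-line fit recovers the exponent b and the prefactor a.
b, log_a = np.polyfit(np.log(d), np.log(t), 1)
a = np.exp(log_a)
```
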
To conclude, we study the dependence of the simulation time on meta-parameters of the algorithm itself, namely the Krylov space dimension m and the requested tolerance err_max on the norm error. For this study, we set K = K′ = 8, N_0 = N_c = 100, N_m = 4 and t_max = 10, with the remaining parameters set by (37). The results are displayed in Fig. 4. It is evident from Fig. 4a that the runtime depends only mildly on the dimension of the Krylov space: the range of acceptable values of m is usually rather large, and the penalty in simulation time is in most cases not drastic, as long as m is not chosen unreasonably small or large. The wall-clock simulation time as a function of err_max is shown in Fig. 4b; we observe a logarithmic dependence on the requested error bound.

Summary and Outlook
In this paper, we present the software package TimeEvolver. Its purpose is to compute the exponential of a large sparse matrix multiplied with a vector. We specialize to the case in which the matrix is anti-Hermitian. While this may seem like a peculiar choice from a mathematical point of view, it is of utmost importance for physics: any calculation of time evolution can be reduced to this task of exponentiating an anti-Hermitian matrix. In this case, the vector represents the initial state and the matrix is the imaginary unit i times the Hamiltonian.
It is well-known that an efficient way to tackle this task consists in Krylov subspace methods, and also TimeEvolver relies on them. In doing so, TimeEvolver goes beyond existing software packages, such as Expokit [1], in two ways. First, the specialization to anti-Hermitian matrices makes it possible to incorporate in the numerical implementation a rigorous formula for bounding the error of the Krylov approximation. Moreover, we provide an estimate for additional uncertainties due to numerical roundoff. An improved statement about the accuracy of the Krylov approach is of great importance for the potential of TimeEvolver to make new discoveries. If our software leads to novel and potentially unexpected results (as was the case in [31]), one needs to make sure that they are not artifacts of the numerical method. Since TimeEvolver is most useful for systems and phenomena that cannot be studied with any other means, the validity of new discoveries can only be established via an error analysis provided by the program itself.
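As an illustration of the underlying idea, and not of TimeEvolver's actual C++ implementation, the following sketch builds a Krylov subspace with the Lanczos recursion for a Hermitian matrix and approximates exp(-iHt)v by exponentiating the small tridiagonal projection of H.

```python
import numpy as np
from scipy.linalg import expm

def krylov_expm(H, v, t, m):
    """Approximate exp(-i H t) v in an m-dimensional Krylov subspace,
    using the Lanczos recursion for the Hermitian matrix H."""
    d = len(v)
    V = np.zeros((d, m), dtype=complex)  # orthonormal Krylov basis vectors
    T = np.zeros((m, m))                 # real tridiagonal projection of H
    beta = np.linalg.norm(v)
    V[:, 0] = v / beta
    for j in range(m):
        w = H @ V[:, j]
        alpha = np.real(np.vdot(V[:, j], w))  # real since H is Hermitian
        w = w - alpha * V[:, j]
        if j > 0:
            w = w - T[j, j - 1] * V[:, j - 1]
        T[j, j] = alpha
        if j + 1 < m:
            b = np.linalg.norm(w)
            T[j, j + 1] = T[j + 1, j] = b
            V[:, j + 1] = w / b
    # Exponentiate only the small m x m matrix, then map back to full space.
    e1 = np.zeros(m)
    e1[0] = 1.0
    return beta * (V @ (expm(-1j * T * t) @ e1))

# Compare against the exact dense result on a small random Hermitian matrix.
rng = np.random.default_rng(1)
d = 200
A = rng.normal(size=(d, d))
H = (A + A.T) / 2
v = rng.normal(size=d).astype(complex)
approx = krylov_expm(H, v, t=0.1, m=20)
exact = expm(-1j * H * 0.1) @ v
```

The point of the method is that only the m × m matrix T is exponentiated, while the large matrix H enters solely through matrix-vector products, which is what makes the approach efficient for sparse H.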
The second advantage of TimeEvolver consists in its ease of use for applications to physics. In order to achieve this, we provide routines to create a basis of possible states and to derive the Hamiltonian matrix from a more abstract representation of the Hamiltonian. In doing so, we specialized to the widely-used case of Hamiltonians that are defined in terms of creation and annihilation operators, but it is straightforward to adapt our program to other operators. Moreover, we have included a concrete physical example, documented all parts of the software package and devised a streamlined installation and build procedure. Finally, our program is open-access and based on free software. In this way, we hope that TimeEvolver can contribute to progress across various disciplines in physics.
As an outlook, we would like to point out ideas for future improvements of TimeEvolver. First, it is known that a decrease in runtime for Krylov subspace methods can be achieved by employing GPU computing (see [38] for an early implementation using NVIDIA CUDA). Secondly, parallelization is capable of bypassing the limitation on the size of the Hilbert space, which arises from the requirement of storing the Hamiltonian matrix in the memory of a single machine, by distributing it among many computing units. On a supercomputer, this makes it possible to study Hilbert spaces with dimension close to 10^10 [39]. It would be very interesting to combine a parallel implementation of Krylov subspace techniques, such as the one in [39], with the error analysis of TimeEvolver. 13

Declaration of Competing Interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.