Comparison of methods for calculating conditional expectations of sufficient statistics for continuous time Markov chains

Background Continuous time Markov chains (CTMCs) is a widely used model for describing the evolution of DNA sequences on the nucleotide, amino acid or codon level. The sufficient statistics for CTMCs are the time spent in a state and the number of changes between any two states. In applications past evolutionary events (exact times and types of changes) are unaccessible and the past must be inferred from DNA sequence data observed in the present. Results We describe and implement three algorithms for computing linear combinations of expected values of the sufficient statistics, conditioned on the end-points of the chain, and compare their performance with respect to accuracy and running time. The first algorithm is based on an eigenvalue decomposition of the rate matrix (EVD), the second on uniformization (UNI), and the third on integrals of matrix exponentials (EXPM). The implementation in R of the algorithms is available at http://www.birc.au.dk/~paula/. Conclusions We use two different models to analyze the accuracy and eight experiments to investigate the speed of the three algorithms. We find that they have similar accuracy and that EXPM is the slowest method. Furthermore we find that UNI is usually faster than EVD.


Background
In this paper we consider the problem of calculating the expected time spent in a state and the expected number of jumps between any two states in discretely observed continuous time Markov chains (CTMCs). The case where the CTMC is only recorded at discretely observed time points arises in molecular evolution where DNA sequence data is extracted at present day and past evolutionary events are missing. In this situation, efficient methods for calculating these types of expectations are needed. In particular, two classes of applications can be identified.
The first class of applications is concerned with rate matrix estimation. [1] describes how the expectationmaximization (EM) algorithm can be applied to estimate the rate matrix from DNA sequence data observed in the leaves of an evolutionary tree. The EM algorithm is implemented in the software XRate [2] and has been applied in [3] for estimating empirical codon rate matrices. [1] uses the eigenvalue decomposition of the rate matrix to calculate the expected time spent in a state and the expected number of jumps between states.
The second class of applications is concerned with understanding and testing various aspects of evolutionary trajectories. In [4] it is emphasized that analytical results for jump numbers are superior to simulation approaches and various applications of jump number statistics are provided, including a test for the hypothesis that a trait changed its state no more than once in its evolutionary history and a diagnostic tool to measure discrepancies between the data and the model. [4] assumes that the rate matrix is diagonalizable and that the eigenvalues are real, and applies a spectral representation of the transition probability matrix to obtain the expected number of state changes.
[5] and [6] describe a method, termed substitution mapping, for detecting coevolution of evolutionary traits, and a similar method is described in [7]. The substitution mapping method is based on the expected number of substitutions while [7] base their statistics on the expected time spent in a state. Furthermore [7] describes an application concerned with mapping synonymous and non-synonymous mutations on branches of a phylogenetic tree and employs the expected number of changes between any two states for this purpose. [8] uses the expected number of state changes to calculate certain labeled evolutionary distances. A labeled evolutionary distance could for example be the number of state changes from or to a specific nucleotide. In [9] substitution mapping is invoked for identifying biochemically constrained sites. In [7] and [8] the summary statistics are calculated using the eigenvalue decomposition method suggested by [1]. In [5,6] and [9] the substitution mapping is achieved using a more direct formula for calculating the number of state changes. In this direct approach an infinite sum must be truncated and it is difficult to control the error associated with the truncation. An alternative is described in [10] where uniformization is applied to obtain the expected number of jumps. [10] uses the expected number of jumps on a branch to detect lineages in a phylogenetic tree that are under selection.
A third algorithm for obtaining the number of changes or time spent in a state is outlined in [11]. The algorithm is based on [12] where a method for calculating integrals of matrix exponentials is described. A natural question arises: which of the three methods (eigenvalue decomposition, uniformization or matrix exponentiation) for calculating conditional expectations of summary statistics for a discretely observed CTMC should be preferred? The aim of this paper is to provide an answer to this question. We describe and compare the three methods. Our implementations in R [13] are available at http://www.birc.au.dk/~paula/. (Furthermore the eigenvalue decomposition and uniformization methods are also available as a C++ class in the bio++ library at http://biopp.univ-montp2.fr/.) The performance and discussion of the algorithms are centered around two applications. The first application is concerned with rate matrix estimation; we estimate the Goldman-Yang codon model [14] using the expectation-maximization algorithm. The second application is based on the labeled distance estimation presented in [8].
Consider a stochastic process {X(s): 0 ≤ s ≤ t} which can be described by a CTMC with n states and an n × n rate matrix Q = (q cd ). The off-diagonal entries in Q are non-negative and rows sum to zero, i.e. q cc = -Σ d≠c q cd = -q c . Maximum likelihood estimation of the rate matrix from a complete observation of the process is straight forward. The likelihood of the process, conditional on the beginning state X(0), is given by (e.g. [15]) where T c is the total time spent in state c and N cd is the number of jumps from c to d. The necessary sufficient statistics for a CTMC are thus the time spent in each state and the number of jumps between any two states. In applications, however, access is limited to DNA data from extant species. The CTMC is discretely observed and we must estimate the mean values of T c and N cd conditional on the end-points X(0) = a and X(t) = b. From [15] we have that where P(t) = (p ij (t)) = e Qt is the transition probability matrix and Many applications require a linear combination of certain substitutions or times. Examples include the number of transitions, transversions, synonymous and non-synonymous substitutions. In the two applications described below the statistics of interest is a linear combination of certain substitutions and times. Let therefore C be an n × n matrix and denote by Σ(C; t) the matrix with entries We describe, compare and discuss three methods for calculating Σ(C; t). The evaluation of the integrals (4) takes O(n 3 ) time and therefore a naive calculation, assuming that C contains just one entry different from zero has a O(n 5 ) running time. Even worse, if C contains O(n 2 ) entries different from zero, then the naive implementation has a O(n 7 ) running time. For all three methods our implementations of Σ(C; t) run in O(n 3 ) time.

Application 1: Rate matrix estimation
Our first application is the problem of estimating the parameters in a CTMC for evolution of coding DNA sequences which we describe using the 61 × 61 rate matrix (excluding stop codons) given by Goldman and Yang [14]: if there is more than one difference between codons i and j ακπj if j is obtained from i by a synonymous transition απj if j is obtained from i by a synonymous transversion αωκπj if j is obtained from i by a non -synonymous transition αωπj if j is obtained from i by a non -synonymous transversion (6) where π is the stationary distribution, is the transition/transversion rate ratio, ω is the non-synonymous/ synonymous ratio and a is a scaling factor. The stationary distribution π is determined directly from the data using the codon frequencies. We estimate the remaining parameters θ = (a, , ω) using the expectation-maximization (EM) algorithm [16] as described below.
Suppose the complete data x is available, consisting of times and types of substitutions in all sites and in all branches of the tree. The complete data log likelihood is, using (1) and (6), (α, κ, ω; x) = −αL s,tv − αωL ns,tv − ακL s,ts − ακωL ns,ts + N log α + N ts log κ + N ns log ω, where we use the notation where e.g.
Ls,ts = {(i, j) : i and j differ at one position and the substitution of i with j is a synonymous transition}.
A similar notation applies for L s,tv , L ns,ts , L ns , tv , N ns and N, where the last statistic is the sum of substitutions between all states (i, j) that differ at one position and s, ns, ts and tv subscripts stand for synonymous, non-synonymous, transition and transversion.
The complete data log likelihood can be maximized easily by making the re-parametrization b = a. We find that α = N tv L s,tv +ωL ns,tv ,β = N ts L s,ts +ωL ns,ts where a = -L ns,tv L ns,ts N s , b = L ns,tv L s,ts (N ns -N tv ) + L ns, ts L s,tv (N ns -N ts ) and c = L s,tv L s,ts N ns . In reality the data y is only available in the leaves and the times and types of substitutions in all sites and all branches of the tree are unaccessible. The EM algorithm is an efficient tool for maximum likelihood estimation in problems where the complete data log likelihood is analytically tractable but full information about the data is missing.
The EM algorithm is an iterative procedure consisting of two steps. In the E-step the expected complete log likelihood conditional on the data y and the current estimate of the parameters θ 0 is calculated. In the M-step the parameters are updated by maximizing G(θ; θ 0 ,y). The parameters converge to a local maximum of the likelihood for the observed data.
The expected log likelihood conditional on the data y and under the three parameters a, and ω is Therefore the E-step requires expectations of linear combinations of waiting times in a set of states and number of jumps between certain states. Because of the Markov property this calculation can be divided in two parts. First we use the peeling algorithm [17,18] to obtain the probability P(γ k = a, β k = b|y, t k ) that a branch k of length t k with nodes g k and b k above and below the branch, respectively, has end-points a and b. Second, we calculate the desired summary statistic by summing over all branches. For example we have The E-step thus consists of calculating conditional expectations of linear combinations of times such as E[L s,ts |t k , a, b] and substitutions such as E[N ts |t k , a, b] where L s,ts and N ts are given by (8). In our application n = 61 and the first type of statistics E[L s,ts |t, a, b] is (up to a factor p ab (t)) on the form (5) with diagonal entries C ii = j π j 1((i, j) ∈ L s,ts ) and all off diagonal entries equal to zero. The second type of statistics E[N ts |t, a, b] is also on the form (5) with off-diagonal entries C ij = q ij 1((i, j) ∈ L ts ) and zeros on the diagonal.

Application 2: Robust distance estimation
The second application is a new approach for estimating labeled evolutionary distance, entitled robust counting and introduced in [8]. The purpose is to calculate a distance that is robust to model misspecification. The method is applied to labeled distances, for example, the synonymous distance between two coding DNA sequences. As it is believed that selection mainly acts at the protein level, synonymous substitutions are neutral and phylogenies built on these type of distances are more likely to reveal the true evolutionary history. The distance is calculated using the mean numbers of labeled substitutions conditioned on pairwise site patterns averaged over the empirical distribution of site patterns observed in the data. In the conventional method the average is done over the theoretical distribution of site patterns. The robustness is therefore achieved through the usage of more information from the data and less from the model.
Let Q be the rate matrix of the assumed model, P(t) = e Qt , the labeling be given through a set of pairs L and the data be represented by a pairwise alignment y = (y 1 , y 2 ) of length m. As data only contains information about the product Qt, where t is the time distance between the sequences, we can set t = 1.
Suppose we observe the complete data consisting of the types of substitutions that occurred in all sites and The labeled distance is estimated as the average across all sites of the expected number of labeled substitutions conditioned on the observed end points: Therefore this application requires evaluating a sum on the form (5) with off-diagonal entries C ij = q ij 1((i, j) ∈ L) and zeros on the diagonal.

Algorithms
The calculation of Σ(C; t) is based on the integrals I ab cd (t). In this section we present three existing methods for obtaining the integrals and extend them to obtain Σ(C; t).

Eigenvalue decomposition (EVD)
When the rate matrix Q is diagonalizable, the computation of transition probabilities p ab (t) and integrals I ab cd (t) can be done via the eigenvalue decomposition (EVD). EVD is a widely used method for calculating matrix exponentials. Let Q = UΛU -1 be the eigenvalue decomposition, with Λ = diag(λ 1 , ..., λ n ). It follows that The integral (4) becomes where Replacing I ab cd (t) with (16) in (5), rearranging the sums and using where ○ represents the entry-wise product. The eigenvalues and eigenvectors might be complex, but they come in complex conjugate pairs and the final result is always real; for more information we refer to the Supplementary Information in [2]. If the CTMC is reversible, the decomposition can be done on a symmetric matrix obtained from Q (e.g. [15]), which is faster and tends to be more robust. Let π be the stationary distribution. Due to reversibility, π a q ab = π b q ba , which can be written as We have that where S* is the transpose of S. Then S is symmetric. Let Λ, V be its eigenvalues and eigenvectors, respectively. Then VΛV -1 = S = Π 1/2 QΠ -1/2 , which implies Q = (Π -1/2 V)Λ(V -1 Π 1/2 ) and it follows that Q has the same eigenvalues as S and Π -1/2 V for eigenvectors.
The results can be summarized in the following algorithm: Step 1: Determine eigenvalues l i .
Determine the eigenvectors U i for Q and compute U -1 .

Uniformization (UNI)
The uniformization method was first introduced in [19] for computing the matrix exponential P(t) = e Qt . In [11] it was shown how this method can be used for calculating summary statistics, even for statistics that cannot be written in integral form. Let μ = max i (q i ) and Then (20)  where Pois(m; l) is the probability of m occurrences from a Poisson distribution with mean l. Using (20) we also have Replacing (21) in (5), rearranging the sums and using The main challenge with this method is the infinite sum and we use (20) to determine a truncation point. In particular if we let l = μt and truncate at s(l) we can bound the error using the tail of the Poisson distribution: Pois(m; μt).
We have that, for large values of l, Pois(λ) ≈ N(λ, λ), where N(μ, σ 2 ) is the normal distribution with mean μ and variance s 2 . Therefore, for large l, the error bound where Φ(·) is the cumulative distribution function for the standard normal distribution. Consequently we can approximate the truncation point s(l) with Another way to determine s(l) is to use R to evaluate Pois(m; l) for values of m that gradually increase, until the tail is at most b = 10 -8 . Combining these two approaches, we performed a linear regression, approximating the tails from R by c 1 + c 2 √ λ + c 3 λ . We obtained c 1 = 4.0731, c 2 = 5.6469, c 3 = 0.9963 but, in order to be conservative, we use s(λ) = 4 + 6 √ λ + λ where ⌈x⌉ is the smallest integer greater than or equal to x. In Figure  1 we compare the exact truncation value and the linear regression approximation.
The linear regression provides an excellent fit to the tail of the distribution.

Exponentiation (EXPM)
This method for calculating the integral (4) was developed in [12] and emphasized in [11]. Suppose we want to evaluate t 0 e Qu Be Q(t−u) du, where Q and B are n × n matrices. To calculate this integral, we use an auxiliary matrix A = Q B 0 Q and the desired integral can be We are interested in where 1 {(c,d)} is a matrix with 1 in entry (c, d) and zero otherwise. We can use this method to determine I ab cd (t) by simply setting B = 1 {(c,d)}, construct the auxiliary matrix A, calculate the matrix exponential of At, and finally read off the integral in entry (a, b) in the upper right corner of the matrix exponential.
Replacing (24) in (5) and rearranging the terms we have (25) Therefore by setting B = C in the auxiliary matrix we obtain Σ(C;t).
Step 2: Calculate the matrix exponential e At .
Step 3: Σ(C; t) is the upper right corner of the matrix exponential.

Testing
We implemented the presented algorithms in R and tested them with respect to accuracy and speed.

Accuracy
The accuracy of the methods depends on the size of the rate matrix and the time t. To investigate how these factors influence the result, we used two different CTMCs that allow an analytical expression for (4). The first investigation is based on the Jukes-Cantor model where the rate matrix has uniform rates and variable size n: Q has two unique eigenvalues: 0 with multiplicity 1 and − n n − 1 with multiplicity n-1. We obtain We compared the result from all three methods against the true value of (5) for size n ranging from 5 to 100, t = 0.1 and random binary matrices C. Entries in C are 1 with probability 1 2 . For each fixed size, we generated 5 different matrices C. The average normalized deviation is shown in Figure 2.
The second CTMC is the HKY model: w h e r eπR = πA + πG and πY = πC + πT, From this, using the symbolic operations in Matlab [20], we obtained the final analytic expression for (4). Using this model we compared for all three methods the true value of (5) for various values of t and randomly generated binary matrices C. For each t we generated 5 different matrices C. The average normalized deviation is shown in Figure 2.
In both cases, all methods showed good accuracy as the normalized deviation was no bigger than 3 × 10 -9 . We also note that EXPM tended to be the most precise while UNI provided the worst approximation. To further investigate the accuracy, we performed calculations on randomly generated reversible rate matrices: we first obtained the stationary distribution from the Dirichlet distribution with shape parameters equal to 1, then all entries q ij with i ≥ j from the exponential distribution with parameter 1 and finally calculated the remaining entries using the reversibility property. In all the runs the relative difference between EVD, UNI and EXPM was less than 10 -5 . This indicated that all three methods have a similar performance in a wide range of applications.

Speed
Partition of computation Assume we need to evaluate Σ(C; t) for a fixed matrix C and multiple time points t {t 1 ,...t k }. In each iteration of the EM-algorithm in Application 1 we need this type of calculation while in order to calculate the labeled distance in Application 2 just one time point is required. Using EVD (Algorithm 1) we do the eigenvalue decomposition (Step 1) once and then, for each time point t i , we apply Step 2 and Step 3. The eigenvalue decomposition, achieved through the R function eigen, has a running time of O(n 3 ). In Step 2 we determine J(t) and this takes O(n 2 ) time.
Step 3 has a running time of O(n 3 ) due to the matrix multiplications.
If instead we apply UNI (Algorithm 2), we run Steps 1-3 for the largest time point max(t i ) and then, for each time point t i , we apply In the case of EXPM (Algorithm 3) we need to calculate the matrix exponential at every single time point. We used the expm R package [21] with the Higham08 method. This is a Padé approximation combined with an improved scaling and squaring [22] and balancing [23]. The running time is O(n 3 ). Table 1 provides an overview of the running times for each of the methods. The algorithms are divided into precomputation and main computation where the precomputation consists of steps that must be executed only once, while in the main computation we calculate the value of Σ(C;t) for every time point under consideration. Experiments We tested the speed of the algorithms in six experiments based on the presented applications and two more experiments using a non-reversible matrix. GY The first experiment corresponded to running the EM algorithm on real data consisting of DNA sequences from the HIV pol gene described in [24]. HIV has been extensively studied with respect to selection pressure and drug resistance and in [24] the authors document convergent evolution in pol gene caused by drug resistance mutations. The observed data y was a multiple codon alignment of the sequences. For simplicity, we did not consider the columns with gaps or ambiguous nucleotides. To compare the performance of the methods as a function of the size of the data set, we applied the EM algorithm for 15 data sets containing from 2 up to 16 sequences each, extracted from the HIV pol gene data. For each set we assumed the sequences were related according to a fixed tree; we have reconstructed the phylogenies in Mega [25] using the Jukes-Cantor model and Neighbor-Joining. We ran the EM algorithm until all three parameters converged. Experiments two and three used the previously estimated matrix Q given by (6) with a = 10.5, = 4.27 and ω = 0.6. We let C ij = q ij and C ii = 0, corresponding to calculating the total number of expected substitutions E[N|t, a, b], and computed the value of Σ(C; t k ) for 10 equidistant sorted time points t k with 1 ≤ k ≤ 10 ( Table 2). GTR In the fourth experiment we estimated the robust labeled distance of two sequences, using the same setup as in [8]. For each considered evolutionary distance t between 0.1 and 1, we generated 50 pairwise sequence data sets of length 2000 which have evolved for time t under the general time reversible (GTR) model with · r 1 π G r 2 π C r 3 π T r 1 π A · r 4 π C r 5 π T r 2 π A r 4 π G · r 6 π T r 3 π A r 5 π G r 6 π C · ⎞ ⎟ ⎟ ⎠ where r = (0.5, 0.3,0.6, 0.2,0.3, 0.2) and π = (0.2, 0.2,0.3, 0.3). For labeling, we considered the jumps to and from nucleotide A, leading to C ij = q ij if i or j represents nucleotide A. For each data set, we estimated the GTR parameters as described in [8] and calculated the robust distance. Experiments 5 and 6 used the same GTR matrix and C ij = q ij if i or j represents nucleotide A and zero otherwise, and computed the value of Σ(C;t k ) for 10 equidistant sorted time points t k with 1 ≤ k ≤ 10 ( Table 2). UNR In the last two experiments we used the same setup as in experiments 5 and 6 but with a different matrix  The table shows the time points t k , μ tk and the approximation of the Poission tail s(μt k ). For experiment 2, t k spanned the interval that contains the 10 longest branch lengths from the phylogeny of the 16 HIV pol sequences. In experiment 3 we started at 0.1 and ended at 1. We wished to design experiment 5 such that the corresponding s(μt k ) was the same as s(μt k ) from experiment 3. This allowed us to illustrate the relative performance of the methods when running on different sizes of the rate matrix. Experiment 6 was done on time points starting from 0.1 and ending at 4.6. As before, we wished to design experiment 7 such s(μt k ) corresponded to experiment 6. Experiment 8 used the same t k as experiment 6. and time points ( Table 2). As the speed of EVD is influenced by the type of the model, we decided to employ a non-reversible matrix. We chose the unrestricted model and carefully set the rates such that the matrix has a complex decomposition: Then, at position k, we plot the cumulative running time for precomputation and the evaluation of Σ(C;t i ) for all i ≤ k. Since EVD and EXPM have running times that are independent of t k , the running times for these two algorithms are the same in experiments 2 and 3, 5 and 6, and 7 and 8. Even more, as EXPM is dependent only on the size of the matrix, the running times in experiments 5-8 are the same. We observe that in all our experiments EXPM is the slowest method. Deciding if EVD or UNI is faster depends on the size and type of the matrix, the number of time points and the values of s(μt). As the main computation for UNI has a running time of O(n 2 ) as opposed to O(n 3 ) for EVD (Table 1), this method should have an increased advantage when the rate matrix is bigger. This means that if many time points are considered, then UNI is generally the faster method. Importantly, we note that the EVD precomputation tends to be faster than the UNI precomputation. We remark that, in the first experiment, UNI proved to be the fastest method while, in the fourth experiment, UNI became slower with the increase of the evolutionary distance between the sequences and it was only faster than EVD for small distances (< 0.2). By setting t k in an appropriate manner (Table 2), we have the same running time for UNI and EXPM for experiment 7 compared to experiment 6. Due to the fact that in experiment 7 we used the UNR matrix, EVD is slower as opposed to experiment 6. In this case, the difference is observable but not very big, but as the size of the matrix increases, this discrepancy increases too. We also note that the difference between the reversible and nonreversible cases is enough to make UNI faster than EVD in the latter case.

Discussion
The EVD algorithm assumes that the rate matrix is diagonalizable. However, a direct calculation of the integral (4) in the non-diagonalizable case is actually possible using the Jordan normal form for the rate matrix. Let Q = PJP -1 where J is the Jordan normal form of Q and P consists of the generalized eigenvectors (we recognize that we used P and J for other quantities earlier but for this discussion this should not cause any confusion and we prefer to use standard notation), i.e. J has a block diagonal form J = diag(J 1 ,..., J ) where J k = l k I + N is a matrix with l k on the diagonal and 1 on the superdiagonal. We have and noting that N is a nilpotent matrix with degree d k (equal to the size of block J k ) we obtain e Jkt = e tλk e tN = e λkt I + tN + t 2 2 In order to calculate the integral (4) the expressions (26) and (27) are used. It is evident that this procedure is feasible but also requires much bookkeeping.
In [26] an extension of uniformization, adaptive uniformization, is described for calculating transition probabilities in a CTMC. The basic idea is to perform a local uniformization instead of a global uniformization of the rate matrix and thereby have fewer jumps in the jump process. [26] considers a model with rate matrix (state 4 is an absorbing state). If this process starts in state 1 then the first jump is to state 2 and the second is from state 2 to either state 1 or state 3. This feature can be taken into account by having a so-called adaptive uniformized (AU) jump process where the rate for the first jump is 3ν, for the second is μ + 2ν and, assuming μ + ν > 3ν, the rate for the third jump is μ + ν. From the third jump the rate in the AU jump process is μ + 2ν as in the standard uniformized jump process. The AU jump process has a closed-form expression for the jump probabilities (it is a pure birth process) but is of course more complicated than a Poisson jump process. The advantage is that the AU jump process exhibits fewer jumps. This procedure could very well be useful for codon models where the set of states that the process can be in after one or two jumps are limited because only one nucleotide change is allowed in each state change.
In an application concerned with modeling among-site rate variation, [27] applies the uniformization procedure (20) to calculate the transition probabilities instead of the eigenvalue decomposition method (15). [27] shows, in agreement with our results, that uniformization is a faster computational method than eigenvalue decomposition.
The presented methods are not the only ones for calculating the desired summary statistics. For example, in [5] it is suggested to determine the expected number of jumps from the direct calculation p ab (t)E[N cd |t, a, b] = ∫ t 0 (e Qs ) ac q cd (e Q(t−s) ) ac ds where the infinite sum is truncated at k = 10. The problem with this approach is that it is difficult to bound the error introduced by the truncation. In UNI a similar type of calculation applies but the truncation error can be controlled.

Conclusion
Recall that EVD assumes that the rate matrix is diagonalizable and this constraint means that EVD is less general than the other two algorithms. We have shown in the Discussion how a direct calculation of the integral (4) is actually still possible but requires much bookkeeping. On top of being less general, EVD is dependent on the type of the matrix: reversible or non-reversible. We have shown how this discrepancy can make EVD slower than UNI even when the state space has size of only 4. We found that the presented methods have similar accuracy and EXPM is the most accurate one. With respect to running time, it is not straightforward which method is best. We found that both the eigenvalue decomposition (EVD) and uniformization (UNI) are faster than the matrix exponentiation method (EXPM). The main reason for EVD and UNI being faster is that they can be decomposed into a precomputation and a main computation. The precomputation only depends on the rate matrix for EVD while for UNI it also depends on the largest time point and the matrix C. We also remark that EXPM involves the exponentiation of a matrix double in size. UNI is particularly fast when the product μt is small because in this case only a few terms in the sum (22) are needed.