Deriving Transition Probabilities and Evolutionary Distances from Substitution Rate Matrix by Probability Reasoning

Substitution rate matrices are used to correct multiple hits at the same sites, which requires the derivation of transition probabilities and evolutionary distances from substitution rate matrices. The derivation is essential in molecular phylogenetics and phylogenomics, and represents the only statistically sound way for developing scoring matrices used in sequence alignment and local string matching (e.g., BLAST and FASTA). Three different approaches are frequently used for deriving transition probabilities and evolutionary distances: 1) The probability reasoning, 2) Solving partial differential equations, and 3) Matrix exponential and logarithm. The first approach demands the least amount of mathematical skills but offers the best way for conceptual understanding, and can often generate nice mathematical expressions of transition probabilities and evolutionary distances. This review represents the most systematic and comprehensive numerical illustration of the first approach.

Transition probability matrix, often referred to as the P matrix, specifies the probability of a nucleotide or amino acid changing into another one after time t.It is needed to calculate likelihood and to derive evolutionary distances, and consequently is needed phylogenetics based on the maximum likelihood and distance-based methods as well as Bayesian inference.Whether a substitution model can be implemented for phylogenetic analysis essentially depends on whether the model's transition probabilities can be calculated.
There are three ways to obtain transition probabilities from the Q matrix [1,2] :1) By probability reasoning, 2) By solving differential equations involving rates, and 3) By taking the matrix exponential of the rate matrix.The last two

Introduction
Substitutions occur over time and can overwrite each other at the same nucleotide or amino acid site.When we compare two homologous nucleotide sequences and find differences in N sites, the actual number of substitutions (designated by M) could be much greater than N because multiple substitutions could have happened at the same site, overwriting each other.Substitution models are used to infer the observable M from the observed substitutions from sequence comparisons.There are some quick ways to check the derived transition probabilities.First, we note that when t approaches infinity, then all entries in matrix P approaches ¼ if α > 0. This is what we have expected.Second, when t = 0, then all diagonal elements in matrix P are 1 and all off-diagonal elements are zero.This is again what we expected.Third, if α is zero, then no change is possible, and we again expect all diagonal elements in matrix P to be 1 and all off-diagonal elements to be zero, which is also true.

Many substitution models have been proposed for
From p ij in Eq. ( 2), the expected proportion of sites that are different between two aligned homologous sequences (p diff ) is 3 * p ij (t), i.e., Note that p diff approaches ¾ when t is infinitely large, which means that multiple substitutions can no longer be corrected.Eq. ( 4) offers another way of deriving p ii (t) in Eq. (3), i.e., p ii (t) is simply 1-p diff .
Eq. ( 4) allows us to derive the JC69 distance (D JC69 ) because a distance is defined as μt where μ is the substitution rate which is equal to 3α in the JC69 model.This is the same as the distance that you have driven is the product of the speed (rate) and time.Given that D JC69 = 3αt, we can derive D JC69 (Figure 1g) by substituting αt = D JC69 /3 into Eq.( 4), i.e., 69 4 3 ln 1 4 3 Where p diff (the expected number of sites that are different between the two homologous sequences) can be approximated by the observed proportion of sites (p diff.obs ) differing between the two aligned sequences.Note that p diff.obs may differ from p diff even when the underlying substitution model indeed follows JC69 because of 1) Stochastic factors due to limited aligned length of the two sequences, and 2) Distortion caused by suboptimal sequence alignment.Thus, although p diff in Eq. ( 4) cannot be greater than 0.75, p diff.obs could, even when sequences evolve strictly according to the JC69 model.D JC69 is not defined when p diff ≥ 0.75 as there is no logarithm for 0 or negative values.
We can optionally show that D JC69 in Eq. ( 5) is a maximum likelihood distance.For two aligned sequences of length N, designate the number of sites that differ between the two sequences as N D and the number of sites identical between the two sites as (N-N D ).Now the likelihood function is: tual understanding of the substitution models, but also to deriving nice mathematical expressions for transition probabilities and evolutionary distances.New researchers often ask why we can derive evolutionary distances between two aligned sequences for the TN93 model [3] but cannot for the simpler HKY85 model [4] which is a special case of the TN93 model, yet another model, F84 (used in PHYLIP since 1984), which is also a special case of the TN93 model, can have its evolutionary distance readily derived.One can easily obtain answers to such questions by taking the first approach.However, the first approach is not of general purpose and cannot handle very complicated substitution models.In contrast, the last one can be used with any substitution models specified by a rate matrix from which the matrix exponential can be obtained.In short, all these approaches need to be learned by anyone wishing to become a molecular phylogeneticist, but this paper will focus only on the probability reasoning approach illustrated with JC69 [5], K80 [6], F84 (the model used in PHYLIP since 1984), HKY85 [4], and TN93 [3] models.

Probability Reasoning to Obtain Transition Probabilities and Evolutionary Distances
Felsenstein [1] presented nice examples of probability reasoning to derive transition probabilities and evolutionary distances from rate matrices.This section presents the approach in a more systematic and accessible way.

JC69 model
Consider nucleotide A in the JC69 model (Figure 1a).Imagine that the nucleotide has a rate α of changing into any of the four nucleotides, i.e., including changing to itself (Figure 1b).This is effectively the same specification as the JC69 model.After time t, the expected number of substitutions is 4αt and the probability of no substitution is p (x = 0, α, t) = e -4αt according to the Poisson distribution, and the probability of having at least one change is then p (x ≥ 1, α, t) = 1-e -4αt (Figure 1c).Because nucleotide A can change into any one of the four nucleotides (including nucleotide A itself), each nucleotide gets 1/4 of p (x ≥ 1, α, t).We therefore have in Figure 1.
The transition probability p ii (t) is the summation of two probabilities: the probability of no change (which is e -4αt ) and the probability of changing to itself which is the same as specified in Eq. ( 2), as shown in Figure 1e, i.e., 4 4 ( 1, , ) The transition probability matrix for the JC69 model has only two distinct elements.The four diagonal elements are the same as specified in Eq. ( 3) and all the off-diagonal elements are the same as specified in Eq. (2).Each row in P adds up to 1 as a nucleotide can either stay the same or change into some other nucleotides.The equilibrium frequencies of the π vector can be derived by set t = ∞ in Eqs.(2) and (3) which leads to p ii = p ij = ¼.This implies that equilibrium frequencies of the four nucleotides will be equal for the JC69 model.This is not surprising because the frequencies did not even appear in the rate matrix (Figure 1a).

K80 model
The K80 model has a transition substitution rate α and a transversion rate β (Figure 3a).We will focus on nucleotide A and conceptualize the model with two events (Figure 3b), in contrast to only one event in the JC69 model.The first event (e 1 ) occurs when nucleotide A changes into any of the four nucleotides (including to itself).In other words, the original A is replace by a nucleotide randomly drawn from a nucleotide pool with equal nucleotide frequencies.This event occurs with a rate β.The second event (e 2 ) occurs when nucleotide A changes either to G or to itself, i.e., the original A is replace by a nucleotide randomly drawn from a purine pool with equal number of A and G.This e 2 occurs with a rate γ.Thus the transition rate α equals β+γ according to this conceptualization.Note that, whenever e 1 happens, the original nucleotide is replace by any one of the four nucleotides with equal probability, no matter how many e 2 events has occur before or after the occurrence of e 1 .It might help to think of a long sequence with L sites Where the constant term N * ln(1/4) can be dropped in maximizing lnL to obtain the distance estimate, but needs to be kept when performing likelihood ratio test for comparing different substitution models (e.g., JC69 against TN93).
We take the derivative of lnL with respect to D JC69 , set the derivative to 0 and solve for D JC69 .The resulting D JC69 is exactly the same as that in Eq. ( 5).I used D instead of D JC69 in the equations below: The variance of D JC69 (designated as V JC69 ) is obtained as the negative reciprocal of the second derivative of lnL: Note that V JC69 decreases with sequence length L as  being A at time 0. If these L sites each have experienced at least one e 2 event, then these sites will either be A or G with equal probability (i.e., 0.5), and we expect to have L/2 sites being A and the other L/2 sites being G.In contrast, if each of these L sites has experienced at least one e 1 event, then the site will be replaced by either A, C, G, or T with equal probability, and we expect to observe A, C, G and T in L/4 sites each.Any e 2 events occurring before or after the e 1 event do not change this expectation.This means that e 1 erases e 2 , but not vice versa.The probability that an e 2 event happened is informative only when no e 1 event has happened.
After time t, the expected number of substitutions is 2(α+β)t, i.e., the nucleotide A has two ways of change with a rate of α (to A and to G) and another two ways of change with a rate β (to C and to T), so the probability of no change, according to Poisson distribution, is Note that α is conceptualized as (β+γ) in Figure 3, so e -2(α+β)t in Eq. ( 9) is equivalent to e -(4β+2γ)t .The probability that at least one e 1 event has occurred is Thus, the probability that at least one e 2 has occurred but e 1 has not occurred is simply These probabilities are also shown in Figure 3c.The reason for the condition that "e 1 has not occurred" is because e 1 event can erase e 2 event (as we have discussed before).Now the probability of the starting nucleotide A changing to G during time t, designated as p(G|A,t), is the summation of two probabilities.The first is 1/2 of the probability of p(e 2 > 0, e 1 = 0,t) in Eq. ( 11) because the other 1/2 is for A to itself.The second is 1/4 of p(e 1 > 0,t) in Eq. ( 10) because A→A, A→G, A→C and A→T each get ¼, so only 1/4 of p(e 1 > 0,t) is for A→G.The summation of these two probabilities (Figure 3d) is p(G|A,t).This probability is equal to p(A|G,t), p(C|T,t), and p(T|C,t) in the K80 model.In other words, the summation of these two probabilities is the probability of a transition (P s ) during time t.Thus, Similarly, the probability of the starting A changing to C (or to T) is 1/4 of p(e 1 > 0,t) in Eq. ( 10) because 1/4 is for A→A, ¼ is for A→G and 1/4 is for A→T, so only 1/4 is for A→C (Figure 3d).This probability is the probability for a transversional change during time t, As a quick check of the derived transition probabil- tionary distances based on a substitution model).Methods for handling such situations are discussed later in the section on the GTR model.
We may optionally show D K80 in Eq. ( 17) to be a maximum likelihood estimator of the distance based on the K80 model, just like the K JC69 distance in Eq. ( 5).To see this, it is better to re-parameterize the K80 model by replacing αt and βt by D K80 and κ using the following relationship: Solving these two equations gives us Substituting αt and βt into Eqs.( 14) and ( 15) so that P and Q will be functions of D K80 and κ, and the likelihood function for deriving D K80 and κ is Where the constant term N * ln(1/4) can be dropped in maximizing lnL to obtain the distance estimate, but need to be kept when performing likelihood ratio test for comparing different substitution models (e.g., K80 against TN93).
Taking partial derivatives with respect to D K80 and κ, setting them to zero and solving the simultaneous equations, we have ) Where P obs = N s /N, and Q obs = N v /N.Using the two aligned sequences in Figure 2, we have N = 24 and P obs = 4/24 and Q obs = 2/24.These lead to D K80 = 0.3151, and κ = 4.9126.It may be relevant to add that, while D JC69 and D K80 are maximum likelihood estimates, distance formulae for F84 and TN93 models, obtained in the same way by equating the observed substitutions to expected substitutions, are generally not maximum likelihood estimates.This will become clear when we deal with these models.
We have previously derived the variance of D JC69 as the negative reciprocal of the second derivative of lnL with respect to D JC69 .This can be used only when the log-likelihood function is used to estimate a single parameter.When there are multiple parameters (e.g., D K80 and κ), we cannot use the same approach unless the parameters are not correlated.There are two common-ities, we note that P s and P v are zero when t = 0 (or when = 0 and β = 0).This also implies that all diagonal elements in the transition probability matrix are equal to 1, and is what we have expected.When t = ∞, with α > 0 and β > 0, all entries in matrix P approaches ¼ (the equilibrium frequency of the K80 model).This is also what we expected.
For two aligned homologous sequences, P s can be approximated by the proportion of sites differing by a transition (P), and 2P v by the portion of sites differing by a transversion (Q, Figure 3e).Note that the expected Q is equal to 2P v because there are two ways of having a transversional change.Therefore, We can now first solve for βt from Eq. ( 15), and then substitute the solution for βt into Eq.( 14) to solve for αt.This leads to ln(1 2) ln(1 2 ) Recall that evolutionary distance is defined as µt, where µ is the substitution rate which is equal to (α+2β) in the K80 model.Thus, the evolutionary distance based on the K80 model (D K80 ) is (α+2β)t, which comes to Where P and Q can be approximated by the observed proportion of sites differing by a transition or a transversion from two aligned sequences, designated as P obs and Q obs .Similar to what I have mentioned with reference to D JC69 , P and Q may differ from P obs and Q obs even if the K80 model is followed during the sequence evolution.This is because 1) Limited aligned length of the two sequences may result in stochastic variation in P obs and Q obs , and 2) The two observed proportions may be distorted by alignment errors (i.e., misidentification of site homology).For example, two homologous sequences that have diverged for an infinite length of time according to the K80 model should have expected P and Q equal to 0.25 and 0.5, respectively.However, we may actually have P obs > 0.25 or Q obs > 0.5, which would render D K80 inapplicable.On the other hand, after sequence alignment and deletion of indels (because evolutionary distances are typically calculated without using sites with indels), P obs and Q obs may well be much smaller than the expected 0.25 and 0.5 leading to severer underestimation of the true distance.It is also possible to have P obs and Q obs values that, when used to replace P and Q in Eq. ( 16), result in negative αt or βt values that make no biological sense.The same applies to D JC69 (in fact to any evolu-ISSN: 2378-3648 Where π A , π G , π C and π T are equilibrium frequencies, π R and π Y are frequencies of purines and pyrimidines, and the diagonal elements are constrained by each row summing up to 0. The parameter γ in Eq. ( 26) is sometimes replaced by κβ, but it is easier to understand the F84 model by using Q F84 specified in Eq. (26).
We may view the F84 model as featuring two events (e 1 and e 2 ).Suppose we start with a nucleotide A. Event e 1 occurs with rate β.When it occurs, the original A will be replaced by a nucleotide drawn randomly from a nucleotide pool in which the nucleotide frequencies are the same as the equilibrium frequencies.This means that the original A has a rate of βπ A, βπ G, βπ C and βπ T to change to A, G, C and T, respectively, when e 1 occurs.This is different from the K80 model where, when e 1 occurs, the original A has a rate of 0.25 to change to any of the four nucleotides.Event e 2 has a rate of γ to occur, and will result in the original A being replaced by a purine drawn randomly from a purine pool with A and G frequencies specified as π A / R and π G / R .Thus, the original A has a rate of γπ A /π R and γπ G /π R to change to A and G when e 2 occurs.This again differs from K80 where the original A has a rate of 0.5 of changing to A or Gwhen e 2 occurs.These events are illustrated in Figure 4a, where we use x to represent β+γ/π R .Note that it is not a good idea to use α to represent β+γ/π R for two reasons.First, if we had started with a nucleotide C or T instead of A, then we would have β+γ/π Y instead of β+γ/ π R which would force us to use α 1 and α 2 to distinguish between the two.A casual reader will then be misled to think that F84 has three rate parameters (i.e., β, α 1 and α 2 ) without knowing that α 1 and α 2 are used as different functions of the same rate parameter γ.Second, I have reserved α to represent β+γ in Figure 4b which simplifies the derivation of transition probabilities illustrated in Figure 4.
Note that whenever event e 1 happens, the original A is replace by A, C, G and T with probabilities π A, π G, π C and π T , no matter how many e 2 events has occur before or after the occurrence of e 1 .This is similar to the scenario involving the K80 model, except that the K80 model assumes equal nucleotide frequencies.It might help to think of a long sequence with L sites being A at time 0. If these L sites each have experienced at least one e 2 event, then these sites will either be A or G with probabilities π A and π G , respectively, and we expect to have π A L sites being A and π G L sites being G.In contrast, if each of these L sites has experienced at least one e 1 event, then the site will be replaced by A, C, G, or T with probabilities π A, π G, π C and π T , and we expect to observe A, C, G and T in π A L, π G L, π C L, and π T L sites, respectively.Any number of e 2 events occurring before or after the e 1 event does not change this expectation.This means that e 1 erases e 2 , but not vice versa.The occurrence of an e 2 event is informative only when no e 1 event has happened.
ly used methods for deriving variances of parameters.The first is the delta method [7][8][9], and the second uses the Fisher information matrix to obtain the variances and covariance matrix for the parameters.The delta method, which often yields nice and clean mathematical expressions for the variance, is illustrated in the Appendix.The method using the Fisher information matrix is shown below.
To estimate variance involving multiple parameters such as D K80 and κ, we first take the second order partial derivatives of lnL with respective to D K80 and κ, substituting the estimated D K80 and κ in Eqs.
The diagonal elements of M FI -1 are the variances for κ and D K80 , and the off-diagonal elements of M FI -1 are covariances.The mathematical expression for the variance of κ is tedious, but that for the variance of D K80 is simpler: Where the two parameters are in the order of κ and D K80 , i.e., the variance is 21.8445 for κ and 0.0209 for D K80 .The off-diagonal elements are covariances between the two parameters.

F84 and HKY85 model
The F84 and HKY85 model accommodate not only the differential substitution rates between transitions and transversions, but also different equilibrium nucleotide frequencies, in contrast to JC69 and K80 which assume equal equilibrium nucleotide frequencies.The same probabilistic reasoning used before can be applied to derive transition probabilities for the HKY80 model.
The rate matrix for the F84 model is ISSN: 2378-3648 • Page 7 of 10 • + γ) and β in Eq. ( 31).The transition probability from the original A to C (a transversion, Figure 4e) is simply For other transversions, e.g., p AT , one just need to replace π C by π T .The complete transition probability matrix for the F84 model is Where , , , As a quick check of the transition probabilities, we first note that when t = 0 (or when α = 0 and β = 0), then the diagonal elements are 1 and all off-diagonal elements are 0, which is what we expected.Second, when t = ∞ with α > 0 and β > 0, then the transition probabilities will approach the equilibrium frequencies, which is also what we expected.
To obtain the distance for the F84 model (D F84 ), recall that a distance is defined as µt where µ is the average substitution rate, i.e., substitution rates in Eq. ( 26) weighted by the equilibrium frequencies: Now we need to obtain βt and γt in order to calculate D F84 .We can obtain αt and βt, and then obtain γt = αt -βt, remembering that α = β + γ (Figure 4b and Eq. ( 27)].The method we will use is the same as that for the K80 model, i.e., we obtain the expected transitions and transversions, designated E(S) and E(V), respectively, from transition probabilities and equate them to the After time t, the total flow of the original A to the four nucleotides (including itself, Figure 4a and Figure 4b) is So the probability that no substitution has happened during time t (Figure 4c), according to Poisson distribution, is The rate of A changing to A, G, C, and T through e 1 is βπ A + βπ G + βπ C + βπ T = β, so the probability that at least one e 1 has occurred during time t is The probability that e 2 has happened but e 1 has not is then The reason for the condition that "e 1 has not occurred" is because e 1 event can erase e 2 event.With these, it is easy to derive transition probability from A to G (Figure 4d) as the summation of 1) A fraction of π G of p(e1 > 0,t), which is the probability of e 1 event that results in the original A being replaced by A, C, G, and T with probabilities π A, π G, π C and π T, and 2) A fraction of π G /π R of p(e 2 > 0,e 1 = 0,t), which is the probability that e 2 events not erased by e 1 .That is, From now on, p(j|i,t) will be written simply as p ij , so p(G|A,t) is p AG .With the same reasoning, we can derive transition probabilities for other A↔G and C↔T substitutions.Note that the two rate parameters in the F84 model (β and γ) have been re-parameterized into α (= β .Event e 1 occurs at a rate of β and leads to the original A being replaced by any of the four nucleotides according to their equilibrium frequencies, and event e 2 occurs at a rate of γ and results in the original A being replaced by either A or G according to their frequencies in the purine pool, i.e., π A/ π R, and π G/ π R. I used x as a shorthand for β+γ/π R .The rate γ does not appear in the final transition probabilities because it has been absorbed into α which equals β+γ shown in (b).P(j|i,t) means the probability of changing from the original nucleotide i to nucleotide j after time t, and is synonymous to p ij (t) or simply p ij in this paper.
ISSN: 2378-3648 • Page 8 of 10 • librium frequencies.Event e 2 occurs with a rate γ and results in the original A being replaced by either A or G with the probabilities equal to their respective equilibrium frequencies.Fictionalized in this way, the expected number of substitutions after time t is β(π A +π G +π C +π T ) + γ(π A +π G ) = β+γR.According to the Poisson distribution, the probability that no substitution has happened during time t is The probability that at least one e 1 occurred after time t is The probability that e 2 has occurred but e 1 has not is The transition probability p(G|A,t), abbreviated as p AG , is In the same way, we can derive other transition probabilities which are shown below: 1 1 ; ; ; As a quick check of the transition probabilities, we first note that when t = 0 (or when α = 0 and β = 0), then the diagonal elements are 1 and all off-diagonal elements are 0, which is what we expected.Second, when t approaches infinity with β > 0 and γ > 0, then the transition probabilities will approach the equilibrium frequencies, which is also what we expected.
We cannot derive the distance for the HKY85 model by following the same approach as that for the F84 model.Hasegawa, et al. [4] has tried this approach but were not successful because there is no explicit solution for βt and γt.However, if we treat the A↔G transition and C↔T transition separate, then we can solve for βt and γt [10].In other words, we obtain one set of βt and γt from observed A↔G transitions and transversions, and another set of βt and γt from observed C↔T transitions and transversions.βt in the two sets are the same as that in Eq. ( 38), but γt is different between the two sets of estimates.We can then take a weighted average of γt.Admittedly, this does sound mathematically clumsy and explains why HKY85, while commonly used in phylogenetic analysis involving a likelihood framework or Bayesian inference, is almost never used in distance-based phylogenetics.
Here is the somewhat circuitous protocol to get βt and γt from HKY85.The expected numbers of A↔G observed S and V to solve for αt and βt.With the property of time reversibility (e.g., π A• p AG = π G• p GA ), we have Equating E(S) and E(V) to the observed S and V, and solving these two equations with the two unknowns (αt and βt), we have ( ) Substitute βt and γt (= αt -βt) into Eq.( 35) and, after some algebraic manipulation, we have a more useful form of D F84 : Where

p p p p p p p p p p p p p p p p p p p p p p
To illustrate the calculation of D F84 , we may use the two aligned sequences in Figure 2 which gives us π A = 6/48, π C = 12/48, π G = 10/48, π T = 20/48, S = 4/24, V = 2/24, αt = 0.5778363341, βt = 0.2076393648, γt = αt -βt = 0.3701969693, and D F84 = 0.3198867427.The variance of the D F84 can be obtained by either the delta method or the method using Fisher information matrix.
A substitution model similar to the F84 model is the HKY85 model, with its rate matrix specified as: Where (β+γ) is often written as α and the diagonal elements are constrained by each row summing up to 0. The HKY85 model and the F84 model differ only in the specification of rates involving transitions.q AG and q CT are π G (β+γ) and π T (β+γ) in the HKY85 model specified in Eq. (41), in contrast to π G (β+γ/π R ) and π T (β+γ/π Y ), respectively, in the F84 model specified in Eq. (26).By comparing these rates, it becomes obvious that the F84 model would be equivalent to the HKY85 model if π R = π Y .
We can obtain the transition probabilities for the HKY85 model in the same way as that for the F84 model.In short, we again start with a nucleotide A and envision two events e 1 and e 2 .Event e 1 occurs with rate β, and results in the original A replaced by any of the four nucleotides with probabilities equal to their respective equi-although D HKY85 is consistently but slightly smaller than D F84 .
The HKY85 model itself may not carry much biological significance given the existence of the F84 model.However, the twists involved in computing the evolutionary distance, i.e., the separate estimation of γ A↔G and γ C↔T , lead very naturally to a very useful TN93 model that we will cover next.

TN93 model
We have come far, so far that we need hardly any extra effort to derive transition probabilities for the TN93 model.There are two equivalent specifications of the rate matrix for the TN93 model.The first is Where the diagonal elements are constrained by each row summing up to 0. The second specification simply replaces (β + γ R /π R ) by α 1 and (β + γ Y /π Y ) by α 2 .We see that TN93 is reduced to F84 if γ R = γ Y , and to HKY85 The similarity between TN93 and F84 allows us to re-use Figure 4 for deriving transition probabilities for TN93.We only need to add a subscript R to γ and α in Figure 4 so that we have γ R and α R as rates for purine, keeping everything else the same, and we instantly obtain the transition probabilities for transitional substitutions between purines and for transversional substitutions as shown in Figure 4. To get transition probabilities between pyrimidines, we can just replace the original nucleotide A in Figure 4 by nucleotide C or T and rename γ and α in Figure 4 to γ Y and α Y .Note that our α R = β+γ R , and α Y = β+γ Y .
The transition probability matrix for the TN93 model is Where x 1 and x 3 are the same as those in Eq. ( 34), but x 2 has α replaced by α R and x 4 has α replaced by α Y , i.e., To obtain the distance for the TN93 model (D TN93 ), recall that a distance is defined as µt where µ is the average substitution rate, i.e., substitution rates in Eq. (53) weighted by the equilibrium frequencies, so: Now we need to obtain α R t, α Y t, and βt.The method we will use is the same as that for the K80 and F84 models, i.e., we obtain the expected numbers of A↔G transitions, C↔T transitions, and transversions, designated E(S R ), E(S Y ) and E(V), respectively, from transition probabilities, and equate them to the observed S R , S Y and C↔T transitions, designated S R and S Y , respectively, and transversions are Setting E(S R ) and E(V) to their the observed S R and V, and solve for βt and γt, we have Where βt is the same as that in Eq. (38), and γ R t in Eq. ( 49) is γt estimated from observed S R and V. Now we obtain another set of solutions for βt and γt by setting E(S Y ) and E(V) to their observed S Y and V, and solve for βt and γt, we have the same βt but a different γt: The distance for the HKY model To compute D HY85 using the two aligned sequences in Figure 2, we have π A = 6/48, π C = 12/48, π G = 10/48, π T = 20/48, S Y = 4/24, S R = 0, V = 2/24, βt = 0.2076393648, γ R t = -0.2223239164,γ Y t = 1.047432870, weighted γt = 0.624180608, D HKY85 = 0.308904.I intentionally choose the aligned sequences in Figure 2 with S R = 0 just to see if D HKY85 would behave strangely.It did not.For comparison, the same two sequences yield D F84 = 0.319887.
In general, D HKY85 is slightly smaller than D F84 .I used the eight vertebrate COI sequences in the FASTA file VertCOI.fas that comes with DAMBE [11] to compute both D HKY85 and D F84 (Figure 5).The difference is minor, Where

p p p p p p p p p p p p p p p p p p p p p p p p p p p p
To illustrate the application of D TN93 with the two aligned sequences in Figure 2, we have π A = 6/48, π C = 12/48, π G = 10/48, π T = 20/48, S Y = 4/24, S R = 0, V = 2/24, α R t = 0.13353, α Y t = 0.90593, βt = 0.20764, γ R t = α R t -βt = -0.07411,γ Y t = α R t -βt = 0.69829, D TN93 = 0.35299.The variance of the D TN93 can be obtained by either the delta method or the method using Fisher information matrix.Note that S R = 0 means no information for estimating α R t properly.I should mention that all distance formulations in this paper are known as Independently Estimated (IE) distances because they use information from only two aligned sequences and are independent of other pairs of sequences.Practical molecular phylogenetic analysis typically would use Simultaneously Estimated (SE) distances [12,13] which use information from all pairs of sequences.SE distances are implemented in MEGA [14] and DAMBE [11,15].The PhyPA [16] function in DAMBE, which performs phylogenetic reconstruction base on pairwise alignment when reliable multiple sequence alignment is difficult to obtain for highly diverged sequences, uses SE distances only.
In short, the approach of deriving transition probabilities by probability reasoning can go a long way if one can do good bookkeeping.In particularly, the probability reasoning approach is very useful for conceptual understanding.However, the approach becomes increasing difficult with more complicated substitution models.Two alternative approaches, one involving solving dif-

Figure 1 :
Figure 1: Derivation of transition probabilities and the evolutionary distance (D) based on the JC69 model.The d value in the diagonal of the rate matrix (a) is constrained by the row sum equal to 0, i.e., d = -3α.P(j|i,t) means the probability of changing from the original nucleotide i to nucleotide j after time t, and is synonymous to p ij (t) or simply p ij in this paper.

Figure 2 :
Figure 2: Two homologous sequences for illustrating computation of pairwise evolutionary distances.

Figure 3 :
Figure 3: Derivation of transition probabilities and the evolutionary distance (D) based on the K80 model.The rate matrix (a) has the diagonal elements (d) constrained by the row sum equal to 0, i.e., d = -α -2β.P and Q are the observed proportion of transitional and transversional changes between two aligned homologous sequences.Equating them to their respective expected values, E(P)and E(Q), leads to the solution of αt and βt shown, and the evolutionary distance D. P(j|i,t) means the probability of changing from the original nucleotide i to nucleotide j after time t, and is synonymous to p ij (t) or simply p ij in this paper.
(21) and (22) into the second-order partial derivatives, arranging them into what is called a Fisher information matrix (M FI ) below, and compute the matrix inverse of M FI (designated by sequences in Figure2, we have N = 24 and empirical P = 4/24 and Q = 2/24.These lead to D K80 = 0.3151, and κ = 4.9126.The M FI and M FI -

Figure 4 :
Figure 4: Derivation of transition probabilities based on the F84 model.π A, π G, π C, π T and π R, π Y are equilibrium frequencies of A, C, G, T, purine (A + G) and pyrimidine (C + T).Event e 1 occurs at a rate of β and leads to the original A being replaced by any of the four nucleotides according to their equilibrium frequencies, and event e 2 occurs at a rate of γ and results in the original A being replaced by either A or G according to their frequencies in the purine pool, i.e., π A/ π R, and π G/ π R. I used x as a shorthand for β+γ/π R .The rate γ does not appear in the final transition probabilities because it has been absorbed into α which equals β+γ shown in (b).P(j|i,t) means the probability of changing from the original nucleotide i to nucleotide j after time t, and is synonymous to p ij (t) or simply p ij in this paper. ) ) ) ferential equations and the other involving matrix exponential and logarithms, are often used in practical computation with the GTR model for nucleotide sequences and amino acid-based substitution models.They will be numerically illustrated elsewhere.andVtosolve for α R t, α Y t, and βt:The resulting α R t, α Y t, and βt are If one wishes to express D TN93 in S R , S Y and V, then one may just substitute γ R t, γ Y t, and βt into Eq.(56), which yields: ISSN: 2378-3648 • Page 10 of 10 •