Phylogenetic Diversity and the Evolution of Molecular Sequences

Copyright: © 2015 Brocchieri L. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Diversity, and how to measure it, is a long-standing and widely explored concept in ecology, physics, economics, and sociology, among other fields. In ecology, the concept of diversity is tightly connected with the idea of a “healthy” community and with conservation. During the

by the phylogenetic relations among species. Intuition suggests that, of two communities composed of the same number and abundance of species, the community composed of more distantly related species is more diverse, since distantly related species are likely to exhibit a greater number of unique features. phylogenetic relatedness on diversity. Faith's phylogenetic diversity [2] measures diversity based on a phylogenetic tree as the sum of the lengths $L_i$ of its branches, $PD = \sum_i L_i$, i.e., as the size of the tree. Faith's phylogenetic diversity takes into consideration the phylogenetic relations among the species in the community but not the relative abundance of species. In contrast, Rao's quadratic entropy incorporates frequencies, evaluating diversity as the average pairwise dissimilarity $d_{ij}$ of pairs of individuals randomly sampled from the community [3]: $Q = \sum_i \sum_j d_{ij} p_i p_j$, where $p_i$ is the relative frequency of species $i$.
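To make the contrast between the two indices concrete, the following minimal sketch computes both for a small hypothetical community. The tree ((A:1, B:1):2, C:3), the frequencies, and the use of patristic (path-length) distances as the dissimilarities $d_{ij}$ are all assumptions chosen for illustration; Rao's index accepts any dissimilarity matrix.

```python
# Faith's PD and Rao's quadratic entropy for a small hypothetical community.
# Toy tree (assumed for illustration): ((A:1, B:1):2, C:3), depth 3.


def faith_pd(branch_lengths):
    """Faith's phylogenetic diversity: the sum of all branch lengths L_i."""
    return sum(branch_lengths)


def rao_q(freqs, dist):
    """Rao's quadratic entropy Q = sum_i sum_j d_ij * p_i * p_j, i.e. the
    expected dissimilarity of two individuals drawn at random (with
    replacement) from the community."""
    n = len(freqs)
    return sum(freqs[i] * freqs[j] * dist[i][j]
               for i in range(n) for j in range(n))


# Branches of the toy tree: A, B, the internal branch above (A, B), and C.
lengths = [1.0, 1.0, 2.0, 3.0]
# Patristic distances between A, B, C measured on the same tree.
dist = [[0.0, 2.0, 6.0],
        [2.0, 0.0, 6.0],
        [6.0, 6.0, 0.0]]
freqs = [0.5, 0.25, 0.25]

print(faith_pd(lengths))    # 7.0, whatever the frequencies
print(rao_q(freqs, dist))   # 2.75
```

Changing `freqs` changes Rao's Q but leaves Faith's PD untouched, which is exactly the limitation of PD noted above.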
Phylogenetic entropy generalizes Shannon entropy on the basis of a rooted phylogenetic tree of species, associating with each branch $i$ of length $L_i$ a frequency $a_i$ corresponding to the sum of the frequencies of all the species descended from that branch [4].
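The usual form of this index is $H_P = -\sum_i L_i a_i \ln a_i$. The sketch below is a minimal illustration, assuming that form and the same hypothetical toy tree as above; it also checks that on a star tree with unit-length branches, where each branch leads to a single species, $H_P$ reduces to ordinary Shannon entropy.

```python
import math


def phylogenetic_entropy(branches):
    """H_P = -sum_i L_i * a_i * ln(a_i) over branches given as (L_i, a_i),
    where a_i is the summed frequency of the species below branch i."""
    return -sum(L * a * math.log(a) for L, a in branches if a > 0)


def shannon(freqs):
    """Ordinary Shannon entropy of species frequencies."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


freqs = [0.5, 0.25, 0.25]
# A star tree with unit branches: every branch leads to a single species.
star = [(1.0, p) for p in freqs]
# The toy tree ((A:1, B:1):2, C:3): the internal branch carries a = 0.5 + 0.25.
toy = [(1.0, 0.5), (1.0, 0.25), (2.0, 0.75), (3.0, 0.25)]

print(phylogenetic_entropy(star), shannon(freqs))  # equal on a star tree
print(phylogenetic_entropy(toy))
```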
In a seminal paper of 2006, Lou Jost argued that standard diversity indices generally cannot be considered direct measures of diversity and do not show the properties expected of "true" diversities [5]. However, all standard indices can be transformed into "true diversities", i.e., into the number of equally frequent species necessary to obtain the same diversity-index value as the community under consideration. "True diversities" take the form of Hill numbers [6] of some order $q$, ${}^qD = \left(\sum_{i=1}^{S} p_i^q\right)^{1/(1-q)}$, where $p_i$ is the relative frequency of species $i$ and $S$ the number of species. When $q = 0$, species frequencies are ignored and species are simply counted as occurrences, as in the Species Richness Index. When $q > 1$, the most frequent species are favored, and when $q \to \infty$ only the frequency of the most abundant species contributes to the calculation of diversity (corresponding to the inverse Berger-Parker index). Shannon entropy $H$ uniquely corresponds to the special case of diversity ${}^1D = \exp(H)$, which does not favor any frequency. Contrary to most diversity indices, numbers equivalents behave as would be expected of true measures of diversity upon compositional changes (see [5,6] for examples), and I will refer to them as "true diversities".
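The behavior of Hill numbers across orders can be checked numerically. The following minimal sketch assumes relative frequencies that sum to 1 and handles the two limiting cases explicitly: $q = 1$ (the exponential of Shannon entropy) and $q \to \infty$ (the inverse Berger-Parker index). The frequencies are hypothetical.

```python
import math


def hill_number(freqs, q):
    """True diversity (Hill number) of order q, for relative frequencies summing to 1."""
    p = [x for x in freqs if x > 0]  # absent species do not contribute
    if q == 1:
        # Limit q -> 1: exponential of Shannon entropy.
        return math.exp(-sum(x * math.log(x) for x in p))
    if math.isinf(q):
        # Limit q -> infinity: inverse of the Berger-Parker index.
        return 1.0 / max(p)
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))


freqs = [0.5, 0.25, 0.25]
for q in (0, 1, 2, math.inf):
    print(q, hill_number(freqs, q))
# Orders 0, 1, 2, inf give approximately 3, 2**1.5, 8/3 and 2 effective species.
```

Diversity decreases with increasing $q$ as rare species lose weight. For a community of equally frequent species, every order returns the number of species, which is the defining property of a numbers equivalent.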
Using the unifying conceptual perspective advocated by Jost, Faith's phylogenetic diversity (PD) can be generalized to a phylogenetic diversity of any order $q$ that considers at once the underlying phylogenetic tree and species frequencies [7]. Consider an ultrametric tree whose branches represent amounts of evolution proportional to time (Figure 1). Each branch of the tree can be assigned an abundance corresponding to the sum of the frequencies of all species derived from that branch. To each time point (from the time $-T$ of the root to the present) corresponds a virtual community with frequencies assigned as described (Figure 1). True diversities can be calculated for each of these communities, and the average diversity of all communities within a chosen time interval can be calculated as an alpha diversity of any order $q$ [8]. In the example of Figure 1, the same communities are conserved within the time intervals $T_1$, $T_2$, and $T_3$, and their mean (alpha) diversity is calculated as:

$${}^q\bar{D}(T) = \left[ \sum_{k=1}^{3} \frac{T_k}{T} \sum_{i \in B_k} a_i^q \right]^{1/(1-q)}, \qquad T = T_1 + T_2 + T_3,$$

where $B_k$ is the set of branches present during interval $T_k$. With some rearrangement and substitutions, in the general case this averaging is equivalent to the general formulation of mean diversity of order $q$ over time $T$ [7]:

$${}^q\bar{D}(T) = \left[ \sum_i \frac{L_i}{T} a_i^q \right]^{1/(1-q)},$$

where $L_i$ represent branch lengths and $a_i$ the corresponding species frequencies. Multiplied by the length of the time interval $T$, mean diversities give phylogenetic diversities of the same order $q$, ${}^qPD(T) = T \cdot {}^q\bar{D}(T)$. When $q = 0$, these correspond to Faith's phylogenetic diversity, ${}^0PD(T) = \sum_i L_i$. In the case of $q = 1$ the expression above is undefined, but its limit exists and is:

$${}^1\bar{D}(T) = \exp\left( -\sum_i \frac{L_i}{T} a_i \ln a_i \right).$$

Similar diversities can be calculated for a rooted non-ultrametric tree [7] by substituting $T$ with the weighted average tree depth $\bar{T} = \sum_i L_i a_i$.

Note that ${}^q\bar{D}(T)$ is insensitive to the scale of the tree. Given a tree topology and branch lengths, rescaling the tree so that the new tree has the same topology but branches $k$-fold the original lengths produces, over the time interval $kT$, the same mean diversity as the original tree over time $T$. In contrast, phylogenetic diversity is rescaled to $k \cdot {}^qPD(T)$.
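The formulas above translate directly into code. The following minimal sketch represents a rooted tree as a list of hypothetical `(L_i, a_i)` pairs. When no interval is given, it uses the weighted average depth $\bar{T} = \sum_i L_i a_i$, which equals the root depth on an ultrametric tree. The sketch checks that order 0 recovers Faith's PD and that rescaling the tree leaves the mean diversity unchanged while multiplying PD by $k$.

```python
import math


def mean_diversity(branches, q, T=None):
    """Mean diversity of order q, [sum_i (L_i / T) * a_i**q] ** (1 / (1 - q)).

    branches: (L_i, a_i) pairs, where L_i is a branch length and a_i the summed
    frequency of the species descending from that branch.
    T: time interval; if omitted, the weighted average depth sum(L_i * a_i)
    is used, which equals the root depth on an ultrametric tree.
    """
    if T is None:
        T = sum(L * a for L, a in branches)
    terms = [(L / T, a) for L, a in branches if L > 0 and a > 0]
    if q == 1:
        # Limit q -> 1 of the expression above.
        return math.exp(-sum(w * a * math.log(a) for w, a in terms))
    return sum(w * a ** q for w, a in terms) ** (1.0 / (1.0 - q))


def phylogenetic_diversity(branches, q, T=None):
    """Phylogenetic diversity of order q: T times the mean diversity."""
    if T is None:
        T = sum(L * a for L, a in branches)
    return T * mean_diversity(branches, q, T)


# Toy ultrametric tree ((A:1, B:1):2, C:3) with frequencies A=0.5, B=0.25, C=0.25.
toy = [(1.0, 0.5), (1.0, 0.25), (2.0, 0.75), (3.0, 0.25)]
print(phylogenetic_diversity(toy, 0))  # Faith's PD, approximately 1 + 1 + 2 + 3 = 7

# Rescaling every branch k-fold leaves the mean diversity unchanged
# and multiplies the phylogenetic diversity by k.
k = 10.0
scaled = [(k * L, a) for L, a in toy]
print(mean_diversity(toy, 2), mean_diversity(scaled, 2))
print(phylogenetic_diversity(toy, 2), phylogenetic_diversity(scaled, 2))
```

On a star tree with unit-length branches, the mean diversity of order $q$ reduces to the Hill number of the species frequencies, which ties this generalization back to Jost's true diversities.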

Journal of Phylogenetics & Evolutionary Biology
the proportion of all features that are shared by the corresponding only once among all lineages and if features are never lost, a type of parsimonious evolution known as Camin-Sokal parsimony [9]. Also implicit in this measure of community diversity is the description of a species as the set of all features developed along its lineage from time $-T$. Only within this framework mean diversities correctly answer to of species implicit in the above derivation give rise to unexpected results if they are not taken into account. Consider, for example, a sample of two equally frequent and phylogenetically related species (Figure 2). At the time of speciation, the two species and their last common ancestor are identical. Intuition suggests that the true diversity of those two identical species should be 1.0. As the two species gradually assuming equal frequencies) in the community should correspondingly gradually increase, and it should approach 2.0 as the two species between them. It is interesting to consider instead the behavior of mean diversity and phylogenetic diversity calculated for this system in the time intervals ( , ]
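The two-species example can be worked through with the mean-diversity formula. The sketch below is minimal and rests on two assumptions: a hypothetical tree in which two equally frequent species split $t$ time units ago, and intervals of the form $(-T, 0]$ measured back from the present, with $T \ge t$. When the interval starts at the speciation event ($T = t$), the mean diversity equals 2 for every order and every $t > 0$, however similar the two species still are, while the phylogenetic diversity $2t$ shrinks toward 0. With a fixed longer interval, the order-0 mean diversity is $1 + t/T$.

```python
import math


def mean_diversity(branches, q, T):
    """Mean diversity of order q over an interval of length T, from (L_i, a_i) pairs."""
    terms = [(L / T, a) for L, a in branches if L > 0 and a > 0]
    if q == 1:
        return math.exp(-sum(w * a * math.log(a) for w, a in terms))
    return sum(w * a ** q for w, a in terms) ** (1.0 / (1.0 - q))


def two_species(t, T):
    """Hypothetical tree: two equally frequent species that split t time units
    ago, viewed over the interval (-T, 0] with T >= t. The common ancestral
    lineage (frequency 1) fills the first T - t units of the interval."""
    return [(T - t, 1.0), (t, 0.5), (t, 0.5)]


# Interval starting at the speciation event (T = t): the mean diversity is
# 2 for any divergence t > 0, while the phylogenetic diversity 2t vanishes as t -> 0.
for t in (1e-6, 0.1, 1.0):
    for q in (0, 1, 2):
        D = mean_diversity(two_species(t, t), q, t)
        print(f"t={t:g} q={q}: mean diversity {D:.3f}, PD {t * D:.3g}")

# Fixed interval T = 1: with q = 0 the mean diversity is 1 + t/T.
for t in (0.0, 0.5, 1.0):
    print(t, mean_diversity(two_species(t, 1.0), 0, 1.0))
```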

Are Mean and Phylogenetic Diversities Applicable to the Evolution of Molecular Sequences?
Substitution of "features", in the form of nucleotide or amino acid types at each alignment position, and hence loss of features by substitution rather than accumulation of features, is the essence of how molecular sequence evolution occurs by point substitutions, and of how evolution is generally modeled in molecular sequence phylogenetics. Branches of molecular-sequence evolutionary trees represent the number of state substitutions that occur during evolution over a number of characters, rather than the accumulation of new characters. A "species", as represented by a sequence, is not an empty set at the time $-T$ of the common ancestor, and its two incipient descendants in of non-parsimonious evolution in the form of multiple substitutions and back substitutions at each sequence site is implicit in probabilistic models of sequence evolution. As a consequence, branch lengths are not proportional to the frequency of shared or unique states in present represented by branch lengths (or by their transformation in phenotypic sequences the same way it can assuming parsimonious evolution.
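The non-proportionality between branch lengths and observed differences can be illustrated with the simplest probabilistic model of nucleotide substitution, Jukes-Cantor (JC69). Here JC69 is chosen only as an example; the argument applies to substitution models generally. Under JC69, two sequences separated by $d$ substitutions per site are expected to differ at a proportion $p = \tfrac{3}{4}\left(1 - e^{-4d/3}\right)$ of sites. The minimal sketch below shows $p$ growing ever more slowly than $d$.

```python
import math


def jc69_p_distance(d):
    """Expected proportion of sites that differ between two sequences separated
    by d substitutions per site under the Jukes-Cantor (JC69) model."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc69_distance(p):
    """JC69 estimate of substitutions per site from an observed proportion p < 0.75."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# The observed proportion of differing sites grows ever more slowly than the
# branch length d and levels off at 3/4, because multiple and back
# substitutions at a site leave no extra trace in the present-day states.
for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(f"d = {d:.1f} substitutions/site -> p = {jc69_p_distance(d):.3f}")
```

Doubling a branch length therefore does not double the fraction of unique states, which is the property that feature-counting diversities implicitly assume.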
Mean and phylogenetic diversities depend on the rooting of the tree. Although it is not possible to re-root an ultrametric tree and to dependence on rooting of phylogenetic diversity is consistent with an intrinsic directionality of the evolutionary process, by which trees not intrinsic and not natural to phylogenetic trees constructed on the assumption of non-parsimonious time-reversible evolution, such as most trees derived by continuous-time Markov models of state substitutions applied to multiple alignments of nucleic acid or protein is independent of the direction of time and of the position of the root on the tree. envisioned by Lou Jost [5,8] have powerfully contributed to the generalization of Faith's [2] concept of phylogenetic diversity within a framework of parsimonious evolution [7]. Jost's true diversities will likely also open new frontiers for characterizing the diversity of metagenomic samples of molecular sequences and for using them as markers of the diversity of ecological communities, such as environmental metagenomic or microbiome samples. I believe that this will be achieved when true diversities are combined with probabilistic models of sequence evolution and the corresponding estimates of genetic relatedness [10].
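The dependence on rooting can be seen on the smallest nontrivial case. The minimal sketch below takes a hypothetical unrooted three-leaf tree, roots it at the midpoint of two different leaf branches, and computes diversities with the non-ultrametric depth $\bar{T} = \sum_i L_i a_i$. In this particular example, order-0 PD stays at 3 because the total branch length does not change. The mean diversity and the PD of order 2, however, differ between the two rootings, although the unrooted tree, the quantity a time-reversible model actually estimates, is the same.

```python
import math


def mean_diversity(branches, q):
    """Mean diversity of order q for a rooted tree given as (L_i, a_i) pairs,
    taking T as the weighted average depth T_bar = sum(L_i * a_i)."""
    T = sum(L * a for L, a in branches)
    terms = [(L / T, a) for L, a in branches if L > 0 and a > 0]
    if q == 1:
        return math.exp(-sum(w * a * math.log(a) for w, a in terms))
    return sum(w * a ** q for w, a in terms) ** (1.0 / (1.0 - q))


def phylogenetic_diversity(branches, q):
    """T_bar times the mean diversity of order q."""
    return sum(L * a for L, a in branches) * mean_diversity(branches, q)


def root_on_leaf_branch(lengths, freqs, leaf):
    """Root an unrooted three-leaf tree (all leaves joined at one internal
    node) at the midpoint of the branch leading to `leaf`; return (L_i, a_i)."""
    others = [k for k in lengths if k != leaf]
    half = lengths[leaf] / 2.0
    return ([(half, freqs[leaf]),                          # root -> leaf
             (half, sum(freqs[k] for k in others))]        # root -> internal node
            + [(lengths[k], freqs[k]) for k in others])    # internal node -> leaves


# Hypothetical unrooted tree and frequencies.
lengths = {"A": 1.0, "B": 1.0, "C": 1.0}
freqs = {"A": 0.6, "B": 0.2, "C": 0.2}

for leaf in ("A", "B"):
    tree = root_on_leaf_branch(lengths, freqs, leaf)
    print(leaf, [round(phylogenetic_diversity(tree, q), 3) for q in (0, 1, 2)])
```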

Acknowledgment
This work is supported by NIH Grant 5R01GM87485-2.