The \(\ell^\infty\)-Cophenetic Metric for Phylogenetic Trees As an Interleaving Distance

Chapter in Research in Data Science

Part of the book series: Association for Women in Mathematics Series (AWMS, volume 17)

Abstract

There are many metrics available to compare phylogenetic trees, since this is a fundamental task in computational biology. In this paper, we focus on one such metric, the \(\ell^\infty\)-cophenetic metric introduced by Cardona et al. This metric works by representing a phylogenetic tree with n labeled leaves as a point in \(\mathbb {R}^{n(n+1)/2}\), known as the cophenetic vector, and then comparing the two resulting Euclidean points using the \(\ell^\infty\) distance. Meanwhile, the interleaving distance is a formal categorical construction generalized from the definition of Chazal et al., originally introduced to compare persistence modules arising in topological data analysis. We show that the \(\ell^\infty\)-cophenetic metric is an example of an interleaving distance. To do this, we define phylogenetic trees as a category of merge trees with some additional structure, namely, labelings on the leaves plus a requirement that morphisms respect these labels. We then use a flow on this category to define an interleaving distance. Finally, we show that, because of the additional structure given by the categories defined, the map sending a labeled merge tree to its cophenetic vector is, in fact, an isometric embedding, thus proving that the \(\ell^\infty\)-cophenetic metric is an interleaving distance.
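To make the cophenetic vector concrete, here is a minimal Python sketch. It assumes a rooted tree stored as a hypothetical child-to-parent dictionary, the same leaf labels in both trees, and, by default, uses each node's depth (number of edges from the root) as its height, which is the combinatorial convention of Cardona et al. For the labeled merge trees of the chapter one would instead supply node heights through the optional `height` callable. The function names `cophenetic_vector` and `linf_cophenetic` are illustrative only and do not come from the chapter.

```python
from itertools import combinations_with_replacement


def ancestors(node, parent):
    """Path from `node` up to the root (inclusive), following parent links."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(a, b, parent):
    """Lowest common ancestor of `a` and `b` in a rooted tree."""
    anc_a = set(ancestors(a, parent))
    for v in ancestors(b, parent):  # walk up from b until we hit a's ancestry
        if v in anc_a:
            return v
    raise ValueError("nodes are not in the same tree")


def depth(node, parent):
    """Number of edges from `node` to the root."""
    return len(ancestors(node, parent)) - 1


def cophenetic_vector(leaves, parent, height=None):
    """height(LCA(i, j)) for every pair i <= j of leaves: n(n+1)/2 entries.

    By default the height of a node is its depth; pass `height` (a callable
    on nodes, an assumption of this sketch) to use other values, e.g.
    function values in a merge tree.
    """
    h = height if height is not None else (lambda v: depth(v, parent))
    return [h(lca(i, j, parent))
            for i, j in combinations_with_replacement(leaves, 2)]


def linf_cophenetic(leaves, parent1, parent2, height1=None, height2=None):
    """l-infinity distance: largest coordinate-wise gap of the two vectors."""
    v1 = cophenetic_vector(leaves, parent1, height1)
    v2 = cophenetic_vector(leaves, parent2, height2)
    return max(abs(x - y) for x, y in zip(v1, v2))


if __name__ == "__main__":
    leaves = ["a", "b", "c"]
    t1 = {"a": "u", "b": "u", "u": "r", "c": "r"}  # ((a,b),c) rooted at r
    t2 = {"b": "w", "c": "w", "w": "r", "a": "r"}  # (a,(b,c)) rooted at r
    print(cophenetic_vector(leaves, t1))    # [2, 1, 0, 2, 0, 1]
    print(cophenetic_vector(leaves, t2))    # [1, 0, 0, 2, 1, 2]
    print(linf_cophenetic(leaves, t1, t2))  # 1
```

With three leaves each vector has 3·4/2 = 6 entries, one per unordered pair of leaves including each leaf paired with itself. Fixing the leaf order once and reusing it for both trees is what makes the two vectors comparable coordinate by coordinate.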

Notes

  1. This is also known as a \([0,\infty)\)-actegory, but “category with a flow” is both easier to say and fails to generate a flurry of questions about assumed typos.

  2. Note that traditionally, a Lawvere metric does not require the axiom of symmetry. However, as all of our constructions are symmetric, we regularly drop the word “symmetric” for simplicity.

  3. The distinction between a category and a meta-category is analogous to the distinction between sets and classes.

  4. This category is equivalently thought of as the slice category \(\mathbf {Top} \downarrow \mathbb {R}\).

References

  1. P.K. Agarwal, K. Fox, A. Nath, A. Sidiropoulos, Y. Wang, Computing the Gromov-Hausdorff distance for metric trees. ACM Trans. Algorithms 14(2), 1–20 (2018). https://doi.org/10.1145/3185466

  2. R. Alberich, G. Cardona, F. Rosselló, G. Valiente, An algebraic metric for phylogenetic trees. Appl. Math. Lett. 22(9), 1320–1324 (2009). https://doi.org/10.1016/j.aml.2009.03.003

  3. A. Babu, Zigzag coarsenings, mapper stability and gene network analyses, Ph.D. thesis, Stanford University, 2013

  4. U. Bauer, X. Ge, Y. Wang, Measuring distance between Reeb graphs, in Annual Symposium on Computational Geometry, SoCG ’14 (ACM Press, New York, 2014). https://doi.org/10.1145/2582112.2582169

  5. U. Bauer, E. Munch, Y. Wang, Strong equivalence of the interleaving and functional distortion metrics for Reeb graphs, in 31st International Symposium on Computational Geometry (SoCG 2015), Leibniz International Proceedings in Informatics (LIPIcs), vol. 34, pp. 461–475 (Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, 2015). https://doi.org/10.4230/LIPIcs.SOCG.2015.461. http://drops.dagstuhl.de/opus/volltexte/2015/5146

  6. U. Bauer, B. Di Fabio, C. Landi, An edit distance for Reeb graphs (2016). https://doi.org/10.6092/unibo/amsacta/4705

  7. K. Beketayev, D. Yeliussizov, D. Morozov, G.H. Weber, B. Hamann, Measuring the distance between merge trees, in Mathematics and Visualization (Springer, Cham, 2014), pp. 151–165. https://doi.org/10.1007/978-3-319-04099-8_10

  8. S. Biasotti, D. Giorgi, M. Spagnuolo, B. Falcidieno, Reeb graphs for shape analysis and applications. Theor. Comput. Sci. Comput. Algebraic Geom. Appl. 392(1–3), 5–22 (2008). https://doi.org/10.1016/j.tcs.2007.10.018. http://www.sciencedirect.com/science/article/pii/S0304397507007396

  9. L.J. Billera, S.P. Holmes, K. Vogtmann, Geometry of the space of phylogenetic trees. Adv. Appl. Math. 27(4), 733–767 (2001). https://doi.org/10.1006/aama.2001.0759

  10. H.B. Bjerkevik, M.B. Botnan, Computational complexity of the interleaving distance, in 34th International Symposium on Computational Geometry (SoCG 2018) (Schloss Dagstuhl - Leibniz-Zentrum für Informatik, Wadern, 2018)

  11. D. Bryant, J. Tsang, P.E. Kearney, M. Li, Computing the quartet distance between evolutionary trees, in Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’00, pp. 285–286 (Society for Industrial and Applied Mathematics, Philadelphia, 2000). http://dl.acm.org/citation.cfm?id=338219.338264

  12. P. Bubenik, J.A. Scott, Categorification of persistent homology. Discret. Comput. Geom. 51(3), 600–627 (2014). https://doi.org/10.1007/s00454-014-9573-x

  13. P. Bubenik, V. de Silva, J. Scott, Metrics for generalized persistence modules. Found. Comput. Math. 15(6), 1501–1531 (2014). https://doi.org/10.1007/s10208-014-9229-5

  14. G. Cardona, A. Mir, F. Rosselló, L. Rotger, D. Sánchez, Cophenetic metrics for phylogenetic trees, after Sokal and Rohlf. BMC Bioinforma. 14(1), 3 (2013). https://doi.org/10.1186/1471-2105-14-3

  15. M. Carrière, S. Oudot, Structure and stability of the one-dimensional mapper. Found. Comput. Math. (2017). https://doi.org/10.1007/s10208-017-9370-z

  16. F. Chazal, D. Cohen-Steiner, M. Glisse, L.J. Guibas, S.Y. Oudot, Proximity of persistence modules and their diagrams, in Proceedings of the 25th Annual Symposium on Computational Geometry, SCG ’09, pp. 237–246 (ACM, New York, 2009). https://doi.org/10.1145/1542362.1542407

  17. F. Chazal, V. de Silva, M. Glisse, S. Oudot, The Structure and Stability of Persistence Modules (Springer, New York, 2016). https://doi.org/10.1007/978-3-319-42545-0

  18. J. Curry, Sheaves, cosheaves and applications, Ph.D. thesis, University of Pennsylvania, 2014

  19. V. de Silva, E. Munch, A. Patel, Categorified Reeb graphs. Discret. Comput. Geom. 1–53 (2016). https://doi.org/10.1007/s00454-016-9763-9

  20. V. de Silva, E. Munch, A. Stefanou, Theory of interleavings on categories with a flow. Theory Appl. Categories 33(21), 583–607 (2018). http://www.tac.mta.ca/tac/volumes/33/21/33-21.pdf

  21. B. Di Fabio, C. Landi, The edit distance for Reeb graphs of surfaces. Discrete Comput. Geom. 55(2), 423–461 (2016). https://doi.org/10.1007/s00454-016-9758-6

  22. P.W. Diaconis, S.P. Holmes, Matchings and phylogenetic trees. Proc. Natl. Acad. Sci. 95(25), 14600–14602 (1998). http://www.pnas.org/content/95/25/14600.abstract

  23. J. Eldridge, M. Belkin, Y. Wang, Beyond Hartigan consistency: merge distortion metric for hierarchical clustering, in Proceedings of The 28th Conference on Learning Theory, ed. by P. Grünwald, E. Hazan, S. Kale. Proceedings of Machine Learning Research, vol. 40, pp. 588–606 (PMLR, Paris, 2015). http://proceedings.mlr.press/v40/Eldridge15.html

  24. H. Fernau, M. Kaufmann, M. Poths, Comparing trees via crossing minimization. J. Comput. Syst. Sci. 76(7), 593–608 (2010). https://doi.org/10.1016/j.jcss.2009.10.014

  25. F.W. Lawvere, Metric spaces, generalized logic, and closed categories. Rendiconti del seminario matematico e fisico di Milano 43(1), 135–166 (1973). Republished in: Reprints in Theory and Applications of Categories, No. 1 (2002), pp. 1–37

  26. B. Lin, A. Monod, R. Yoshida, Tropical foundations for probability & statistics on phylogenetic tree space (2018). arXiv:1805.12400v2

  27. T. Mailund, C.N.S. Pedersen, QDist–quartet distance between evolutionary trees. Bioinformatics 20(10), 1636–1637 (2004). https://doi.org/10.1093/bioinformatics/bth097

  28. D. Morozov, K. Beketayev, G. Weber, Interleaving distance between merge trees, in Proceedings of TopoInVis (2013)

  29. V. Moulton, T. Wu, A parsimony-based metric for phylogenetic trees. Adv. Appl. Math. 66, 22–45 (2015). https://doi.org/10.1016/j.aam.2015.02.002

  30. E. Munch, B. Wang, Convergence between categorical representations of Reeb space and mapper, in 32nd International Symposium on Computational Geometry (SoCG 2016), ed. by S. Fekete, A. Lubiw, Leibniz International Proceedings in Informatics (LIPIcs), vol. 51, pp. 53:1–53:16 (Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, 2016). https://doi.org/10.4230/LIPIcs.SoCG.2016.53. http://drops.dagstuhl.de/opus/volltexte/2016/5945

  31. M. Owen, Computing geodesic distances in tree space. SIAM J. Discret. Math. 25(4), 1506–1529 (2011). https://doi.org/10.1137/090751396

  32. G. Reeb, Sur les points singuliers d’une forme de Pfaff complètement intégrable ou d’une fonction numérique. C.R. Acad. Sci. 222, 847–849 (1946)

  33. E. Riehl, Category Theory in Context (Courier Dover Publications, New York, 2017)

  34. D. Robinson, L. Foulds, Comparison of weighted labelled trees, in Combinatorial Mathematics VI (Springer, Berlin, 1979), pp. 119–126. https://doi.org/10.1007/BFb0102690

  35. D. Robinson, L. Foulds, Comparison of phylogenetic trees. Math. Biosci. 53(1–2), 131–147 (1981). https://doi.org/10.1016/0025-5564(81)90043-2

  36. G. Singh, F. Mémoli, G.E. Carlsson, Topological methods for the analysis of high dimensional data sets and 3D object recognition, in SPBG, pp. 91–100 (2007)

  37. A. Stefanou, Dynamics on categories and applications, Ph.D. thesis, University at Albany, State University of New York, 2018

  38. G. Valiente, An efficient bottom-up distance between trees, in SPIRE (IEEE, Piscataway, 2001), p. 0212

Acknowledgements

The authors gratefully thank two anonymous reviewers whose feedback substantially increased the quality of the paper. The work of EM was supported in part by NSF Grant Nos. DMS-1800446 and CMMI-1800466. AS was partially supported both by the National Science Foundation through grant NSF-CCF-1740761 TRIPODS TGDA@OSU and by the Mathematical Biosciences Institute at the Ohio State University.

Author information

Corresponding author

Correspondence to Elizabeth Munch.

Copyright information

© 2019 The Author(s) and the Association for Women in Mathematics

About this chapter

Cite this chapter

Munch, E., Stefanou, A. (2019). The \(\ell^\infty\)-Cophenetic Metric for Phylogenetic Trees As an Interleaving Distance. In: Gasparovic, E., Domeniconi, C. (eds) Research in Data Science. Association for Women in Mathematics Series, vol 17. Springer, Cham. https://doi.org/10.1007/978-3-030-11566-1_5
