PhyloJS: Bridging phylogenetics and web development with a JavaScript utility library

Abstract There is an increasing number of libraries devoted to parsing, manipulating and visualising phylogenetic trees in JavaScript. Many of these libraries bundle tree manipulation with visualisation, but have limited ability to manipulate trees and lack detailed documentation. As the number of web-based phylogenetic tools and the size of phylogenetics datasets increase, there is a need for a library that parses, writes and manipulates phylogenetic trees and that is interoperable with other phylogenetic and data visualisation libraries. Here we introduce PhyloJS, a lightweight, zero-dependency TypeScript and JavaScript library for reading, writing and manipulating phylogenetic trees. PhyloJS allows for modification of and data extraction from trees to integrate with other phylogenetics and data visualisation libraries. It can swiftly handle large trees, up to at least 10^6 tips in size, making it ideal for developing the next generation of more complex web-based phylogenetics applications handling ever larger datasets. The PhyloJS source code is available on GitHub (https://github.com/clockor2/phylojs) and can be installed via npm with the command npm install phylojs. Extensive documentation is available at https://clockor2.github.io/phylojs/.

as a reusable package. In addition, other libraries that aim for broader utility integrate tree representation with the D3 visualisation library, inheriting dependencies and constraining their use in other applications (Bostock et al., 2011) (Table 1).
Overall, as of early September 2023 there were 57 packages from the last 10 years matching the search term 'phylogenetic' in the npm registry (the main repository for JavaScript packages, akin to CRAN and PyPI for R and Python packages) (Table 1). Of these, the majority became available in the last 5 years. All of these are either primarily devoted to visualisation or offer less tree manipulation functionality than the state-of-the-art libraries of other ecosystems, such as ape or treeio in R; DendroPy or ETE3 in Python; and PhyloNetworks in Julia (Huerta-Cepas et al., 2016; Paradis & Schliep, 2019; Solís-Lemus et al., 2017; Yu et al., 2017).
Although visualisation is essential because phylogenetic trees are inherently visual models, the large number of visualisation libraries has created a niche for a library solely devoted to manipulating phylogenetic trees. Moreover, such a library further enables phylogenetic computation in client-side applications, where all computation is done in the browser without the need for a server. This has notable benefits for data security and accessibility, especially for sensitive data in genomic epidemiology, since the data never leave the user's computer. In conjunction, visualisation-oriented and manipulation-oriented libraries will allow developers to produce applications with both high-quality visualisation and greater functionality.
Here we present PhyloJS, a well-documented, standardised and standalone JavaScript library for parsing, writing and manipulating phylogenetic trees, intended for developing novel web-based phylogenetics applications with an emphasis on client-side computation. PhyloJS focuses on tree manipulation rather than visualisation and can efficiently interface with other phylogenetic and data visualisation libraries, such as phylocanvas, phylotree or plotly (Abudahab et al., 2021; Plotly-Technologies-Inc, 2015; Shank et al., 2018).

| IMPLEMENTATION AND USAGE
A detailed application programming interface (API) for PhyloJS documenting all functions, methods and classes can be found at https://clockor2.github.io/phylojs/.
The representation, parsing and writing of trees in PhyloJS is based on the algorithms used in IcyTree, a well-established client-side visualisation tool for phylogenetic trees (Vaughan, 2017).
PhyloJS is written in TypeScript, which supports type-safe development and makes it well suited to debugging larger applications. Overall, it offers a package that will be familiar to those who have used popular phylogenetic utility libraries in other languages, such as ape (Paradis & Schliep, 2019) in R and DendroPy (Sukumaran & Holder, 2010) in Python (Figure 1). PhyloJS includes two classes: Node and Tree (Figure 1). The Node class includes properties for the branch length (branchLength) leading into the node, as well as descending and ancestral nodes
TABLE 1 A summary of phylogenetics libraries available on the npm registry.
PhyloJS can also parse lists of trees separated by the newline character in formats additional to Nexus (e.g. readTreesFromNewick(), readTreesFromPhyloXML()) to arrays of Tree objects. This is useful for efficiently manipulating multiple trees at once using array methods in JavaScript (e.g. TreesArray.map(tree => tree.ladderise())). Finally, PhyloJS includes a general function, read(), that accepts strings and a schema argument to select a particular file format. It returns a Tree array (Tree[]) in all cases and is useful for applications with multiple input file formats.

| Writing
PhyloJS writes individual trees to Newick and Nexus formats (writeNewick(), writeNexus()). Both functions also accept a callback for including annotations. The default behaviour is not to annotate, and users can supply their own annotator or use the inbuilt beastAnnotation() or nhxAnnotation() functions to write annotations in the common formats used in BEAST or NHX (Bouckaert et al., 2019; Suchard et al., 2018). If users wish to write an array of trees to one string, then writing functions can be applied via a JavaScript array method (e.g. TreesArray.map(tree => writeNewick(tree)).join('\n')). The branch length property (.branchLength) for each node can be undefined, and all tree-writing functions omit undefined branch lengths when writing. Last, PhyloJS does not automatically resolve polytomies and thus can write and parse multifurcating trees.
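The writing behaviour described above (an optional annotation callback, and undefined branch lengths omitted from the output) can be illustrated with a minimal, self-contained sketch. It does not use PhyloJS itself: the WNode shape and the functions toNewick(), writeNewickSketch() and beastStyle() are hypothetical names introduced only for illustration, and the placement of the annotation between label and branch length is an assumption based on common BEAST-style output.

```typescript
// Hypothetical minimal node shape for illustration; not the PhyloJS Node class.
interface WNode {
  label?: string;
  branchLength?: number;
  annotation?: { [key: string]: string | number };
  children: WNode[];
}

// An annotator turns a node's annotations into a string written after the label.
type Annotator = (annotation: { [key: string]: string | number }) => string;

// Illustrative BEAST-style annotator producing [&key=value,key=value].
function beastStyle(annotation: { [key: string]: string | number }): string {
  const pairs = Object.keys(annotation).map(k => `${k}=${annotation[k]}`);
  return pairs.length > 0 ? `[&${pairs.join(",")}]` : "";
}

// Recursively serialise a node. By default no annotations are written.
function toNewick(node: WNode, annotator: Annotator = () => ""): string {
  const inner =
    node.children.length > 0
      ? `(${node.children.map(c => toNewick(c, annotator)).join(",")})`
      : "";
  const label = node.label ?? "";
  const note = annotator(node.annotation ?? {});
  // Undefined branch lengths are omitted entirely rather than written as ":undefined".
  const len = node.branchLength === undefined ? "" : `:${node.branchLength}`;
  return `${inner}${label}${note}${len}`;
}

function writeNewickSketch(treeRoot: WNode, annotator?: Annotator): string {
  return toNewick(treeRoot, annotator) + ";";
}

// A three-tip polytomy: multifurcations are written as they are.
const exampleTree: WNode = {
  children: [
    { label: "A", branchLength: 1, annotation: { rate: 0.5 }, children: [] },
    { label: "B", children: [] },
    { label: "C", branchLength: 2, children: [] },
  ],
};
console.log(writeNewickSketch(exampleTree));             // (A:1,B,C:2);
console.log(writeNewickSketch(exampleTree, beastStyle)); // (A[&rate=0.5]:1,B,C:2);
```

Passing the annotator as a callback keeps the serialiser independent of any one annotation format, which mirrors the design described in the text.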

| Manipulating
The Node class also includes methods to add and remove children (.addChild() and .removeChild()), get ancestral nodes (.getAncestors()) and apply a function via pre-order or post-order traversal of nested nodes (.applyPreOrder() and .applyPostOrder()). Branch length modifications can be applied directly to the .branchLength property of each node.
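To make the difference between the two traversal orders concrete, the following self-contained sketch implements both on a toy class. TravNode is a hypothetical stand-in, not the PhyloJS Node class; its method names echo those in the text, but their signatures here are assumptions.

```typescript
// Hypothetical toy node; method names echo the text, signatures are assumed.
class TravNode {
  children: TravNode[] = [];
  parentNode: TravNode | undefined = undefined;

  constructor(public id: string, public branchLength?: number) {}

  addChild(child: TravNode): void {
    child.parentNode = this;
    this.children.push(child);
  }

  // Ancestors from the immediate parent up to the root.
  getAncestors(): TravNode[] {
    const ancestors: TravNode[] = [];
    let current = this.parentNode;
    while (current !== undefined) {
      ancestors.push(current);
      current = current.parentNode;
    }
    return ancestors;
  }

  // Pre-order: visit this node first, then each subtree in turn.
  applyPreOrder(f: (n: TravNode) => void): void {
    f(this);
    for (const child of this.children) child.applyPreOrder(f);
  }

  // Post-order: visit every subtree first, then this node.
  applyPostOrder(f: (n: TravNode) => void): void {
    for (const child of this.children) child.applyPostOrder(f);
    f(this);
  }
}

// Build ((A:1,B:1)X:1,C:1)R.
const nodeR = new TravNode("R");
const nodeX = new TravNode("X", 1);
const nodeA = new TravNode("A", 1);
const nodeB = new TravNode("B", 1);
const nodeC = new TravNode("C", 1);
nodeX.addChild(nodeA);
nodeX.addChild(nodeB);
nodeR.addChild(nodeX);
nodeR.addChild(nodeC);

const preOrderIds: string[] = [];
nodeR.applyPreOrder(n => { preOrderIds.push(n.id); });
const postOrderIds: string[] = [];
nodeR.applyPostOrder(n => { postOrderIds.push(n.id); });
console.log(preOrderIds.join(" "));  // R X A B C
console.log(postOrderIds.join(" ")); // A B X C R

// Branch lengths are plain properties, so a traversal can rescale them in place.
nodeR.applyPreOrder(n => {
  if (n.branchLength !== undefined) n.branchLength *= 2;
});
```

Post-order is the natural choice when a node's value depends on its descendants (for example, counting tips), whereas pre-order suits values inherited from ancestors (for example, accumulating distance from the root).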

FIGURE 1 Conceptual figure of PhyloJS' utility, with a non-exhaustive list of functions and methods for parsing and manipulating trees and nodes. PhyloJS parses trees, such as from user input and other libraries, and provides a utility library to operate on the tree and return it for visualisation, user interaction and processing via other libraries.
The Tree class also includes several convenience methods that help to extract data from trees. For example, .getRTTDist() returns root-to-tip distances for each tip and .getBranchLengths() returns all branch lengths. The root-to-tip regression tool Clockor2 (clockor2.github.io) uses these methods internally (Featherstone et al., 2024).
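As an illustration of what a root-to-tip distance is, the sketch below sums branch lengths along the path from the root to each tip. It is a self-contained approximation under stated assumptions rather than the PhyloJS implementation: RNode and rootToTipDistances() are hypothetical names, undefined branch lengths are treated as zero, and the branch leading into the root is ignored.

```typescript
// Hypothetical node shape for illustration; not the PhyloJS Node class.
interface RNode {
  label?: string;
  branchLength?: number;
  children: RNode[];
}

interface TipDistance {
  tip: string;
  distance: number;
}

// Sum branch lengths from the root to each tip using an explicit stack,
// which avoids recursion-depth limits on very deep trees.
// Assumptions: undefined branch lengths count as 0; the root's own branch is ignored.
function rootToTipDistances(treeRoot: RNode): TipDistance[] {
  const result: TipDistance[] = [];
  const stack: { node: RNode; depth: number }[] = [{ node: treeRoot, depth: 0 }];
  while (stack.length > 0) {
    const item = stack.pop() as { node: RNode; depth: number };
    if (item.node.children.length === 0) {
      result.push({ tip: item.node.label ?? "", distance: item.depth });
      continue;
    }
    // Push children in reverse so that the left-most child is processed first.
    for (let i = item.node.children.length - 1; i >= 0; i--) {
      const child = item.node.children[i];
      stack.push({ node: child, depth: item.depth + (child.branchLength ?? 0) });
    }
  }
  return result;
}

// ((A:1,B:2):0.5,C:3);
const rttExample: RNode = {
  children: [
    {
      branchLength: 0.5,
      children: [
        { label: "A", branchLength: 1, children: [] },
        { label: "B", branchLength: 2, children: [] },
      ],
    },
    { label: "C", branchLength: 3, children: [] },
  ],
};
console.log(rootToTipDistances(rttExample));
// A: 1.5, B: 2.5, C: 3
```

In a root-to-tip regression, these distances would be regressed against tip sampling dates.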

| Benchmarking
We benchmarked PhyloJS against the other libraries in Table 1 by parsing simulated Newick trees with 10^1 to 10^6 tips. Parsing Newick trees is a task common to all phylogenetics libraries, and we note that methods on the Tree class remained fast for the largest trees. PhyloJS was among the fastest, with only a fractional delay (<1 ms) relative to newick-reader for trees with fewer than 10^4 tips. This was probably due to additional logic in PhyloJS' Newick parser, which accounts for annotations and hybrid nodes and is lacking in newick-reader (Figure 2). Taxonium-component, though not a library for manipulating phylogenetic trees, is included as a comparison because it represents the state-of-the-art parser optimised for the largest phylogenetic trees (Sanderson, 2022). For trees with more than 10^4 tips, PhyloJS is second only to Taxonium in speed.
Both PhyloJS and Taxonium incorporate a parser that uses a stack to read Newick strings, based on the parser in jstreeview (https://github.com/lh3/jstreeview/tree/main). We found that this approach conferred a notable speedup over recursive approaches, as well as the capacity to parse much larger trees.
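The general idea of stack-based Newick parsing can be sketched as follows. This is a deliberately simplified illustration, not the parser used in PhyloJS, Taxonium or jstreeview: PNode and parseNewickWithStack() are hypothetical names, and annotations, quoted labels, comments and hybrid nodes are not handled.

```typescript
// Hypothetical node shape for illustration; not the PhyloJS Node class.
interface PNode {
  label: string;
  branchLength?: number;
  children: PNode[];
}

// Parse a plain Newick string in a single pass, keeping the currently open
// internal nodes on an explicit stack instead of recursing per clade.
function parseNewickWithStack(newick: string): PNode {
  const makeNode = (): PNode => ({ label: "", children: [] });
  const structural = "(),:;";
  const ancestors: PNode[] = []; // stack of internal nodes whose ")" is still pending
  let current: PNode = makeNode(); // starts as the root
  let i = 0;

  while (i < newick.length) {
    const ch = newick.charAt(i);
    if (ch === "(") {
      // Open a clade: the current node becomes a parent and we descend to its first child.
      const child = makeNode();
      current.children.push(child);
      ancestors.push(current);
      current = child;
      i++;
    } else if (ch === ",") {
      // Next sibling: attach a new node to the parent on top of the stack.
      const parentNode = ancestors[ancestors.length - 1];
      if (parentNode === undefined) throw new Error("Unexpected ',' at position " + i);
      const sibling = makeNode();
      parentNode.children.push(sibling);
      current = sibling;
      i++;
    } else if (ch === ")") {
      // Close a clade: return to the parent, whose label or length may follow.
      const closed = ancestors.pop();
      if (closed === undefined) throw new Error("Unexpected ')' at position " + i);
      current = closed;
      i++;
    } else if (ch === ":") {
      // Branch length: read up to the next structural character.
      let j = i + 1;
      while (j < newick.length && structural.indexOf(newick.charAt(j)) === -1) j++;
      const value = Number(newick.slice(i + 1, j));
      if (isNaN(value)) throw new Error("Invalid branch length at position " + i);
      current.branchLength = value;
      i = j;
    } else if (ch === ";") {
      break; // end of tree
    } else {
      // Label: read up to the next structural character.
      let j = i;
      while (j < newick.length && structural.indexOf(newick.charAt(j)) === -1) j++;
      current.label = newick.slice(i, j).trim();
      i = j;
    }
  }
  if (ancestors.length !== 0) throw new Error("Unbalanced parentheses");
  return current;
}

const parsedExample = parseNewickWithStack("((A:1,B:2)X:0.5,C:3)R;");
console.log(parsedExample.label);                               // R
console.log(parsedExample.children.map(c => c.label).join()); // X,C
```

Because nesting depth is tracked in an array rather than on the call stack, a deeply nested (for example, caterpillar-shaped) tree does not risk exceeding the JavaScript engine's recursion limit during parsing.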

| Clockor2: A larger scale example
PhyloJS was initially developed to support phylogenetics web applications that incorporate analysis and visualisation without the need for server-side computation. Clockor2 (https://clockor2.github.

| Interfacing with visualisation libraries
For a minimal example of interfacing PhyloJS with a visualisation library, we direct readers to the visualisation example https:// clock

| Working with arrays of trees
In the following example, we parse two trees written in Newick format to an array of Tree objects using readTreesFromNewick().
For both trees, all branch lengths are equal to 1. We arbitrarily reroot each tree at the fourth node, ladderise them and randomly rescale each branch length. Finally, we write the resulting trees to Newick format.
This example demonstrates how users can manipulate multiple trees concurrently.
// Using two small trees with 3 tips and all branch lengths set to 1.

| Internal to external branch length ratio annotation
Here, we demonstrate how to calculate the internal to external branch length ratio (IE ratio hereafter) for clades descending from each internal node of a tree. We then add the statistic as a node annotation. This sort of program could, for example, be implemented as part of an application to visualise the parts of a tree that most strongly drive the values of summary statistics.
Tree statistics including tree length, the Gamma statistic (Pybus & Harvey, 2000), Sackin Index and Colless Imbalance Index (Fischer et al., 2023).
b The root-to-tip regression example is a significant simplification of Clockor2, which makes extensive use of PhyloJS.
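The IE ratio calculation described in this section can be sketched without PhyloJS as follows. The IENode shape, the function annotateIERatio() and the annotation key ieRatio are hypothetical. The sketch also assumes one particular definition: for each internal node, the ratio is the summed length of internal branches divided by the summed length of tip branches strictly below that node, with undefined branch lengths counted as zero.

```typescript
// Hypothetical node shape for illustration; not the PhyloJS Node class.
interface IENode {
  label?: string;
  branchLength?: number;
  annotation: { [key: string]: number };
  children: IENode[];
}

// Post-order pass: each call returns the internal and external branch length
// sums for the branches strictly below `node`, and annotates internal nodes.
// Assumption: the branch leading into `node` is not part of its own clade's sums.
function annotateIERatio(node: IENode): { internal: number; external: number } {
  let internal = 0;
  let external = 0;
  for (const child of node.children) {
    const len = child.branchLength ?? 0;
    if (child.children.length === 0) {
      external += len; // branch leading to a tip
    } else {
      const below = annotateIERatio(child);
      internal += len + below.internal; // branch leading to an internal node
      external += below.external;
    }
  }
  if (node.children.length > 0) {
    // If all tip branches have zero length, this is NaN or Infinity.
    node.annotation["ieRatio"] = internal / external;
  }
  return { internal, external };
}

// ((A:1,B:1)X:2,C:4)R;
const ieCladeX: IENode = {
  label: "X",
  branchLength: 2,
  annotation: {},
  children: [
    { label: "A", branchLength: 1, annotation: {}, children: [] },
    { label: "B", branchLength: 1, annotation: {}, children: [] },
  ],
};
const ieRoot: IENode = {
  label: "R",
  annotation: {},
  children: [ieCladeX, { label: "C", branchLength: 4, annotation: {}, children: [] }],
};
annotateIERatio(ieRoot);
console.log(ieCladeX.annotation["ieRatio"]); // 0 (no internal branches below X)
console.log(ieRoot.annotation["ieRatio"]);   // 2 / 6, i.e. one third
```

The recursion is kept for brevity. For very large trees, the same sums could be accumulated with an explicit stack.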
From the 57 libraries available as of early September 2023, we selected those that had reached version 1.0.0 and could parse a common tree format: Newick, Nexus, phyloXML or NeXML. The networks column refers to whether libraries are capable of parsing phylogenetic networks, which contain hybrid nodes. This table compares libraries in their ability to read, write and manipulate phylogenetic trees and networks. Many of the included libraries, such as phylocanvas, phylotree and phyD3, excel in visualisation beyond this. The size of each package is the minified bundle size. a Includes D3 as a dependency.
(children and parent, respectively). Child and parent properties facilitate the nesting of nodes, wherein a node can have any number of descending child nodes and up to one parent (zero for the root). Node objects also store annotations via the annotation property, which is itself an object storing key-value pairs of annotations. Hybrid nodes in phylogenetic networks are also denoted with a Boolean property (isHybrid = true).
The Tree class is a wrapper of the Node class. It contains the root node at the highest level, and each subsequent node nested within it. It includes getters for internal nodes and tips (.nodeList and .leafList, respectively), which return arrays of Node objects (a leaf is a node object without any child nodes). The Tree class includes many other methods for manipulating trees, referenced in the following sections and examples.
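The nested Node and Tree structure described above can be pictured with the following toy classes. MiniNode and MiniTree are hypothetical illustrations, not the PhyloJS classes. In particular, the sketch assumes that nodeList collects every node in pre-order and that the parent reference is stored in a property called parentNode.

```typescript
// Hypothetical toy classes; property names follow the text where possible.
class MiniNode {
  children: MiniNode[] = [];
  parentNode: MiniNode | undefined = undefined; // undefined for the root
  annotation: { [key: string]: string | number } = {};
  isHybrid = false;

  constructor(public label?: string, public branchLength?: number) {}

  addChild(child: MiniNode): void {
    child.parentNode = this;
    this.children.push(child);
  }

  // A leaf is a node without any child nodes.
  isLeaf(): boolean {
    return this.children.length === 0;
  }
}

// The tree is a thin wrapper around the root node.
class MiniTree {
  constructor(public root: MiniNode) {}

  // Assumption of this sketch: every node, collected in pre-order with an explicit stack.
  get nodeList(): MiniNode[] {
    const nodes: MiniNode[] = [];
    const stack: MiniNode[] = [this.root];
    while (stack.length > 0) {
      const node = stack.pop() as MiniNode;
      nodes.push(node);
      for (let i = node.children.length - 1; i >= 0; i--) {
        stack.push(node.children[i]);
      }
    }
    return nodes;
  }

  get leafList(): MiniNode[] {
    return this.nodeList.filter(n => n.isLeaf());
  }
}

// Build ((A:1,B:1)X:1,C:2)R with one annotated tip.
const miniRoot = new MiniNode("R");
const miniX = new MiniNode("X", 1);
const miniA = new MiniNode("A", 1);
miniA.annotation["host"] = "human";
miniX.addChild(miniA);
miniX.addChild(new MiniNode("B", 1));
miniRoot.addChild(miniX);
miniRoot.addChild(new MiniNode("C", 2));

const miniTree = new MiniTree(miniRoot);
console.log(miniTree.nodeList.map(n => n.label).join(" ")); // R X A B C
console.log(miniTree.leafList.map(n => n.label).join(" ")); // A B C
```

Because the getters return ordinary arrays, they compose directly with array methods such as map() and filter().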
The Tree and Node classes include a number of methods that are unique among JavaScript libraries and/or useful for manipulating trees. For topological manipulation, the Tree class includes methods for rerooting (.reroot()), ladderisation (.ladderise()) and extraction of subtrees (.getClade()). It also includes methods for accessing common ancestors (.getMRCA()).
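Two of the operations named above, ladderisation and finding a most recent common ancestor, can be illustrated with a short self-contained sketch. LNode, ladderiseSketch() and mrcaSketch() are hypothetical names, not PhyloJS functions, and the choice to order children by ascending tip count is an assumption, since ladderisation can be done in either direction.

```typescript
// Hypothetical toy node; `up` is the parent reference (undefined for the root).
class LNode {
  children: LNode[] = [];
  up: LNode | undefined = undefined;

  constructor(public label: string) {}

  addChild(child: LNode): LNode {
    child.up = this;
    this.children.push(child);
    return this;
  }
}

// Ladderise in place: at every internal node, order children by the number of
// tips they subtend (ascending, by assumption). Returns the tip count of `node`.
function ladderiseSketch(node: LNode): number {
  if (node.children.length === 0) return 1;
  const counted = node.children.map(c => ({ child: c, tips: ladderiseSketch(c) }));
  counted.sort((a, b) => a.tips - b.tips);
  node.children = counted.map(x => x.child);
  return counted.reduce((sum, x) => sum + x.tips, 0);
}

// Most recent common ancestor: record the path from `a` to the root, then
// walk up from `b` until a node on that path is found.
function mrcaSketch(a: LNode, b: LNode): LNode | undefined {
  const pathFromA: LNode[] = [];
  for (let n: LNode | undefined = a; n !== undefined; n = n.up) pathFromA.push(n);
  for (let n: LNode | undefined = b; n !== undefined; n = n.up) {
    if (pathFromA.indexOf(n) !== -1) return n;
  }
  return undefined; // the nodes belong to different trees
}

// Build ((A,B,C)X,D)R, then ladderise so that D precedes the larger clade X.
const ladA = new LNode("A");
const ladB = new LNode("B");
const ladC = new LNode("C");
const ladD = new LNode("D");
const ladX = new LNode("X").addChild(ladA).addChild(ladB).addChild(ladC);
const ladR = new LNode("R").addChild(ladX).addChild(ladD);

const tipCount = ladderiseSketch(ladR);
console.log(tipCount);                                  // 4
console.log(ladR.children.map(c => c.label).join(" ")); // D X
console.log(mrcaSketch(ladA, ladB)?.label);             // X
console.log(mrcaSketch(ladA, ladD)?.label);             // R
```

The linear search in mrcaSketch() is adequate for small trees. For repeated queries on large trees, a set of visited nodes or a precomputed depth index would be the more usual choice.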
io/) is a key example, performing root-to-tip regression with strict local molecular clocks in the browser using PhyloJS. Here, PhyloJS provides all of the functionality to parse, manipulate and analyse trees, including root-to-tip regression. All operations make use of PhyloJS, which parses and returns trees in Newick format for visualisation using phylocanvas and Plotly, or for download (Abudahab et al., 2021; Plotly-Technologies-Inc, 2015).
A summary of each example and its potential or existing application in future or existing software.