Phylogenetic Classification of Seed Plants of Taiwan

Biological classification, the hierarchical arrangement of scientific names of organisms, constitutes the core infrastructure of biological databases. For efficient management of biological databases, adopting a stable and universal biological classification system is crucial. Currently in the Taiwan Biodiversity Information Facility (TaiBIF; http://taibif.tw/), the national portal website that integrates Taiwan’s biodiversity information databases, angiosperms are arranged according to Cronquist’s System of Classification, which is not compatible with the current trend of the Angiosperm Phylogeny Group (APG) classification. To consolidate the function and management of the database, TaiBIF is moving to adopt the APG IV classification and the gymnosperm classification of Christenhusz et al. (Phytotaxa 19:55–70, 2011), which we summarize as the Phylogenetic Classification of Seed Plants of Taiwan. The Phylogenetic Classification of Seed Plants of Taiwan places gymnosperms in five families [vs. eight families in the Flora of Taiwan (FOT)] and angiosperms in 210 families (vs. 193 families in FOT). Three FOT gymnosperm families are synonymized in the current treatment. Of the 210 APG IV families, 114 have familial circumscriptions identical to those of FOT, 50 are recircumscriptions of FOT families, and 46 are newly added. Of the 29 FOT families not included in the current classification, two are excluded and 27 are synonymized. The adoption of the Phylogenetic Classification of Seed Plants of Taiwan in TaiBIF will provide better service and more efficient management of the nation’s biodiversity information databases.


Background
Biological classification, the hierarchical arrangement of scientific names of organisms, provides keywords and links to catalogue and organize biological information (Patterson et al. 2014). Biological classification constitutes the core infrastructure of biological databases (Patterson et al. 2010, 2014). Adopting a stable and universal biological classification system is not only crucial for users but also fundamental to the efficient management of the databases. TaiBIF (Taiwan Biodiversity Information Facility; http://taibif.tw/) is the national portal website that integrates Taiwan's biodiversity information (Shao et al. Engler's Syllabus der Pflanzenfamilien that was adopted in the Flora of Taiwan (FOT), 2nd edition (Huang 1994). Although Cronquist's System was highly influential and had been followed by several major floras such as Flora of North America (Reveal 1993) and Flora of Australia (Kanis et al. 1999), much of the content of Cronquist's System is not compatible with the current trend of the APG classification.
The Angiosperm Phylogeny Group (APG) classification of the orders and families of flowering plants, now in its fourth edition (APG IV), is a collaborative effort of the plant molecular systematics community worldwide (The Angiosperm Phylogeny Group 1998, 2003, 2009, 2016), providing the greatest stability and predictability regarding biodiversity information of flowering plants (Mayr 1981; Wearn et al. 2013). Although the APG classification has not been adopted officially in Taiwan, families circumscribed by molecular phylogenetic studies and summarized by APG have been increasingly accepted by both academic (Hsu et al. 2011, 2016a; Wu et al. 2015) and citizen scientists (e.g., Nature Campus http://nc.biodiv.tw/bbs/index.php).
As TaiCOL is an official provider of the country's biodiversity information, the classification systems it follows have a profound influence. In an effort to consolidate the function and management of TaiBIF, which should result in stable and better services of the websites, it is inevitable for TaiCOL to adopt classification systems that are based on the results of robust molecular phylogenetic analyses. This article outlines a phylogenetic classification of the families of the seed plants of Taiwan, summarized from the gymnosperm classification of Christenhusz et al. (2011), APG IV, and subsequent studies. To facilitate the transition toward APG IV, we also provide a spreadsheet of the classification schema for all seed plant genera that will be adopted by TaiCOL (Additional file 1: Appendix S1). This spreadsheet will be updated regularly and can be downloaded through TaiCOL. A brief note is provided for families whose circumscription has changed between the treatment of FOT and the APG IV classification.

Methods
The database of seed plants of Taiwan was compiled from "a checklist of the vascular plants of Taiwan" of the Flora of Taiwan (Boufford et al. 2003), "Illustrated Guide to Aquatic Plants of Taiwan" (Yang et al. 2001), Wu et al. (2010), which summarized the naturalized and invasive flora, subsequently published native (e.g., Hsu et al. 2011; Wu et al. 2015) and naturalized (e.g., Liang et al. 2011; Wang et al. 2016) species, and the flora of Tongsha (Pratas) Island (Lin et al. 2005). The checklist was then imported into a relational PostgreSQL database as the basis for the migration process. The migration process applied a 'data cleaning framework' to improve the quality of our data set through diagnosing, detecting, and correcting procedures. The data cleaning procedure included three major stages: (1) defining error types, (2) identifying error instances, and (3) correcting errors (Maletic and Marcus 2000). Furthermore, we followed the data cleaning principles and methods suggested by Chapman (2005) when processing nomenclatural data. In the initial stage of migration, instead of a name-based database, we constructed a taxon-based database, which includes a unique taxonomy identifier (taxon ID) and several attributes such as family, genus, scientific name, and vernacular name. To reduce the redundancy of the database and improve data quality and integrity, we adopted relational database normalization to parse the raw data table into a second normal form schema. Through the normalization process, potential errors such as duplicate entries and misplaced taxa could be eliminated efficiently. In the second stage, we used a Python script to automatically cross-validate our database against Missouri Botanical Garden's Tropicos (http://www.tropicos.org/) and the International Plant Names Index (IPNI, http://ipni.org), identifying unmatched or unfound names for manual checking.
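The first two stages can be illustrated with a minimal Python sketch. It is not the script used in this study: the rows in `RAW_ROWS`, the local list `REFERENCE_NAMES` (a stand-in for names retrieved from Tropicos or IPNI), the function names, and the similarity cutoff are all hypothetical, and plain dictionaries stand in for the PostgreSQL tables. One entry is deliberately misspelled and one genus is deliberately placed under a second family to show how such rows would be flagged.

```python
from difflib import get_close_matches

# Hypothetical raw checklist rows: (family, genus, scientific name).
RAW_ROWS = [
    ("Liliaceae", "Lilium", "Lilium formosanum"),
    ("Liliaceae", "Lilium", "Lilium formosanum"),   # duplicate entry
    ("Zingiberaceae", "Curcuma", "Curcuma lomga"),  # deliberately misspelled
    ("Cycadaceae", "Cycas", "Cycas taitungensis"),
    ("Costaceae", "Curcuma", "Curcuma zedoaria"),   # genus under a second family
]

# Hypothetical stand-in for names obtained from an external name service.
REFERENCE_NAMES = ["Curcuma longa", "Cycas taitungensis", "Lilium formosanum"]


def normalize(rows):
    """Split flat rows into family, genus, and taxon tables keyed by name."""
    families, genera, taxa = {}, {}, {}
    duplicates, misplaced = [], []
    for family, genus, name in rows:
        known = genera.get(genus)
        # A genus already filed under a different family is flagged, not stored.
        if known is not None and families.get(family) != known["family_id"]:
            misplaced.append(name)
            continue
        family_id = families.setdefault(family, len(families) + 1)
        genus_rec = genera.setdefault(
            genus, {"genus_id": len(genera) + 1, "family_id": family_id}
        )
        if name in taxa:
            duplicates.append(name)
            continue
        taxa[name] = {"taxon_id": len(taxa) + 1, "genus_id": genus_rec["genus_id"]}
    return families, genera, taxa, duplicates, misplaced


def cross_validate(names, reference_names, cutoff=0.8):
    """Map each unmatched name to its closest reference name (if any)."""
    report = {}
    for name in names:
        if name not in reference_names:
            report[name] = get_close_matches(name, reference_names, n=1, cutoff=cutoff)
    return report


families, genera, taxa, duplicates, misplaced = normalize(RAW_ROWS)
unmatched = cross_validate(taxa, REFERENCE_NAMES)
print(duplicates, misplaced, unmatched)
```

Names listed in `unmatched` with an empty suggestion list would be the "unfound" cases left for manual checking; those with a suggestion are candidate misspellings.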
In the third stage, three major types of errors or problems, (1) illegitimate or invalid names, (2) misspelled names, and (3) differing taxonomic treatments, were corrected after cross-validation. We adopted Ruggiero et al. (2015) for the higher-level classification of seed plants (Subphylum Spermatophytina and above). For gymnosperms (Superclass Gymnospermae), the classification of Christenhusz et al. (2011) was followed, though caution was taken regarding the uncertain phylogenetic position of gnetophytes (Lu et al. 2014; Wang and Ran 2014). For angiosperms (Superclass Angiospermae), the major clades recognized as superorders in Chase and Reveal (2009) and the classification of The Angiosperm Phylogeny Group (2016) were adopted, with the exception of Boraginales, for which the new familial classification of Luebert et al. (2016) was followed. For orders and families that lack vernacular names in the current literature on the flora of Taiwan, the names proposed by Liu et al. (2015) were followed.
(48 species in 21 genera) were ranked as the 8th, 10th and 18th most species-rich families in FOT (Hsieh 2003); however, under the APG IV classification, Euphorbiaceae is reduced to ca. 60 species in 17 genera, Scrophulariaceae to only 4 species in 3 genera, and Liliaceae to ca. 8 species in 2 genera. Saxifragaceae is another case of drastic change, being reduced from 25 species in 13 genera to 7 species in 4 genera. On the other hand, families such as Malvaceae, Plantaginaceae, and Orobanchaceae expand greatly, increasing from eight, one, and four genera to 26, 16, and 14 genera, respectively.

Results and discussion
In the classification outlined below, codes composed of letters and numbers are applied to each family to denote its ordinal (and superordinal) classification. For gymnosperms, the alphabetical (A-H) and numeric (1-12) codes of Christenhusz et al. (2011) for orders and families are adopted. For angiosperms, the numeric codes of APG IV families (1-416) are followed, with the addition of alphabetical (A-S) codes for superorders and numeric (1-64) codes for orders modified from Chase and Reveal (2009). For example, "F14.60. Liliaceae 百合科" indicates Superorder Lilianae (F), Order Liliales (14), and Family Liliaceae (60). Code designations of superorders and orders are outlined in Fig. 1. The numeric family codes used in the Flora of Taiwan (Boufford et al. 2003) are also listed in parentheses after the Chinese vernacular family name to allow easy comparison with families circumscribed in FOT. For example, "(≡ IIA.1)" in Cycadaceae indicates Family 1 of Gymnospermae (IIA) in Boufford et al. (2003), "(≡ IIBa.35)" of Nymphaeaceae denotes Family 35 of Dicotyledons (IIBa), and "(≡ IIBb.7)" of Zosteraceae stands for Family 7 of Monocotyledons (IIBb), while the sign "≡" indicates that the familial circumscription is unchanged between FOT and the current treatment.
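To make the angiosperm coding scheme concrete, the following Python sketch decodes an entry such as "F14.60. Liliaceae 百合科" into its superorder, order, and family components. The function name `parse_family_code` and the regular expression are illustrative assumptions, not part of the published classification; the sketch covers only the angiosperm format described above and checks the stated ranges (A-S, 1-64, 1-416).

```python
import re

# Angiosperm entry: superorder letter, order number, ".", family number, ".",
# scientific family name, Chinese vernacular name.
FAMILY_CODE = re.compile(
    r"^(?P<superorder>[A-S])(?P<order>\d+)\.(?P<family>\d+)\.\s+"
    r"(?P<name>[A-Z][a-z]+)\s+(?P<vernacular>\S+)$"
)


def parse_family_code(entry):
    """Decode an angiosperm family entry into its coded components."""
    match = FAMILY_CODE.match(entry.strip())
    if match is None:
        raise ValueError(f"not an angiosperm family entry: {entry!r}")
    order = int(match["order"])
    family = int(match["family"])
    # Ranges as stated in the text: orders 1-64, APG IV families 1-416.
    if not (1 <= order <= 64 and 1 <= family <= 416):
        raise ValueError(f"code out of range: {entry!r}")
    return {
        "superorder": match["superorder"],
        "order": order,
        "family": family,
        "name": match["name"],
        "vernacular": match["vernacular"],
    }


liliaceae = parse_family_code("F14.60. Liliaceae 百合科")
print(liliaceae)
```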
For families whose circumscription has changed, the numbers of genera in FOT and in the current treatment are also provided. For example, "F14.60. Liliaceae 百合科 (IIBb.9; 21/2)" indicates that 21 genera were included in Family 9 of Monocotyledons (IIBb) in FOT, while only 2 genera are included in the current classification. A statement follows to denote newly added and/or excluded genera. For example, the current classification of "F18.89. Zingiberaceae 薑科 (IIBb.35; 5/5)" includes the genus Curcuma 薑黃屬 based on Wu et al. (2010), while the genus Costus of FOT is moved to F18.88 Costaceae, resulting in the same total of five genera as recorded in FOT (5/5). For newly added families, the genera included are listed with references to the previous classification. For newly recorded families, the number in parentheses after the Chinese vernacular name indicates the number of genera included. For families whose circumscription remains unchanged (e.g., Lauraceae, Asteraceae, Fabaceae, Orchidaceae, etc.), the full list of genera included is summarized in Additional file 1: Appendix S1. The Chinese vernacular names for all scientific names of taxa are adopted from FOT and/or the names proposed at publication (e.g., Chung et al. 2010b; Hsu et al. 2011). For newly added taxa not published by Taiwanese authors, the Chinese names proposed by Liu et al. (2015) are adopted. Although not officially recorded as part of the flora, the current treatments of 21 frequently cultivated plant families in Taiwan not recorded in FOT (marked with *) are also included.
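The parenthetical FOT annotations described above can be read mechanically in the same way. The Python sketch below is an illustration only: the function name `parse_fot_note`, the dictionary keys, and the regular expression are assumptions, and only the three FOT group codes mentioned in the text (IIA, IIBa, IIBb) are handled.

```python
import re

# FOT groups as named in the text (Boufford et al. 2003).
FOT_GROUPS = {
    "IIA": "Gymnospermae",
    "IIBa": "Dicotyledons",
    "IIBb": "Monocotyledons",
}

# "(≡ IIA.1)" marks an unchanged circumscription;
# "(IIBb.9; 21/2)" gives genera in FOT / genera in the current treatment.
FOT_NOTE = re.compile(
    r"^\((?P<same>≡\s*)?(?P<group>IIA|IIBa|IIBb)\.(?P<family>\d+)"
    r"(?:;\s*(?P<fot>\d+)/(?P<current>\d+))?\)$"
)


def parse_fot_note(note):
    """Decode a parenthetical FOT annotation into a dictionary."""
    match = FOT_NOTE.match(note.strip())
    if match is None:
        raise ValueError(f"unrecognized FOT annotation: {note!r}")
    return {
        "unchanged": match["same"] is not None,
        "group": FOT_GROUPS[match["group"]],
        "fot_family": int(match["family"]),
        # Genus counts are only present when the circumscription changed.
        "genera_fot": int(match["fot"]) if match["fot"] else None,
        "genera_current": int(match["current"]) if match["current"] else None,
    }


liliaceae_note = parse_fot_note("(IIBb.9; 21/2)")
cycadaceae_note = parse_fot_note("(≡ IIA.1)")
print(liliaceae_note)
print(cycadaceae_note)
```

A pair such as 5/5 (Zingiberaceae) shows that equal counts do not by themselves imply an unchanged circumscription, which is why the "≡" sign is recorded separately from the genus counts.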