Chloroplast capture and range extension after hybridization in taro (Colocasia esculenta)

Abstract Complete chloroplast genomes of 17 samples from six species of Colocasia (Araceae) were sequenced, assembled, and aligned together with two previously reported complete genome sequences from taro (Colocasia esculenta). Analysis provides a well‐supported phylogenetic tree for taro and closely‐related wild Colocasia species in Southeast Asia. Two chloroplast lineages (CI and CII) form a well‐defined haplotype group and are found in cultivated taros known as var. esculenta (dasheen, CI), var. antiquorum (eddoe, CII), and in a widespread, commensal wild form known as var. aquatilis (CI). A third lineage (CIII) is also found in wild taros known as var. aquatilis and in the wild species C. lihengiae, C. formosana, and C. spongifolia. We suggest three different scenarios to explain the grouping of CIII wild taros (C. esculenta) with other wild Colocasia species. Chloroplast lineages CI and CIII in C. esculenta and an unknown parent species may be involved in an as yet undated history of hybridization, chloroplast capture, and range extension. Substantial taxonomic revision may be needed for C. esculenta after further studies of morphological and genetic diversity within the crop, in wild populations, and in closely related wild species. The results also point to the Bengal delta as a region of key interest for future research on the origins of tropical wetland taros.


| INTRODUCTION
Colocasia esculenta (L.) (Schott, 1858) (taro, Araceae) is an ancient starchy staple and green vegetable crop cultivated in tropical to temperate regions of the world and is consumed today by billions of people (Matthews & Ghanem, 2021). Remains or probable remains of taro (including starch grains, calcium oxalate raphides, pollen, seeds, dried macro-remains, and volcanic ash impressions) have been reported in archaeological contexts ranging in age from hundreds to thousands of years ago in Asia (GARF, 2003; Li et al., 2022; Paz, 2005), the Pacific (Golson et al., 2017; Horrocks & Thomas, 2022; Loy et al., 1992; Prebble et al., 2019), and Egypt (van der Veen, 2011). Ahmed et al. (2020) argued that if taro was introduced to Papua New Guinea as a cultigen in the early Holocene, primary domestication of the crop may have been earlier than this in South to Southeast Asia (during the Pleistocene), within the western part of the natural range. Their work pointed to the origin of tropically cultivated taro (and a widespread form of commensal wild taro) in the general vicinity of the Bay of Bengal.
Approaches to defining the natural range of taro (Matthews, 1991, 2014, 2023; Matthews et al., 2017) are contingent on how the species is delimited by taxonomists. It is also important to recognize that commensal wild taros (i.e., those associated with human settlement) can have multiple origins as invasive natural wildtypes, as useful wild plants that have naturalized after being transplanted without cultivation, and as feral, self-dispersing garden escapes. Currently, taro is known as a highly polymorphic species with two common cultivated morphotypes and many intermediate forms (Ahmed et al., 2020; Hay, 1998; Lakhanpaul et al., 2003; Orchard, 2006; Plucknett, 1983). Vegetative and floral traits found in the diverse cultivated forms can be found in multiple wild Colocasia species, making it difficult to include C. esculenta in taxonomic keys for wild species (Matthews et al., 2022). Taxonomically, the definition of taro has always been difficult (Hay, 1998), in part because the first formal botanical descriptions were based on cultivars and in part because earlier descriptions employ fewer characters than later descriptions, making comparison difficult. Taxonomic uncertainty may also reflect intra-specific hybridization within taro and interspecific hybridization between taro and other Colocasia species (Ahmed et al., 2020; Matthews, 2014, 2023).
Chloroplast DNA in taro was first studied using analysis of restriction enzyme fragment polymorphisms (RFLPs) in wild and cultivated taros from across Asia (Ochiai et al., 2000; Yoshino, 1994). Since then, partial and complete chloroplast genome sequences have been published for many genera of Araceae, including Colocasia (Abdullah, Henriquez, Mehmood, et al., 2021; Ahmed et al., 2012, 2013, 2020; Henriquez et al., 2014; Ly et al., 2017; Nauheimer et al., 2012; Nauheimer & Boyce, 2014). Uniparental, maternal transmission of chloroplast genomes in taro has not yet been demonstrated, but is likely as this pattern is usual in most angiosperms (Greiner et al., 2015). Agricultural researchers have mostly studied the relatively narrow genetic diversity present in collections of clonally propagated taro cultivars, using analysis of simple sequence repeats (SSRs) or genome-wide single nucleotide polymorphisms (SNPs), for example (Bammite et al., 2021; Chaïr et al., 2016; Helmkampf et al., 2017; Wang et al., 2020). Near-complete nuclear genome sequences have been reported for a small number of taro cultivars (Bellinger et al., 2020; Soulard et al., 2017; Yin et al., 2021), and nuclear DNA sequence data have been generated for 128 species in 111 genera of Araceae, including taro, using target sequence capture and the Angiosperms 353 universal probe set (Haigh et al., 2022).
These recent studies have produced reference data sets that raise many new possibilities for evolutionary studies of taro and its wild relatives.
Together with a wide geographical survey of wild and cultivated taros, Ahmed et al. (2020) presented the first draft of a chloroplast phylogenetic tree for taro, revealing three main haplotype groups or lineages ("clades" CI-III). This tree was based on the sequences of six selected chloroplast loci (Ahmed et al., 2013), two Colocasia species, C. esculenta and Colocasia formosana (Hayata, 1919; Matthews et al., 2015), and two distant-outgroup aroid genera (Remusatia and Steudnera). Ahmed et al. (2020) found one especially widespread haplotype (CI, Type 1) in tropical, cultivated taros (in Africa, Asia, and Oceania) and commensal wild taros (Southeast Asia to southern Japan).
Here, we extend the genealogical tree structure by analysis of complete chloroplast genome sequences in nine additional samples of C. esculenta, and eight additional samples from five other Colocasia species: Colocasia fallax Schott (Ara, 2000; Deva & Naithani, 1985), C. formosana, Colocasia lihengiae (Long & Liu, 2001), Colocasia oresbia (Hay, 1996), and Colocasia spongifolia (Matthews et al., 2022). The results indicate that within Colocasia, C. fallax is distant from and C. oresbia is near a group comprised of C. formosana, C. lihengiae, C. spongifolia, and C. esculenta. Our analysis thus included a larger number of close relatives of taro (three versus one) than previously reported (Ahmed et al., 2020), and produced a phylogenetic tree supported by high non-parametric bootstrap values. We show that the three chloroplast lineages within C. esculenta are not close sisters and that the close wild relative C. formosana is distinct from C. esculenta. We also show that the CI and CII lineages form a close sister group that includes cultivated taros with morphotypes generally known as dasheen ("var. esculenta", producing large mother corms) and eddoe ("var. antiquorum", producing many side corms). The CIII lineage of wild C. esculenta is not closest to CI and CII. Rather, it appears closest to C. lihengiae within the CIII haplogroup, which
So far, the CIII lineage has been found in wild C. esculenta and three other, wild Colocasia species (Ahmed et al., 2020, and present results). To explain the grouping of CIII wild taros (C. esculenta) with other wild Colocasia species, we suggest that C. esculenta and an unknown parent species (Colocasia sp.) were involved in a process of hybridization, chloroplast capture, and range extension. The results also point to the Assam-Bengal floodplain as a region of key interest for exploring the complex genetic and geographical origins of tropical cultivated taros.

| MATERIALS AND METHODS
All samples were collected by the present authors in locations across Asia and the Pacific (Figure 1, Table A1, and Figures A1-A11).
Previous studies (Ahmed et al., 2013, 2020) identified specific chloroplast loci and PCR primers that can be used to identify haplotypes.
We used this approach to test samples in pilot surveys of wild taro
For recently collected samples, DNA was extracted using the protocol of Ahmed et al. (2009) with minor modifications. The oldest sample, from Australia, was collected in 1987, extracted as described by Matthews (2014), and preserved in a frozen DNA archive.
Seventeen samples were newly sequenced using the Illumina PE 150 run (Genwiz Life Sciences, China). Bioinformatic analyses, including sequencing-data quality checks, genome assembly, annotations, circularization, and data curation, were done as reported previously (Abdullah, Henriquez, Mehmood, et al., 2021). Subsequent analyses included alignments of two previously reported complete chloroplast genome sequences (Ahmed et al., 2012) from New Zealand (CESNZ03, var. GP, triploid, and var. RR, triploid, CESNZ02; Table A1) and the 17 new samples (see Data Availability Statement). Using MAFFT in Geneious, sequence sets (see below) were aligned, and one copy of the inverted repeat (IRa) and all gaps (indels) were removed. Removal of IRa is required to avoid double representation of the large repeat region. The online software IQ-Tree (Nguyen et al., 2015) was used to perform Maximum Likelihood (ML) analysis with "auto" model selection and 1000 bootstrap replicates in Ultrafast mode (100 bootstrap replicates in Standard mode gave similar results). Outgroups were automatically inferred by IQ-Tree. Model selection by IQ-Tree employed ModelFinder (Kalyaanamoorthy et al., 2017). Tree diagrams produced by IQ-Tree were prepared for presentation using the "branch-swapping" functions of TreeDyn (Chevenet et al., 2006).
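The gap-removal step described above can be illustrated with a minimal sketch. It assumes the alignment is held as a simple mapping from sample label to aligned sequence; the function name `strip_gap_columns` and the toy sequences are hypothetical and are not part of the Geneious/MAFFT workflow that was actually used.

```python
from typing import Dict


def strip_gap_columns(alignment: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the alignment without any column that contains a gap ('-')."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    rows = list(alignment.values())
    # zip(*rows) walks the alignment column by column.
    keep = [i for i, column in enumerate(zip(*rows)) if "-" not in column]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy alignment (invented sequences; the labels are borrowed from Table A1 for readability only).
toy = {
    "CESNZ03": "ATG-CCGTA",
    "CORMY01": "ATGACC-TA",
    "CxVN01": "ATGACCGTA",
}
cleaned = strip_gap_columns(toy)
print(cleaned)
```

Removing every gapped column shortens the alignment and discards indel information, which is one reason a distant outgroup that introduces many gaps can reduce the number of usable sites.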

| RESULTS
Our initial, more inclusive analysis placed C. fallax as distant from all the other species (Figure A12). Subsequent analyses without C. fallax allowed alignment of the remaining sequences with fewer gaps and provided a well-supported genealogical tree structure (phylogenetic hypothesis), with C. oresbia standing as a near outgroup and C. esculenta (taro) grouped together with C. lihengiae, C. formosana, and C. spongifolia (Figure 3). Two chloroplast lineages, previously identified as clades CI and CII in C. esculenta (Ahmed et al., 2020), form a well-defined group of sister lineages with a deep split. These are found in cultivated taros known as var. antiquorum (eddoe, CII), var. esculenta (dasheen, CI), and in a widespread, commensal wild form known as var. aquatilis (CI). A further group of wild taros (also var. aquatilis) from Australia, Vietnam, and Thailand forms a distinct subgroup within the CIII haplogroup alongside the wild species C. lihengiae, C. formosana, and C. spongifolia (Figure 4), separate from the CI and CII sister lineages. The apparent non-congruence between chloroplast lineages (haplogroups) and morphological diversity within C. esculenta is summarized in Table 1 (a schema based on present results and the findings of Ahmed et al., 2020). The overall genome structure was the same in each sample and similar to previous reports for C. esculenta (Abdullah, Henriquez, Croat, et al., 2021; Abdullah, Henriquez, Mehmood, et al., 2021; Ahmed et al., 2012). With C. fallax excluded and using a complete chloroplast genome alignment, there was clear resolution of tree structure within C. esculenta and closely related taxa (Figure 3), with PI = 0.518 sites/100 bp and an average BS value = 93.9%. With C. oresbia excluded and using intergenic sequences only, there was better resolution of branches within CI and CIII (Figure 4) and a similarly robust result: PI = 0.538 sites/100 bp and average BS value = 94.8%.
FIGURE 2 Colocasia oresbia (Hay, 1996) in Borneo, Malaysia: Plant habit and locations. C. oresbia has an erect habit, lacks stolons, and does not form spreading clumps: (a) Plant in full sun, Nanga Gaat, Sarawak (photo: P. Boyce, 13th May 2004). Sabah locations (photos: J. Joling, 22nd Dec. 2020): (b) JJ02 fruiting in semi-shade, (c) JJ03 in semi-shade. Map, right: Type location for C. oresbia on Mt Kinabalu (triangle). Map detail, left: Source locations for sequenced samples (CORMY01, CORMY02) in a scattered population of mostly isolated plants (black dots) in Sabah. District names and boundaries are shown. Sample details are given in Table A1.

| DISCUSSION
The main findings of interest for the evolution of taro (C. esculenta) are the non-congruence between chloroplast genome diversity and morphological diversity within the species and the grouping of CIII wild taro with other wild Colocasia species and C. lihengiae in particular (Figures 3 and 4, Table 1). Later, we propose three different evolutionary scenarios that may explain these findings. Note that
FIGURE 3 ML tree for C. esculenta and close relatives, including C. oresbia (identified as a near outgroup through preliminary analysis) and based on complete chloroplast genome alignment. * = node with 100% support next to very short (indistinct) branch. The scale for branch lengths (lower left) indicates number of substitutions per nucleotide position (divergence). The colour coding for main lineages is used throughout this paper: CI (pale orange), CII (blue), CIII (green). Sample details are given in Table A1.
FIGURE 4 ML tree for C. esculenta and close relatives, excluding the near outgroup C. oresbia and based on total intergenic sequence (IGS) alignment. Sample details are given in Table A1.

TABLE 1 Schema showing non-congruencies between morphotype and chloroplast genome haplogroup in taro and closely related species: var. aquatilis is associated with the CI and CIII haplogroups, while CI is associated with the "dasheen" and var. aquatilis morphotypes, and CIII is associated with the non-stolon-producing morphotype in C. spongifolia and stolon-producing morphotypes in other species.
the schema shown in Table 1 requires further substantiation with DNA tests on a wider range of cultivated samples of known morphology, and that cultivated taros might include hybrids that contradict the present schema. The ML tree in Figure 4 has apparently better resolution of CIII wild taros as a distinct subgroup, a result obtained by excluding C. oresbia and analysing just the intergenic sequences, which have a higher density of parsimony informative sites than other genome partitions (Table A2). The labels CI-CIII were previously used to identify "clades" (Ahmed et al., 2020) but are used here to identify haplogroups or lineages. The term "clade" can be understood to represent a group of taxonomic species with a shared common ancestor, while our data are primarily a representation of diversity among individual chloroplast genomes (haplotypes), from which we first attempt to make inferences about relationships among chloroplast haplogroups and then among taxonomic species.
associated with Colocasia spp. (Takano et al., 2021).

Haplogroups
Further results of interest (1-4) are:
1. Colocasia oresbia (Figures 2 and 3) appears to be a near outgroup and is expected to be a valuable reference taxon for future studies of C. esculenta and its close wild relatives. The habit and floral structure of C. fallax are very distinct, and this species was expected to be distant from the other species studied here; it was included to help clarify the relationship between C. esculenta and C. oresbia (an island Southeast Asian isolate).
2. CIII haplotypes were absent in the wild Bangladesh taros tested and form a distinct lineage (Figure 4) that is distributed in wild taros (also var. aquatilis) from Vietnam to Thailand and northern Australia; natural dispersal of this lineage, from Southeast Asia to Australia and New Guinea, may have occurred during the late Miocene to late Pliocene (Ahmed et al., 2020).
3. CI diversity in Bangladesh (Figures 3 and 4) includes a commensal wild plant (Figure A1b, likely derived from nearby cultivation) that is sister to var. GP, a widespread triploid clone in northern New Zealand (Figure A1a, and Matthews, 2014); this suggests that early (19th century?) British shipping to New Zealand might have introduced var. GP from the Bengal region.
4. Ta-imo (Figures 3 and 4), a common pondfield cultivar in southern Japan (Figure A3), displays the CI Type 1 haplotype, the most widespread haplotype in Asia and the Pacific, in cultivated and commensal wild taros (Ahmed et al., 2020); since Type 1 is nested among the wild CI taros found in Bangladesh, it might also originate in the Bengal region (see further discussion below).
The results also have implications for taxonomy. Although CI and CII form a well-defined sister group, the deep split might correspond to a species-level evolutionary divergence between the progenitors of CII cultivars known as var. antiquorum (eddoe), and those of CI taros that include var. esculenta (dasheen) and widespread, commensal forms of var. aquatilis (Ahmed et al., 2020).
This might be an artefact of incomplete taxonomic sampling of closely related Colocasia species. Adding further species will probably resolve the polytomy. Within the group, C. formosana appears to be a distinct species and not a part (Li & Boyce, 2010) or an ecotype (Matthews et al., 2015) of C. esculenta. In wild breeding populations across Taiwan and in the northern Philippines, C. formosana is morphologically uniform and distinct from wild popula-
According to a study of plants initially identified as C. lihengiae in North East India (Gogoi et al., 2019), the name C. lihengiae may be synonymous with C. mannii Hook.f. (Hooker, 1894). To confirm this, it will be useful to compare C. mannii in India with populations of C. lihengiae in Thailand (Sangnin, 2002), Vietnam (Nguyen et al., 2016), and southern China (Long & Liu, 2001). Colocasia lihengiae was placed in C. antiquorum (Schott, 1832) by Li and Boyce (2010), while C. antiquorum is usually placed within C. esculenta (Hay, 1998; Orchard, 2006)
Since many genetic and morphological parameters remain uncertain, we cannot favour any particular scenario at present or estimate the timings for the events proposed. There are also many possibilities for complex admixture of nuclear genomes that cannot be assessed here, especially if hybridization events have been common.
Such events might have occurred long before human interactions with Colocasia spp., but human interactions with taro could also be very old, given the deep antiquity of modern humans and other hominins (Homo spp.) in tropical Asia and the ability of early modern humans to occupy tropical forest environments (Bacon et al., 2021;Roberts et al., 2016).
Previously, we found that CI, Type 1 taros, are widespread as tropical wetland cultivars and in commensal wild populations (Ahmed et al., 2020).The diversity of CI wild populations in Bangladesh (Figure 4) suggests that CI, Type 1 taros (here represented by Ta-imo, a wetland cultivar in southern Japan,

| CONCLUSIONS
The present study raises many biological, taxonomic, and historical questions. By using complete and intergenic chloroplast genome se-
In the future, it may be necessary to recognize CI, CII, and CIII taros as distinct species, or C. esculenta as one species that includes diverse hybrids arising in nature and as a result of human activities.
Formal taxonomic revision of C. esculenta will require a search for wild CII populations for comparison with wild CI and CIII populations, and a study of wild Colocasia species that may have been involved in the proposed process of hybridization, chloroplast capture, and range extension. Since taro is a globally cultivated crop with a wide-ranging literature based on current taxonomy, care is needed to gain at least some acceptance for changes in nomenclature before they are formally proposed. Basic taxonomic issues concerning wild and cultivated taros were explored by Hay (1998), who recommended abandoning historical varietal names applied to cultivated forms of taro (var. esculenta, var. antiquorum). We use these names here for convenience, with the caveat that they do not represent the full range of morphological diversity in taro.
We recommend that C. esculenta (L.) Schott continue to be regarded as a single polymorphic species until more is known about nuclear genome diversity in wild taro populations and closely related wild Colocasia species. Complementary studies of chloroplast and nuclear genome diversity are needed to explore the evolutionary scenarios presented here, and wider sampling of populations is needed to confirm the "authentic" genomic components (Rieseberg & Wendel, 1993) of each species. A closer study of morphology and its development is also needed to better distinguish CI, CII, and CIII taros in the field and possible hybrid forms involving the different chloroplast genomes.
Experimental work to explore breeding barriers will help to resolve issues in the taxonomy of taro and Colocasia species and to identify species that can most easily or usefully contribute to crop breeding. Currently, there is no international collection of wild Colocasia species or international institution with a mandate for basic research on taro and its wild relatives. Breeding programmes for taro have been constrained for many reasons (Lebot & Ivančič, 2022), and the lack of basic research to address biological, taxonomic, and historical questions is also a significant constraint. These questions involve wild and cultivated species, evolutionary history, and deep human history, so holistic and integrative approaches (Hay, 2019) are needed across the natural, agricultural, and human sciences.

APPENDIX A
This appendix has four sections:
A.1. Sample illustrations (Figures A1-A11) showing sampled plants, populations, and habitats.
A.2. A sample list (Table A1) with sequence-related summary data and sample collection details.
A.3. Method details with further ML tree diagrams (Figures A12-A15) and tree comparison statistics (Table A2).
A.4. Additional discussion of methodological constraints, substitution models, and taxonomic sampling.

A.3. | Method details
The result of a preliminary analysis with C. fallax (Figure A12), original tree diagrams, log data from IQ-Tree runs, and our derived calculations are all presented here. The run program used was IQ-TREE multicore version 1.6.12 for Linux 64-bit, built Aug 15th, 2019.
To prepare Figures 3 and 4 A2.
The average number of parsimony informative (PI) sites per 100 bp of aligned sequence, and average bootstrap (BS) percentages across all nodes were highest in our analysis of four closely related (in-group) taxa using only the intergenic (non-coding) sequence data (Table A2 and Figure A15).

A.4. | Methodological constraints, substitution models, and taxonomic sampling
Following precedents in other taxonomic studies (for example, Androsiuk et al., 2020; Duvall et al., 2016), various methods of analysis and substitution model selection were tried with the complete, unpartitioned chloroplast sequences. Maximum Likelihood (ML) analyses with different models in the General Time Reversible (GTR) family of nested models all gave similar results. Using unpartitioned data is known to be unrealistic because different regions mutate in different ways at different rates (Kelchner, 2000, 2008). Nevertheless, the long sequence (134,382 bp; complete genome with one large inverted repeat removed) had many parsimony-informative sites (Table A2) and gave a robust topology for the present taxonomic sample set (Figure 3 and Figure A14). For the CDS analysis (Figures A12 and A13) and intergenic sequence analysis (Figure 4 and Figure A15), the data were partitioned prior to model selection and analysis.
After using ModelFinder to identify the best-fit substitution models to analyse our sample sets and data partitions, we also analysed each sample set using the Jukes-Cantor (JC) model with ML optimization in IQ-Tree (Table A2) in order to learn if the JC assumption of equal substitution rates at all positions affected the topology.
Only small differences in the topology and branch lengths of each tree were found (diagrams not shown).Although the JC model gave better bootstrap support values for intergenic sequences, this substitution model was not selected by ModelFinder.In a series of experiments with simulated data sets, Abadi et al. (2019) found that the JC model generates topologies that are only slightly less often correct than those generated with complex models, and that the most complex nucleotide substitution model, GTR + I + G, consistently leads to inferences of tree topology that are very similar to those obtained with other models selected as optimal by jModelTest across a range of model selection criteria (BIC and others).They suggest that for many data sets, optimal model selection is not necessary and that the GTR + I + G model is adequate.We did not attempt to use models specific to putative functional domains of intergenic sequences (e.g., stem-loop structures involved in gene regulation), as the main limiting factor for our study was the non-availability of Colocasia species for testing.
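For readers unfamiliar with the Jukes-Cantor assumption discussed above, the following minimal sketch shows the standard JC69 distance correction, d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. This is an illustration of the model only, not the ML optimization performed by IQ-Tree; the function names and the two short sequences are hypothetical.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites, ignoring positions that are not A, C, G or T in both sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(p: float) -> float:
    """JC69 corrected distance: all substitutions equally likely, equal base frequencies."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Invented 10-bp example with a single difference (p = 0.1).
p = p_distance("ACGTACGTAC", "ACGTACGTAT")
d = jc69_distance(p)
print(p, round(d, 4))
```

Because the correction depends on p alone, the model has no free rate parameters, which is why it serves as a simple baseline against the models selected by ModelFinder.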
Although the topologies appear robust for the present data set, the topologies and the biological significance of the results can be improved in the future by wider geographic sampling for each species and by more comprehensive taxonomic sampling of Colocasia species. There is no international collection of taro that can provide samples of close wild relatives. Our sample set represents approx. one quarter of known Colocasia species (Matthews et al., 2022), and it is likely that not all Colocasia species have been discovered, given the lack of botanical exploration for this genus in large areas of Southeast Asia.
Analysed by Ahmed et al. (2012, 2013, 2020) (Hayata, 1919; Hsu et al., 2000
Note: GenBank sample labels follow the series established by Ahmed et al. (2020) and employ the same species-country-number format, with a three-letter code for each species (e.g., CES for C. esculenta), a two-letter international country code, and an individual number. Label history: Alphagenomics Ltd (AG), National Museum of Ethnology (NME). WP, Garmin GPS waypoint site number. Latitude, longitude and elevation are based on the GPS and World Geodetic System WGS84 grid, unless noted as a vicinity estimate. Google Earth (GE) (2024) was used to visually check GPS elevations, and in some cases the GE estimate was preferred.
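The species-country-number label format described in the note can be checked mechanically. The sketch below is a hypothetical helper, not part of the published workflow; the regular expression is an assumption based only on the stated format (three-letter species code, two-letter country code, individual number) and on labels that appear in this paper.

```python
import re
from typing import NamedTuple, Optional


class SampleLabel(NamedTuple):
    species_code: str
    country_code: str
    number: int


# Assumed pattern: three capitals (species), two capitals (country), then digits.
LABEL_PATTERN = re.compile(r"^([A-Z]{3})([A-Z]{2})(\d+)$")


def parse_label(label: str) -> Optional[SampleLabel]:
    """Split a label such as 'CESNZ03' into its parts; return None if it does not fit the format."""
    match = LABEL_PATTERN.match(label)
    if match is None:
        return None
    return SampleLabel(match.group(1), match.group(2), int(match.group(3)))


print(parse_label("CESNZ03"))
print(parse_label("CORMY01"))
print(parse_label("CxVN01"))  # the possible hybrid uses a different prefix and is not matched
```

A label that returns `None`, such as the hybrid sample CxVN01, simply falls outside the assumed three-letter scheme and would need separate handling.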

TABLE A1 (Continued)
FIGURE A12 ML tree with C. fallax and five other Colocasia species, based on all protein-coding gene sequences (CDS), excluding those of IRa (one of two copies of the large inverted repeat). C. fallax is placed as a distant outgroup, and C. oresbia is placed as a near outgroup for C. esculenta and its close wild relatives. Sample details are given in Table A1.
TABLE A2 Summary of calculations based on IQ-Tree run log data.
Note: All analyses with IRa and gaps removed; CDS = protein coding sequence, "Complete" = all data after removal of IRa and gaps; "Intergenic" = all non-coding exons (no introns included). Average bootstrap (BS) values are shown for each substitution model selected by ModelFinder, and for the Jukes-Cantor model (selected manually). Models used were: Kimura 3-parameter (K3P) model with variable base frequencies, equal transition rates, two transversion rates; Transversion model (TVM) with variable base frequencies, variable transversion rates, transition rates equal; and Jukes-Cantor (JC) with equal base frequencies, all substitutions equally likely.
C. formosana and C. spongifolia. Wild taros, generally known as C. esculenta "var. aquatilis" (producing long stolons with indeterminate growth and being semi-aquatic) were represented by CI and CIII haplotypes.
populations and wild Colocasia species in Bangladesh, Thailand, and Vietnam, identify CI and CIII individuals, and then choose samples of C. esculenta, C. formosana, C. lihengiae, and C. spongifolia for complete genome sequencing in order to represent as much haplotype diversity as possible within our budget.One cultivar with small side corms, from a market in northern Pakistan, was included without prior testing and provided a new CII sequence.
FIGURE 1 Colocasia species sampled. Key shows: Species name, number of samples sequenced (total = 19), c = cultivated, w = wild. Circles = C. esculenta; triangles = closely related species, squares = outgroup taxa. Multiple samples from within a limited area are shown inside boxes (schematic). The C. lihengiae group includes a possible hybrid. Sample details are given in Table A1.
Initial ML analysis using an alignment of all protein coding sequences (CDS, total concatenated, aligned sequence of 68,985 bp) from all available Colocasia species placed C. fallax as a distant outgroup and C. oresbia (Figure 2) as a near outgroup for C. esculenta and closely related taxa (Figure A12, Method details in Appendix A). We therefore excluded C. fallax from further sample sets to avoid (a) long-branch attraction bias in the phylogenetic tree (reviewed in Bergsten, 2005), and (b) information loss from insertion/deletion sites that may appear when target ingroup sequences are aligned with a distant outgroup. For analysis of C. oresbia (the near outgroup), C. esculenta, and close relatives, complete genome sequences were aligned, giving a total aligned sequence of 134,382 bp with 696 parsimony-informative sites. To improve subclade resolution, all intergenic sequences from C. esculenta and close relatives (with C. oresbia excluded) were aligned, giving a total aligned sequence of 46,269 bp with 249 parsimony-informative sites. To compare bootstrap support in relation to the density of parsimony informative sites across different sequence partitions (CDS, complete genome, intergenic), for each sample set, we calculated the average bootstrap value across all nodes in each ML tree (average BS percentage) and used log data from IQ-Tree runs to calculate the number of parsimony informative sites per 100 bp of aligned sequence (average PI). To gain an indication of the impact of model misspecification on these calculated parameters, we also compared trees built with models selected by ModelFinder and trees built with the Jukes-Cantor model and ML optimization in IQ-Tree. These calculations and comparisons are reported in Method details in Appendix A and discussed in relation to methodological constraints in Discussion section in Appendix A.
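The two summary statistics described above can be sketched as follows. The site counts and alignment lengths are those reported in the text; the helper names, the test columns, and the toy Newick string are hypothetical, and the sketch assumes that node support values are written as plain numeric labels after each closing parenthesis in the tree file.

```python
import re
from collections import Counter
from typing import Iterable


def pi_per_100bp(pi_sites: int, aligned_length: int) -> float:
    """Density of parsimony-informative sites per 100 bp of aligned sequence."""
    return 100.0 * pi_sites / aligned_length


def is_parsimony_informative(column: Iterable[str]) -> bool:
    """A site is informative if at least two different bases each occur in at least two sequences."""
    counts = Counter(base for base in column if base in "ACGT")
    return sum(1 for n in counts.values() if n >= 2) >= 2


def mean_bootstrap(newick: str) -> float:
    """Average of the numeric support labels that follow ')' in a Newick string."""
    values = [float(v) for v in re.findall(r"\)(\d+(?:\.\d+)?)", newick)]
    if not values:
        raise ValueError("no support values found")
    return sum(values) / len(values)


complete = pi_per_100bp(696, 134_382)    # complete genome alignment, C. fallax excluded
intergenic = pi_per_100bp(249, 46_269)   # intergenic alignment, C. oresbia excluded
print(round(complete, 3), round(intergenic, 3))

# Toy tree with invented support values, for illustration only.
toy_tree = "((A:0.1,B:0.1)100:0.05,(C:0.1,D:0.1)90:0.05,E:0.2);"
print(mean_bootstrap(toy_tree))
```

The rounded densities agree with the PI values of 0.518 and 0.538 sites/100 bp given in the Results.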
Complete chloroplast genome sizes ranged from 161,252 bp in C. fallax to 161,973 bp in C. oresbia and 162,644 bp in C. esculenta (Table A1).

a Produces long stolons.
b Shy-sprouting; new shoots sprout directly from erect or decumbent mother stem.

A possible Colocasia hybrid (sample CxVN01, Figure A8) was identified in the field as a possible hybrid of C. lihengiae and C. menglaensis, which are sympatric species in the northern mountains of Vietnam. Further studies of this sample and adjacent populations of Colocasia spp. are needed to gain insight into local gene flow and the host preferences of specialist insect pollinators (Colocasiomyia spp.)
4.3.2 | Range extension from east to west and south
Hybridization, introgression, and capture of a CI chloroplast genome by a CIII wild taro from lower-montane Southeast Asia produced an invasive hybrid ("lower-montane C. esculenta" × Colocasia?sp.) that spread westward across the Assam-Bengal floodplain and then southward into the Indian peninsula, where wild taro populations are also widespread. The CI maternal chloroplast donor in this scenario could have been an unknown CI species in Northeast India, Bangladesh, or another nearby region. The nuclear genome of the hybrid may represent a subset of nuclear genomic diversity in CIII wild taros ("lower-montane C. esculenta"), and there could have been further divergence in the nuclear genome of CI wild taros if the proposed events were early enough. Chloroplast genome diversity in taro across the Indian peninsula remains entirely untested, but is assumed here to lie within the CI lineage.

Figure 4 and Figure A3) might have originated in the extremely west Bengal delta. This raises a further question of how and when CI taros moved eastward or westward across the Brahmaputra/Burma boundary, a major terrestrial biogeographic barrier for plants and animals that may also have been a barrier for modern humans until the Last Glacial Period (beginning at Marine Isotope Stage 4, MIS 4, approx. 71,000 years ago), when human movement into Island Southeast Asia is thought to have been favoured by more open forest habitats (Boivin et al., 2013). Hypotheses regarding the genetic and geographical origins of tropical wetland cultivars (CI) cannot be tested without geographical sampling of wild populations across the entire Bengal delta, upper catchments of the Ganga and Brahmaputra rivers, the Indian peninsula, and other regions around the Bay of Bengal. There are further huge gaps in sampling across Southeast Asia, East Asia, and the western Pacific. The likely route for dispersal of CIII wild taro through Indonesia to Australia and New Guinea (Ahmed et al., 2020) remains unexplored. This gap is unfortunate since Rumphius (2011) (1741-1750) (writing in the late 17th century, and first published in the 18th century) already distinguished two kinds of wild "water Kelady" (water taro, "Kelady Ayer") by habitat and usage, namely: "Vicorum" [Latin for "village"], growing in mire (swampy, boggy ground) in and behind the villages of Ambon, and "seldom eaten" because it is "too sharp" (= CIII, a natural inland population?), and "aquatile", growing on the sides of rivers on Ambon and Java, and often eaten by poor people or used as pig's fodder (= CI, an invasive, commensal wild population?). Throughout Southeast Asia, there are innumerable opportunities for localized, present-day interbreeding between remnant natural populations of wild taro and commensal wild taro populations that have spread widely in open habitats created by humans (most notably in the vicinity of wetland rice production in drains, irrigation channels, and along streams and rivers). Ongoing interactions between wild taro populations and other Colocasia species are also likely, according to the presence and preferences of specialist insect pollinators (Colocasiomyia spp.) at different altitudes and among different potential hosts. Detecting past interactions among closely related Colocasia species will require more detailed population studies in areas with and without sympatry today.
quences (Figures 2 and 3), we have refined the model (hypothesis) of phylogenetic relationships among Colocasia species. The genealogical tree presented here is more robust than that reported by Ahmed et al. (2020), but is certain to change again in the future when more Colocasia species are added. Cultivated taros are now seen to belong to sister lineages (CI, CII) that form a distinct group with a deep evolutionary split, despite the morphological similarity of many wild CI taros to the wild CIII taro lineage. Since CIII chloroplast genomes are found in multiple wild species, those species may represent a major clade within Colocasia. In future surveys, CI and CII haplotypes might also be found in multiple species. It is also possible that inter-specific hybridization has contributed to the evolution of new Colocasia species and reticulate evolution (Arnold, 2016) among closely related and widely distributed species such as C. esculenta, C. lihengiae/mannii, and others. Hybridization, chloroplast capture, and range extension could have happened long before the domestication and spread of taro as a crop, or more recently as part of crop history. Regardless of when and where such events first occurred, hybridization might have been accelerated by human translocation of fertile, diploid CI and CII taros into new areas sympatric with each other and with other wild Colocasia species.
(a) CESNZ03 (ex horto); sampled plant in campus garden, University of Auckland, New Zealand; photo by I. Ahmed, 25th June 2008. (b) Typical commensal wild, clonal population in wet ground, surrounded by pasture, and single large corm (at right) at least 4 years old based on visible record of seasonal changes in corm diameter, with stolon (lower arrow) and directly-growing side-shoot (upper arrow); vicinity Whangarei, northern New Zealand; photos by P. J. Matthews, 29th October 2009. (c, d) CESBD01, commensal wild taro (with large starchy corm, so likely an escaped cultivar) on grazed river bank, Mymensingh, Bangladesh; photos by P. J. Matthews, 10th Feb. 2019. (c) Habitat and single plant (arrow) found and sampled. (d) Whole plant with starchy corm and stolon fragment.
, Figure A12, branches on the original trees produced by IQ-Tree were rotated and swapped for a consistent layout. Figures A13-A15 show the original trees with all bootstrap values and no manipulation of branches. For each tree below (Figures A13-A15), complete original sequence data were aligned, the Inverted Repeat-a (IRa) region was deleted, the specific regions to be analysed were extracted and concatenated, and all gaps were removed. Original IQ-Tree results (output trees with bootstrap values) are shown below and are used to calculate average bootstrap (BS) values across all nodes in each tree. To compare the data support and robustness of each ML analysis, with respect to sequence partition and model selection, calculations based on run log data are summarized in Table A2. Run log data are shown in full with each tree. Results obtained with the Jukes-Cantor (JC) model (implemented in IQ-Tree with ML optimized) are also shown in Table ). Overall, the analyses are similarly robust, but resolution of subclades within closely related taxa of the ingroup (CI-CIII, our main focus) was improved by removing outgroup taxa (C. fallax, C. oresbia) and the most conserved component of the chloroplast genome (CDS). In Figures A13 and A14, C. formosana is shown as a near outgroup for CIII C. esculenta and C. lihengiae, with low bootstrap values (60% and 52%, respectively), while in Figure A15, C. spongifolia is shown as the near outgroup, with a low bootstrap value of 58%. The conservative CDS analysis (Figure A13) may be the most reliable indication of phylogeny in this case, but more study is needed. Since the entire haplogroup with C. spongifolia and C. formosana includes an apparent polytomy that cannot be resolved into distinct lineages, the haplogroup is identified here as just one CIII lineage. For each IQ-Tree analysis in which ModelFinder selected the best fit model, the run log data and calculations are given below, after each original output figure (Figures A13-
(a-c) CESTH06 collection site; plants wild in vacant land (former farmland) under motorway, Khlong Prapa, Bangkok, Thailand; photos by P. J. Matthews, 18th Feb. 2019. (a) Sampled roadside population (clump spreading with long stolons). (b) Detail showing upper leaf morphology. (c) Detail showing green petioles, white at base with white basal ring; corms with white roots, skin, cortex, core parenchyma; note slightly angular, sagittate blade with wide and deep sinus, and 8-10 primary lateral veins branching on each side from costae of the posterior and anterior lobes. (d, e) CESTH07 collection site; commensal wild plants, Ko Kret Island, Chao Praya, Bangkok, Thailand; photos by P. J. Matthews, 18th Feb. 2019. (d) Sampled population in swamp inland behind houses built near shore of river island. (e) Detail showing blades, and long, upper spathe curving to near horizontal position. (f-h) Reproductive morphology at other sites, typical of the wild populations in and around Bangkok, photos by P. J. Matthews, 16th-17th Jan. 2020, and ex situ 27th Aug. 2022. (f) Mature inflorescence, and series of fruiting heads from the same plant; sterile appendix more than half the length of male zone (here ca. 0.7×; at other sites equal), and yellow upper spathe much longer than spadix, and nearly horizontal. (g) Ripe fruiting head with soft, orange berries. (h) Tip of young stolon with tendrils (potted plant, ex situ).
(a) CESAU24 collection site with wild taro population in riverine forest, at Hopevale, northeast Queensland, Australia (photo by K. Thiele, 26th Sept. 1987; Matthews at left). (b-d) Wild taros, Cape Tribulation, northeast Queensland; photos by P. J. Matthews, mid-Aug. 1992. (b) Mature inflorescence, and young fruiting heads; note short sterile appendix (less than half the length of male zone silhouetted in this image), and upper yellow spathe that is much longer than the spadix. (c) Near-mature fruiting head, fully fertilised (all berries developed, with mature seeds inside, indicating effective insect pollination). (d) Population in wet gully; note oval-sagittate blade with wide and deep sinus, and 8-10 primary lateral veins branching on each side from costae of the posterior and anterior lobes.
FIGURE A8 Colocasia lihengiae (Long & Liu, 2001) and a possible hybrid (all samples CIII). (a-e) (C. lihengiae). (a, b) CLIVN04 collection site with commensal wild population, at edge of forest, Phu Tho prov., Vietnam; photos: P. J. Matthews, 4th Oct. 2017. (c-e) CLIVN05 collection site, Tukuk Commune, Dang Son district, Vietnam; photos by P. J. Matthews, 4th Oct. 2017. (c) Plants wild on roadside bank, at edge of forest. (d) Sampled plant showing petiole (purple) and lower side of blade. (e) Upper side of blade; note angular shape, apiculate apex, and shining surface (wettable, non-waxy). (f-i) CxVN01 collection site with possible hybrid (C. lihengiae × ?menglaensis); isolated plant wild on roadside bank, at edge of forest near Yen Bai/Phu Tho border, Vietnam; photos by P. J. Matthews, 5th Oct. 2017. (f) Sampled plant, with water adhering to leaf after splash test though leaf surface is not shiny (like C. menglaensis, and in contrast to wettable shiny leaf of C. lihengiae). (g) Same plant showing stolons, and angular blade with wide sinus and apiculate apex (shape similar to C. lihengiae). (h) Detail showing strongly raised secondary veins on lower surface between primary lateral veins (as in C. menglaensis). (i) Detail showing broad petiole sheath (as in C. menglaensis), but with smooth petiole surface (as in C. lihengiae, in contrast to hairy in C. menglaensis).
(a-c) CFOTW03 collection site; plants wild on roadside at edge of forest, Wutai district, Pingtung County, Taiwan, photos by P. J. Matthews, 1st Sept. 2014. (a) Inflorescence with stingless bees on outside (and pollinating flies, Colocasiomyia sp., inside) (both insects are commonly seen on inflorescences in Taiwan). (b) Plants on wet rock face nearby, showing typical entirely green leaf, with rounded blade and shallow sinus. (c) Fully ripe fruiting head (reddish orange colour) after separation from peduncle and falling; seeds were collected. (d, e) Seedlings grown ex situ in Osaka; photos by P. J. Matthews. (d) Plant with vigorous stolons, 17th June 2015. (e) Inflorescence with large sterile appendix (longer than male zone, and base wider than male zone), and abundant sterile male flowers (staminodes, creamy yellow) among the female flowers, 11th Dec. 2018.

Original IQ-Tree output with bootstrap values for ML tree with C. fallax and five other Colocasia species, based on all protein-coding gene sequences (CDS), excluding those of IRa (one of two copies of the large inverted repeat). See final tree in Figure A12.
FIGURE A14 ML tree based on complete sequence data (all genic and intergenic regions, after removing IRa and gaps) from 18 samples (five species). Original IQ-Tree output with bootstrap values. See final tree in Figure 3.
FIGURE A15 ML tree based on all intergenic sequence data (after removing IRa and gaps) from 16 samples (four species). Original IQ-Tree output with bootstrap values. See final tree in Figure 4.

(Brown, 1810taros are found to have morphological and genetic traits that set them apart, despite being interfertile, then taxonomic revision may be needed. Caladium acre (Brown, 1810-1830), an early name proposed for wild taro in northern Australia and discussed by (Singh et al., 2012)t that C. lihengiae is close to but distinct from wild C. esculenta in the CIII species group (Figures 3 and 4). However, C. lihengiae is represented here by just three samples collected within a radius of 30 kilometres in northern Vietnam, including one apparent hybrid (CxVN01). As a widespread but little-studied species, C. lihengiae may have considerable unrecognized phenotypic and genetic diversity. Next, we propose three alternative evolutionary scenarios to explain (a) the lack of congruence between the chloroplast genome and morphological diversity in wild and cultivated forms of C. esculenta and (b) the finding that C. lihengiae is a close sister to a wild form of C. esculenta within the CIII lineage.
4.1 | Scenario 1: Long separated species
In this scenario, distinct chloroplast lineages correspond to distinct, long-separated species: CIII wild taro is a separate species from C. esculenta and derived from an ancestral population of C. lihengiae (the nearest sister taxon, on a short branch in Figures 3 and 4), or from an unknown common ancestor of CIII wild taro and C. lihengiae.
A7; and details in Scenario 3 below). A further contra-indication for this scenario is previous plant breeding research in the Pacific, which showed that a wild taro from Bangkok (from the same widespread Bangkok wild population sampled here) is interfertile with cultivated taro and could confer Taro Leaf Blight (TLB) resistance (Singh et al., 2012). This suggests a close genetic relationship between wild CIII taro and the cultivated taro parent (most likely CI in the Pacific).
erences, so the original hybridization event may have been long ago, giving time for separate evolution as species. Alternatively, there may have been multiple, early hybridization events (that is, reticulation) involving different ancestral populations of each parent species. In either case, the nuclear genomes of CIII wild taros should be distinct from the nuclear genomes of CI taros. This scenario is not supported by the morphological similarity of CI and CIII wild taros and the dissimilarity of the latter and C. lihengiae (assuming that most of the morphological differences that distinguish species are determined by nuclear genomes).
4.3 | Scenario 3: Hybridization, introgression, and chloroplast capture followed by range extension (3.1) from west to east and south, or (3.2) from east to west and south
Commensal
Two possible versions of this scenario are therefore (Ahmed et al., 2020)ast Asia are often CI, entirely green with long stolons (e.g., Figures A2 and A3), and comparable to, or identified as, C. esculenta var. aquatilis (Ahmed et al., 2020).
taros lack morphological traits that set them apart (contrary to Scenario 1), and this can be tested by giving closer attention to morphological diversity in wild taro populations. From the evidence of chloroplast diversity alone, the directionality of chloroplast capture and subsequent range extension cannot be determined, so we cannot be sure which haplotype, CI or CIII, is authentic or typical for C. esculenta.
TABLE A1 Sample list.
Bach Ma National Park, Vietnam. Coll. PJM and Nguyen V. D., 22nd Sept. 2018. Leaf all green; petiole green to base; petiole sheath broad, no stolon (buds only); abundant fruiting; on wet ground at base of slope between road and forest, growing on granite rubble and humus; associated flora incl. Begonia, Elatostema, tree fern, wild banana, Alocasia odora (epiphytic on wall of ravine with waterfall). 16.2043 N, 107.8579

)
CFO_CP WP108 Seedling grown ex situ; seed ex wild plant, Wutai district, Pingtung County, Taiwan. Coll. PJM and K.-C. Tsai, 1st Sept. 2014. Abundant plants around waterfall; flowering, bees and pollinating flies present (also collected). In this area, the plant is known as famine food requiring special care to cook. GPS 22.7351