Taxonomic requirements for better documenting and understanding biological invasions – the example of genetic weatherfish Misgurnus/Paramisgurnus sp. identification

Management of biological invasions strongly depends on early and accurate detection of non-native species, yet species identification is often complicated for various reasons. One prominent example is the controversy about the genetic specimen assignment of Asian and Oriental weatherfish species introduced into Europe. Weatherfishes, comprising the genera Misgurnus and Paramisgurnus (Cobitidae), are small benthic freshwater fishes occupying a wide range of habitats in the temperate to subtropical regions of Eurasia. Many of the eleven described species have been introduced outside their native ranges, mainly through the ornamental trade and as food. Owing to poorly known life cycles, unclear morphology, overlapping meristic features and frequent hybridisation, accurate species identification in this group is challenged by cryptic species and cryptic invasions, unresolved classical and molecular taxonomy, haplotype sharing and incomplete molecular genetic reference databases. Our newly generated molecular phylogeny, comprising 289 published weatherfish COI barcodes, reveals distinct phylogenetic clusters. Except for the endangered Central European species Misgurnus fossilis and an unnamed cluster from Vietnam, all clusters were polyphyletic. Haplotype sharing was frequently observed, as were specimens labelled only to genus or higher taxonomic levels. We conclude that genetic analysis of type specimens or of specimens from type regions, to resolve the underlying taxonomy and complete the reference databases, is a necessary prerequisite for accurate species identification in the weatherfish group. Such information is crucial for assessing their worldwide distribution patterns, ecosystem impacts and invasive potential.
As molecular genetic databases are constantly growing, new taxa are being proposed and taxonomies are being revised in light of new data, it is essential to reconsider past publications in light of changing species names and phylogenies. We nevertheless recommend early sharing of exotic species records, since such knowledge is particularly crucial for the management of invasive species.


Introduction
Biological invasions are considered a main factor affecting biodiversity, with multiple societal and economic impacts (e.g., Keller et al. 2011). In addition to requiring unified frameworks to describe biological invasions (Robertson et al. 2020), the management of biological invasions strongly depends on early and accurate records of invasive species. As illustrated by recent examples related to the controversy about Asian and Oriental weatherfish species introduced into Europe (Riffel et al. 1994; Razzetti et al. 2001; Freyhof and Korte 2005; Belle et al. 2017; Stoeckle et al. 2019; Zangl et al. 2020), accurate species assignment in this group faces many challenges (Chen 1981; Vasil'eva 2001; Kottelat 2012). For instance, a recent rapid communication by Zangl et al. (2020) concluded that weatherfish samples analysed by Belle et al. (2017) had been misidentified using molecular taxonomy, potentially overlooking that several new mtDNA barcode sequences included by Zangl et al. (2020) had not been available when Belle et al. (2017) was published. Other challenges related to genetic weatherfish identification lie in (i) the accessibility of type specimens and their genetic reference sequences, (ii) classical taxonomy with many synonymous species names, (iii) hybridisation, and (iv) a not yet fully analysed suite of observed species and forms from different geographical regions and habitats. Many of these challenges are also relevant to other biological invasions. We therefore use the example of weatherfishes to discuss current deficits and useful principles that should generally be considered in the genetic species identification and documentation of invasive species.
Weatherfishes (Cobitidae) comprise the two genera Misgurnus and Paramisgurnus. They are small benthic freshwater fishes occupying a wide range of habitats: still or slowly flowing rivers, lakes and ponds with muddy bottoms, and agricultural landscapes including rice fields and ditches (Meyer and Hinrich 2000; Kanou et al. 2007; Chen et al. 2015b). The native ranges of the eleven valid species cover the temperate to subtropical regions of Eurasia, including Japan (see Table 1; Fricke et al. 2020). No native weatherfish species are known from the Americas, Africa or Australia. Depending on the regional context, weatherfishes are used as ornamental fish, food and live bait, and they are frequently traded and sold in pet shops. Several weatherfish species have become established outside their native ranges worldwide. For instance, M. anguillicaudatus has been recorded in Australia (Allen 1984), Europe (Razzetti et al. 2001; Franch et al. 2008; van Kessel et al. 2013), North America (Simon et al. 2006), South America (Abilhoa et al. 2013) and several Asian countries (e.g., Juliano 1989; Sal'nikov 1998). Paramisgurnus dabryanus has been recorded in Europe (Riffel et al. 1994; Zięba et al. 2010; Freyhof 2013; Stoeckle et al. 2019), Japan (Kanou et al. 2007; Mukai et al. 2011) and the USA (Kirsch et al. 2018).
Table 1. List of valid taxonomic names and synonyms in the genera Misgurnus and Paramisgurnus, their native ranges, and the associated type localities (Kottelat 2012; Fricke et al. 2020). Species names in bold indicate that the species is currently recorded as introduced outside its native range.
(Brys et al. 2020; Zangl et al. 2020). On the other hand, the Central European species M. fossilis is endangered (Council of the European Communities 1992) and a target species for conservation, and local extinctions have partly gone unnoticed because morphologically similar exotic weatherfish species were introduced at the same time (Freyhof 2013). Other species, like M. anguillicaudatus and P. dabryanus, have been introduced and established in regions where they are considered undesirable alien species, yet they are threatened in their native ranges by overexploitation (Chen et al. 2015a; Yi et al. 2017, 2018).
Morphologically distinguishing the different weatherfish species can be challenging, as meristic and morphological features are not easily accessible to non-experts (Chen 1981). For example, the number of vertebrae or the form of the lamina circularis, an enlarged pectoral fin ray structure present in males of some species, must be characterised by X-ray imaging or dissection. Also, field identification based on the position of the dorsal and ventral fins typically requires direct comparison with other species (e.g., Vasil'eva 2001). Several meristic counts overlap partially between species (e.g., Kim and Park 1995; Vasil'eva 2001), and metric characteristics can be altered in ethanol-preserved specimens (Kotusz 1995; own observation), further complicating correct identification. In addition, coloration appears to be highly variable in many species, and colour variants are cultured for the ornamental trade (see Figure 1H); all of this contributes to the confusion of cryptic species under one name (Kottelat 2012). Hybridisation between the East Asian species (M. anguillicaudatus × P. dabryanus ssp.) occurs in the wild and is also induced artificially to enhance food production (You et al. 2009; Zhang et al. 2018). Hybrid vigour can also enhance the invasive potential of hybrids (Cucherousset and Olden 2011; Huang et al. 2017), and hybrids cannot be assigned to a single species. Moreover, polyploidy occurs frequently, even within populations of a single species (Drozd et al. 2010; Zhao et al. 2012).
Molecular genetic tools within integrative taxonomy are increasingly considered a reliable and unambiguous alternative to classical methods of species identification (e.g., Beggel et al. 2015; Pieri et al. 2018; Weiss et al. 2018). In recent years, various biochemical, mitochondrial and nuclear molecular genetic markers have been used to investigate phylogeny, phylogeography, distribution, species delimitation, and intraspecific or population genetic structure in the genera Misgurnus and Paramisgurnus (e.g., Perdices et al. 2012; Thomsen et al. 2012; Jakovlić et al. 2013; Chen et al. 2015a; Yi et al. 2016a, b, 2017; Brys et al. 2021). As in other metazoan groups, an approximately 650 bp segment of the mitochondrial cytochrome c oxidase subunit I (COI) gene is the predominant molecular marker for genetic species identification, including in weatherfishes (Yi et al. 2016b, 2017; Belle et al. 2017; Stoeckle et al. 2019; Zangl et al. 2020).
The issues outlined above, i.e., unresolved taxonomy and synonymous species names, challenging morphology, and possible hybridisation even in the wild, frequently cause problems in species identification, not only in weatherfishes (e.g., Meier et al. 2006; Steinke et al. 2009; Jones et al. 2013; Pyšek et al. 2013; Ryberg and Nilsson 2018). To exemplify the problems and difficulties that can be encountered in genetic species identification and management, we computed and assessed an updated phylogenetic tree of the genera Misgurnus and Paramisgurnus comprising 289 published weatherfish COI barcodes and two outgroups.

Materials and methods
A condensed phylogenetic tree of the genera Misgurnus and Paramisgurnus, displaying the Maximum Likelihood estimates of phylogenetic relationships among COI-5-P mtDNA barcode sequences retrieved on July 15, 2020, was computed. The databases GenBank/NCBI and BOLD version 4 ("Barcode of Life Data System", Ratnasingham and Hebert 2007) were searched using the phrases "Misgurnus OR Paramisgurnus" AND "COI" or "cox1" or "mitochondrial genome" in GenBank/NCBI, and "Misgurnus OR Paramisgurnus" as a public database query in BOLD. Subsequently, all duplicates among the resulting 361 sequences, as well as barcodes of other mtDNA regions available in BOLD (e.g., cytb, COII, COIII or ND), were removed in Excel. For the phylogenetic analysis, the remaining sequences were aligned using the MUSCLE algorithm implemented in MEGA X (Kumar et al. 2018), together with published COI sequences of Pangio pangia (Hamilton, 1822) from India (Rahman et al. 2016) and Cobitis taenia Linnaeus, 1758 from Germany (Knebelsberger et al. 2015) as outgroups. The alignment was trimmed to an overall length of 607 bp, and all shorter barcode sequences were removed, leaving a total of 291 sequences. The Maximum Likelihood (ML) model test implemented in MEGA X was used to determine the best-fit substitution model. The phylogenetic clustering of all sequences under the resulting best-fit model (HKY+G), as well as the maximum intraspecific and minimum interspecific uncorrected p-distances between the phylogenetic clusters representing different species or sub-groups, were computed with the same software (see Supplementary material Table S1). The phylogenetic tree was subsequently condensed by collapsing every branch with less than 70% bootstrap support, based on 1000 replications. Figure 2 shows the condensed Maximum Likelihood tree of the 289 available COI barcode sequences labelled either Misgurnus or Paramisgurnus plus two outgroups (291 in total), each with a minimum length of 607 bp.
Figure 2. Condensed phylogenetic tree of the genera Misgurnus and Paramisgurnus displaying the Maximum Likelihood estimates of phylogenetic relationships among 291 COI-5-P mtDNA barcode sequences retrieved from the public databases GenBank and BOLD on July 15, 2020. The numbers above branches show bootstrap support values obtained after 1000 replications. Species labels, the associated tree branches and the genetic clusters are indicated by different colours, clockwise starting from the outgroups (black): blue = M. bipartitus; red = M. anguillicaudatus, split into "normal red" (group 1) and "dark red" (group 2); rose = Misgurnus sp.; violet = M. mohoity; green = Paramisgurnus dabryanus, split into "light green" (clade 1) and "dark green" (clade 2); yellow = M. fossilis. The encircled numbers 1 to 5 mark the general examples, discussed in the text, of challenges and difficulties that can be encountered in invasive species assignment and molecular taxonomic identification, not only in weatherfishes. For instance, the inclusion of M. anguillicaudatus in cluster 1 is probably also due to morphological misidentification. Number 6 indicates the two outgroup sequences (left: Pangio pangia; right: Cobitis taenia).
Overall, the phylogenetic clustering revealed seven distinct groups (M. bipartitus, M. anguillicaudatus group 1, M. anguillicaudatus group 2, Misgurnus sp., M. mohoity, P. dabryanus and M. fossilis), separated by minimum interspecific uncorrected p-distances ranging from 6.3% (Misgurnus sp. GenBank accession number #JQ011433 vs. M. anguillicaudatus #MF122497 and #KP112320) to 16.3% (all Misgurnus sp. vs. P. dabryanus #JQ011429, #KM610788 and #MN127938), supporting the general validity of recognising distinct species-level clusters within the group (Table S1; see also Yi et al. 2016b, 2017).
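The uncorrected p-distance behind these comparisons is simply the proportion of aligned sites at which two sequences differ. The Python sketch below illustrates how a maximum intra-cluster distance and a minimum inter-cluster distance can be derived from aligned sequences grouped by cluster. It is not the MEGA X implementation used in this study; the pairwise deletion of gaps and ambiguous bases, the function names and the toy 10 bp sequences are assumptions made for illustration only.

```python
from itertools import combinations

# Unambiguous nucleotides; anything else (gaps "-", "N", IUPAC codes) is skipped.
VALID_BASES = frozenset("ACGT")


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are ignored
    (pairwise deletion, an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in VALID_BASES and y in VALID_BASES:
            compared += 1
            if x != y:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def cluster_distances(clusters: dict[str, list[str]]):
    """Maximum intra-cluster and minimum inter-cluster p-distances.

    clusters maps a (hypothetical) cluster name to its aligned sequences.
    Returns (max_intra, min_inter); min_inter is keyed by cluster-name pairs.
    """
    max_intra = {
        name: max((p_distance(a, b) for a, b in combinations(seqs, 2)), default=0.0)
        for name, seqs in clusters.items()
    }
    min_inter = {
        (n1, n2): min(p_distance(a, b) for a in s1 for b in s2)
        for (n1, s1), (n2, s2) in combinations(clusters.items(), 2)
    }
    return max_intra, min_inter


# Toy 10-bp "alignment" (invented sequences, not real COI data).
toy = {
    "A": ["ACGTACGTAC", "ACGTACGTAA"],
    "B": ["TCGTACCTAC", "TCGTACCTAG"],
}
max_intra, min_inter = cluster_distances(toy)
print(max_intra)  # {'A': 0.1, 'B': 0.1}
print(min_inter)  # {('A', 'B'): 0.2}
```

In this toy case, each cluster differs internally at one of ten sites (0.1), while the closest pair across clusters differs at two sites (0.2), so the clusters are separated by a positive barcode gap, analogous in kind (not in magnitude) to the 6.3% to 16.3% minimum interspecific distances reported above.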
However, the clusters do not always correspond to the currently valid species names listed in Table 1. For instance, the three Vietnamese species and the one Korean species could not be included in the analysis because no barcodes had been published for them, and only two barcodes are available for M. nikolskyi. Six of the eleven valid species names occur in the phylogenetic tree (M. anguillicaudatus, M. bipartitus, M. fossilis, M. mizolepis, M. mohoity and M. nikolskyi). Together, this indicates an incomplete reference database (as of July 2020) for the universal mtDNA barcode (COI), as well as potential morphological misidentification. This is not surprising, and having 55% of the valid species names (6 of 11) represented by barcodes is actually a reasonable result, as public barcode database coverage, even in taxonomically well-known groups such as European freshwater fishes, averages only 66 to 88% of the known taxa (Weigand et al. 2019).

Results and discussion
Most surprisingly, each of the two distinct M. anguillicaudatus clusters is separated from M. bipartitus by roughly the same minimum interspecific uncorrected p-distance (6.7% to 7.2%, as also reported by Yi et al. 2017). We cannot confirm the large interspecific distances between M. bipartitus and M. anguillicaudatus reported by Zangl et al. (2020). In contrast to our analysis, Zangl et al. (2020) included very few sequences labelled M. anguillicaudatus and excluded those from the native ranges published by Yi et al. (2017). This example illustrates the importance of complete database sampling to avoid artificially inflating the genetic differentiation between phylogenetic clusters through incomplete coverage. The same problem arises if only a few divergent specimens are available; such cases warrant more complete geographic sampling before conclusions about species divergence are drawn.
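Why incomplete sampling can only exaggerate differentiation follows directly from the definitions of the distances used here. In the notation below, introduced for illustration, p(x, y) is the uncorrected p-distance between sequences x and y, and X and Y are the complete sets of sequences belonging to two clusters, with X' and Y' the subsets actually sampled:

```latex
\begin{gathered}
d_{\mathrm{inter}}(X, Y) = \min_{x \in X,\, y \in Y} p(x, y),
\qquad
d_{\mathrm{intra}}(X) = \max_{x,\, x' \in X} p(x, x'),\\[4pt]
\emptyset \neq X' \subseteq X,\;\; \emptyset \neq Y' \subseteq Y
\;\Longrightarrow\;
d_{\mathrm{inter}}(X', Y') \ge d_{\mathrm{inter}}(X, Y)
\;\text{ and }\;
d_{\mathrm{intra}}(X') \le d_{\mathrm{intra}}(X).
\end{gathered}
```

A minimum taken over fewer pairs can only stay equal or rise, and a maximum taken over fewer pairs can only stay equal or fall. Omitting sequences, such as those from the native ranges, therefore never narrows the apparent gap between intra- and interspecific distances and can widen it.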
In contrast to M. anguillicaudatus, M. fossilis forms a single monophyletic group (Figure 2). A potentially cryptic species identified only to genus level, i.e., Misgurnus sp. from Hue province, Vietnam, also forms a distinct cluster (Figure 2, no. 2), but is represented by only four sequences. Other groups, e.g., M. mohoity and P. dabryanus, harbour large intraspecific genetic distances that may reflect true cryptic diversity obscured by the unresolved classical taxonomy and difficult morphological species identification, or simply more exhaustive geographic sampling of genetically distinct sub-populations (Figure 2, Table S1). Nevertheless, extensive haplotype or label sharing occurred, for which we cannot exclude true haplotype sharing due to incomplete lineage sorting or introgression. Introgression and haplotype sharing were also detected by Perdices et al. (2012) using the mitochondrial cytochrome b gene (Cyt b) as marker. In our dataset, M. nikolskyi clusters entirely within M. mohoity, which does not support a separate clade for this species; likewise, M. mizolepis sequences do not form a clade of their own but are distributed among many phylogenetic groups across the tree. Owing to the large interspecific genetic distances, and in line with Jakovlić et al. (2013), we cannot confirm the hypothesis that M. bipartitus is a synonym of M. mohoity, as suggested by Fricke et al. (2020), although this assessment is based on a single molecular marker.
Overall, 12% (35 of 289) of the analysed sequences carry names that do not match their respective phylogenetic clusters, and roughly twice as many of these sequences originate from introduced ranges as from native ranges (23 vs. 12 sequences; see Table S1). This illustrates the difficulty of morphologically identifying specimens, even within the native ranges. The problem is further complicated by taxonomy and species names changing over time. The large discrepancy between native and introduced ranges may be due to a lack of taxonomic expertise or taxonomic literature in the introduced ranges. Whereas old (i.e., pre-1920) weatherfish species descriptions and taxonomic literature are readily available in digitized form through the "Biodiversity Heritage Library" (Gwinn and Rinaldo 2009, https://www.biodiversitylibrary.org/, accessed August 23, 2020), some recent taxonomic keys and literature are difficult for the scientific community to obtain. Many of the respective journals are not yet digitized or open access ("dark texts" sensu Page 2016), and some species descriptions and keys are not yet fully available in English (e.g., Kim and Park 1995; Nguyen and Bui 2009).
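The mismatch count above amounts to comparing each sequence's database label with the name of the cluster it falls into, then tallying mismatches by range of origin. The Python sketch below shows this bookkeeping under stated assumptions: the record fields, the mapping from cluster IDs to species names, and all toy accession IDs and assignments are hypothetical and do not reproduce the data in Table S1.

```python
from collections import Counter
from typing import NamedTuple


class BarcodeRecord(NamedTuple):
    accession: str  # hypothetical accession IDs in the toy data below
    label: str      # species name attached to the sequence in GenBank/BOLD
    cluster: str    # phylogenetic cluster the sequence falls into
    origin: str     # "native" or "introduced" range of the sampled specimen


def label_mismatches(records, cluster_names):
    """Count sequences whose database label differs from their cluster's name.

    cluster_names maps each cluster ID to the species name it represents, so
    that e.g. both M. anguillicaudatus groups can map to one name.
    Returns (number mismatched, share mismatched, mismatches per origin).
    """
    mismatched = [r for r in records if r.label != cluster_names[r.cluster]]
    per_origin = Counter(r.origin for r in mismatched)
    return len(mismatched), len(mismatched) / len(records), per_origin


# Hypothetical cluster IDs and toy records, for illustration only.
cluster_names = {
    "bip": "M. bipartitus",
    "ang2": "M. anguillicaudatus",
    "dab": "P. dabryanus",
}
records = [
    BarcodeRecord("TOY001", "M. anguillicaudatus", "bip", "introduced"),
    BarcodeRecord("TOY002", "M. bipartitus", "bip", "native"),
    BarcodeRecord("TOY003", "M. anguillicaudatus", "ang2", "native"),
    BarcodeRecord("TOY004", "M. mohoity", "dab", "introduced"),
]
n, share, per_origin = label_mismatches(records, cluster_names)
print(n, share, dict(per_origin))  # 2 0.5 {'introduced': 2}
```

Applied to the counts reported above, the share is 35 / 289 ≈ 0.121, i.e. the 12% given in the text, with 23 introduced versus 12 native origins. How genus-level labels such as "Misgurnus sp." were counted in the original analysis is not stated; this sketch simply compares label strings, so such a label counts as a mismatch whenever it falls into a cluster with a different name.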
Many available sequences are labelled only to a higher taxonomic level, such as the genus (e.g., "Misgurnus", Figure 2). This may reflect difficulties in morphological species determination or the absence of a genetic species match at the time of analysis. It is important to keep in mind that molecular genetic databases like GenBank/NCBI and BOLD are constantly growing and changing, and that specimen assignments to taxonomic groups or phylogenetic clusters are dynamic. Our analyses can thus only provide a snapshot of weatherfish phylogenetics and diversity, which may become outdated as soon as additional results are available. In our case, all data were downloaded on July 15, 2020, and all associated analyses reflect the knowledge of that time. This also illustrates that the "misidentification" reported by Zangl et al. (2020) arises because Belle et al. (2017) conducted their analysis in September 2016 and correctly labelled the results according to the data available at that time. We agree that our COI sequences currently (as of July 15, 2020) cluster with COI sequences labelled M. bipartitus (Figure 2, no. 1). We also agree that published biogeographical, ecological, physiological and barcoding evidence supports M. bipartitus at least as a distinct genetic sub-group or species (Yi et al. 2016b, 2017, 2018). The same issue applies to the sequences labelled Misgurnus sp. in our cladogram (Figure 2, no. 2). The specimens from which those COI sequences originate could neither be named nor assigned to any weatherfish species cluster using DNA barcodes at the time of analysis in 2011, as those sequences were the first published for the genera Misgurnus and Paramisgurnus (Table S1). Their placement in our phylogenetic tree, and especially the distinct cluster formed by four of the sequences originating from Hue province, Vietnam (Figure 2, branches in rose), the type locality of M. multimaculatus Rendahl, 1944, supports the cryptic diversity among exotic weatherfish species introduced to Australia that had already been hypothesised by Kearns et al. (2011).
This controversy is not new (Lis et al. 2016; Page 2016; Steinke 2016), but it should not prevent early sharing of data, which is particularly crucial for the management of invasive species (Pergl et al. 2020). In this context, we stress the importance of resolving the classical taxonomy and completing the genetic reference databases not only for weatherfishes, as the same problems apply to many other exotic fish species (Gomes et al. 2015; Dahruddin et al. 2017; Kundu et al. 2019).
Nevertheless, in retrospect, the available publications on genetic weatherfish identification (Belle et al. 2017; Stoeckle et al. 2019; Zangl et al. 2020) may all have missed the target with their genetic specimen assignments, because non-native hybrid weatherfish specimens, which are difficult to identify, may also be present. As outlined above, hybridisation between Far Eastern weatherfish species occurs even in the wild. Under laboratory conditions, hybridisation between the endangered European M. fossilis and exotic specimens was successful (Josef Wanzenböck, Research Department for Limnology Mondsee, University of Innsbruck, pers. comm. December 05, 2019). As expected for maternally inherited mitochondrial DNA in vertebrates, sequences derived from hybrid specimens unambiguously cluster within the respective maternal lineages in the phylogenetic tree (Figure 2, no. 5). Hybrid specimens therefore cannot be identified using a mitochondrial marker such as COI alone. As implemented by Zangl et al. (2020) and others, nuclear markers, for example the "recombination activating gene 1" (RAG1), should be analysed in addition to mtDNA markers, foremost in specimens, populations and species from the native ranges.

Conclusions
Overall, our analysis of genetic weatherfish identification using mtDNA barcodes confirms that the classical and molecular taxonomy of this group is still incomplete and unresolved, and that species coverage in public databases is incomplete. Nonetheless, given that the existing distinct phylogenetic clusters facilitate genetic specimen assignment in the genera Misgurnus and Paramisgurnus, we suggest DNA barcoding and genotyping of detected non-native or traded weatherfish specimens. Such analyses appear mandatory in any case for source populations of re-stocking in the context of conservation programs. A genetic assessment is also especially crucial for discovering potential cryptic invasions of non-native genotypes or the presence of cryptic exotic species. Because reference databases are incomplete, the currently observed clusters and differences may change as taxon sampling is expanded. To tackle this problem, it might be sensible to extend "museomics" or barcoding projects to established non-native taxa in order to investigate their type specimens and type localities. This is also important because some introduced taxa may become "extinct in the native ranges" yet could persist in the wild in introduced areas. However, an artificially driven distribution of individuals outside their native geographic range (e.g., "faunal enrichment") is not recommended. Furthermore, hybridisation impedes classical taxonomic species identification in weatherfishes and remains challenging even with molecular genetic methods such as barcoding. We therefore suggest expanding and standardising the use of nuclear markers. Finally, educating taxonomists and practitioners in the introduced ranges is important, as any knowledge gap may facilitate cryptic invasions, during which superficially similar-looking species and their impact on the introduced area may be overlooked for a long time.
Therefore, it is important to keep identification literature up to date, freely accessible and comparable between biogeographic regions. We would additionally encourage the use of standard genetic analyses during ecological monitoring of known morphologically variable and difficult species that can easily be confused with similar-looking native ones. Such situations exist not only for weatherfishes but also for other fishes, such as bitterling (Rhodeus sp.), and for freshwater mussels (Pieri et al. 2018; Bartáková et al. 2019; Kondakov et al. 2020).