Assessment of mitochondrial genomes for heterobranch gastropod phylogenetics

Heterobranchia is a diverse clade of marine, freshwater, and terrestrial gastropod molluscs. It includes such disparate taxa as nudibranchs, sea hares, bubble snails, pulmonate land snails and slugs, and a number of (mostly small-bodied) poorly known snails and slugs collectively referred to as the “lower heterobranchs”. Evolutionary relationships within Heterobranchia have been challenging to resolve and the group has been subject to frequent and significant taxonomic revision. Mitochondrial (mt) genomes can be a useful molecular marker for phylogenetics but, to date, sequences have been available for only a relatively small subset of Heterobranchia. To assess the utility of mitochondrial genomes for resolving evolutionary relationships within this clade, eleven new mt genomes were sequenced including representatives of several groups of “lower heterobranchs”. Maximum likelihood analyses of concatenated matrices of the thirteen protein coding genes found weak support for most higher-level relationships even after several taxa with extremely high rates of evolution were excluded. Bayesian inference with the CAT + GTR model resulted in a reconstruction that is much more consistent with the current understanding of heterobranch phylogeny. Notably, this analysis recovered Valvatoidea and Orbitestelloidea in a polytomy with a clade including all other heterobranchs, highlighting these taxa as important to understanding early heterobranch evolution. Also, dramatic gene rearrangements were detected within and between multiple clades. However, a single gene order is conserved across the majority of heterobranch clades. Analysis of mitochondrial genomes in a Bayesian framework with the site heterogeneous CAT + GTR model resulted in a topology largely consistent with the current understanding of heterobranch phylogeny. However, mitochondrial genomes appear to be too variable to serve as good phylogenetic markers for robustly resolving a number of deeper splits within this clade.


Background
Mitochondrial genomes are popular molecular markers for animal phylogenetics: they are relatively easy to sequence, assemble, and annotate, typically have a moderate level of sequence conservation that facilitates phylogenetic comparisons among relatively distantly related taxa, and can have gene order rearrangements that are potentially phylogenetically informative. Phylogenetic analyses of mitochondrial genomes have clarified relationships within diverse groups of invertebrates such as crustaceans, echinoderms, sponges, hemichordates, and annelids, just to name a few [1][2][3][4][5]. However, the application of mitochondrial genomes to phylogenetics can be limited by differences in evolutionary rates (which can lead to long-branch attraction (LBA) artifacts and incomplete lineage sorting [6]. Mitochondrial genomebased studies of molluscan evolutionary relationships have been variable in success. Molluscan mitochondrial genomes exhibit wide variation in size, organization, and rate of evolution [7][8][9]. Mitochondrial genomes have substantially aided in clarification of relationships within clades such as Caudofoveata [10], Cephalopoda [11,12], and some gastropod clades (e.g., [13][14][15]), but have had limited success at resolving relationships within other molluscan clades (e.g., [16]) and resolving deep molluscan phylogeny [17,18].
Phylogenetic analyses to date have been unable to robustly resolve most relationships among major heterobranch clades. However, most of these studies have been limited by taxon sampling. In particular, many of the "lower heterobranchs" were missing from most investigations of heterobranch phylogeny to date. These snails and slugs are thought to represent a critical group to understanding heterobranch evolution as it has been debated whether they form a clade that is sister to all remaining heterobranchs or a "basal" paraphyletic grade. Here, we sequenced mitochondrial genomes from 11 heterobranch taxa, including several so-called lower heterobranchs and select other understudied clades. These new data were analyzed in combination with publicly available heterobranch and outgroup mitochondrial genomes to investigate the utility of mitochondrial genomes for resolving higher-level heterobranch phylogeny, placement of the lower heterobranchs, and the evolution of heterobranch mitochondrial genome organization.

Genome assemblies and data matrix
We sequenced genomic DNA libraries from eleven species of heterobranch gastropods and extracted mitochondrial sequences (Table 1, Additional file 1: Table S1). Despite high sequencing coverage, a single contiguous mitochondrial genome was recovered for only two of the eleven taxa. All of the newly sequenced mt genomes were incomplete to some degree, possibly due to difficulties in sequencing through secondary structures associated with the 16S rDNA (which was absent from or incomplete in several of our assemblies) and the control region, but most of the mitochondrial protein-coding genes were obtained for all species. Alignments of amino acid sequences were produced for the thirteen protein-coding genes obtained from the newly sequenced taxa and publicly available heterobranch mt genomes on NCBI (Additional file 1: Table S1). After trimming each alignment to remove ambiguously aligned positions, the concatenated data matrix totaled 4735 amino acid sites with 31.3% gaps across 104 taxa (17 outgroups and 87 heterobranchs).

Maximum likelihood analyses
A partitioned maximum likelihood (ML) analysis of this data matrix using the best-fitting model for each gene (Additional file 2: Figure S1; see additional data on FigShare for more information) resulted in a tree with Valvata cristata (Valvatoidea) recovered as the sister taxon to a clade containing all other heterobranchs with successive branching of Microdiscula charopa (Orbitestelloidea), a clade composed of Clione limacina (Gymnosomata), Psilaxis radiatus (Architectonicoidea), Omalogyra atomus (Omalogyroidea), and Rissoella morrocayensis (Rissoelloidae), and then Rhopalocaulis grandidieri (Veronicelloidea), which was the sister taxon of all remaining Heterobranchia. All members of the clade containing C. limacina, P. radiatus, O. atomus, and R. morrocayensis exhibited extremely long branches relative to the other heterobranchs and it is well-established that Gymnosomata is nested within Euopisthobranchia. Thus, we strongly suspected that this clade was the result of long-branch attraction. This, combined with very low backbone support values, led us to re-run the analysis with the following unstable and long-branched taxa removed: C. limacina, P. radiatus, O. atomus, and R. morrocayensis (see Additional file 10: Table S3).
The matrix with unstable and long-branched taxa removed totaled 4447 amino acid sites with 28.7% gaps across 99 taxa. In the resulting partitioned ML analysis using the best-fitting model for each gene (Fig. 1), Valvata was again recovered as the sister taxon to a clade composed of all other heterobranchs (bootstrap support, bs = 100) followed by the successive branching of Microdiscula (bs = 62) and Rhopalocaulis (bs = 100). Most major clades of Heterobranchia (Euthyneura sensu lato) were recovered with high support: Acteonoidea, Nudipleura, Cephalaspidea, Runcinida, Aplysiida, Siphonariida, Sacoglossa, and Stylommatophora were all recovered with 100% bootstrap support, and Systellommatophora with 99% bootstrap support. Of the family-and order-level taxa, Ellobioidea is the only one that was recovered as non-monophyletic, with Pedipes pedipes and Myosotella myosotis falling well outside of the clade containing the rest of Ellobioidea, albeit with very low support. Support for relationships among most higher-level heterobranch clades was generally weak and a number of higher-level groupings within Heterobranchia including Tectipleura, Euopisthobranchia, Panpulmonata, Eupulmonata, Systellommatophora, and Amphipulmonata (sensu [20]) were not monophyletic. However, Nudipleura (Nudibranchia + Pleurobranchomorpha) and a clade composed of Aplysiida + Umbraculoidea were recovered monophyletic with maximal support. To explore the impact of different partitioning schemes on tree topology, and to determine whether other models better mitigated the long-branch attraction of C. limacina, P. radiatus, O. atomus, and R. morrocayensis, we ran several additional analyses. A partitioned analysis with a mixed model (LG + C60 + G + F) yielded a tree (Additional file 3: Figure S2) with the same clade of longbranched taxa as sister to all remaining heterobranchs except Valvata and Microdiscula and did not vary significantly in any other way from the original ML tree (Additional file 2: Figure S1). A ML analysis with Lanfear clustering (Additional file 4: Figure S3) and a fully partitioned ML analysis with resampling within partitions (Additional file 5: Figure S4) both produced similar trees to the initial partitioned ML analysis (Additional file 2: Figure S1), but with the two members of Ellobioidea that did not form a clade with the rest of Ellobioidea (Myosotella and Pedipes) falling outside Hygrophila and thus even farther from the remaining ellobioids. An analysis with an edge-unlinked model altered the relationships within the long-branched clade, with O. atomus and P. radiatus as sister to R. morrocayensis and C. limacina, and, while in previous analyses R. morrocayensis had a much longer branch than the other taxa in this clade, in the edge-unlinked tree, all four taxa had similarly elongated branches (Additional file 6: Figure S5). In this edgeunlinked analysis, the positions of Hygrophila and the pair of ellobiods were recovered with relationships similar to those of the initial partitioned ML tree (Additional file 2: Figure S1). We also analyzed the set of all taxa except C. limacina to assess whether removal of this single rapidly-evolving taxon would 'release' the other longbranched taxa, which are traditionally considered to be lower heterobranchs, from this putative LBA artifact. The other three long-branched taxa remained in the same location with long branches (Additional file 7: Figure S6).

Bayesian inference
Because of poor support for most nodes of interest in our maximum likelihood analyses, we also performed a Bayesian inference with the CAT + GTR + G4 model on the same datasets, but only the analysis of the dataset with unstable and long-branched taxa removed reached convergence. Of the six chains that were run for over 60,000 generations, four converged according to the Phy-loBayes bpcomp maxdiff criterion (maxdiff value = 0.29), yielding a tree with a topology that is much more consistent with the current understanding of heterobranch relationships (Fig. 2). Valvata and Microdiscula were recovered in a polytomy with a clade that comprised all other heterobranchs, which received maximal support. This clade consisted of a polytomy of Nudipleura + Acteonoidea, Ringicula, and the remaining heterobranchs. Nudipleura + Acteonoidea was weakly supported but Nudipleura again received maximal support.
Within Panpulmonata, Siphonariida was recovered as the sister taxon to the rest of the clade (pp = 1) with (See figure on next page.) Fig. 1 Maximum likelihood phylogeny of heterobranch gastropods based on the reduced set of taxa following removal of both unstable leaves flagged by RogueNaRok and the four longest-branched taxa. Taxa for which new sequences were collected are shown in bold. The data set was trimmed with TrimAL with default settings, partitioned by gene in RAxML, and the PROTGAMMAAUTO was used to select the best-fitting model for each partition. Bootstrap support values are presented at each node Varney et al. BMC   Sacoglossa sister to all other taxa within that clade (pp = 0.96). The remaining panpulmonates formed two clades. One consisted of Stylommatophora, Systellommatophora, and Ellobioidea, although neither Systellommatophora nor Ellobioidea were recovered monophyletic. Rhopalocaulis (Systellommatophora) was recovered as the sister taxon of Stylommatophora (pp = 0.71) and this clade was recovered in a polytomy with the ellobioids Pedipes and Myosotella that was maximally supported. Sister to this polytomy was a moderately well-supported clade (pp = 0.98) in which the remainder of Systellommatophora (Onchidiidae) was sister to the remaining Ellobioidea. Sister to the Stylommatophora-Systellommatophora-Ellobioidea clade was a clade comprising Hygrophila (pp = 1.00), Pyramidella dolabrata (Pyramidelloidea), Salinator rhamphidia (Amphiboloidea), and Acochlidium fijiense (Acochlidia). Salinator and Pyramidella formed a well-supported clade (pp = 0.99) but otherwise, higher-level relationships in the Hygrophila-Pyramidelloidea-Amphiboloidea-Acochlidia clade were weakly supported.

Mitochondrial gene order evolution
A somewhat diagnostic gene arrangement exists for heterobranchs relative to other gastropod clades, but many heterobranch taxa and subclades have differences in both gene organization and orientation in their mitochondrial genomes (Fig. 3). Caenogastropods encode all mitochondrial genes in the same orientation, while all members of the clade comprising Neritimorpha and Vetigastropoda share a single inversion of [12S rRNA, 16S rRNA, nad1, nad6, cytB, nad4L, nad4, nad5]. Across the diverse taxa used as outgroups in this study, no individual deviations in gene arrangement were found. In contrast to this consistency, the taxa at the base of our heterobranch tree all have remarkably different mitochondrial gene arrangements from one another. Mitochondrial gene order within most of the "lower Heterobranchia" is variable: Psilaxis radiatus (Architectonicoidea), Omalogyra atomus (Omalogyridae), Rissoella morrocayensis (Rissoellidae), and Valvata cf. cristata (Valvatidae) all have distinct gene orders from one another and from remaining heterobranchs, including changes in both order and orientation. Microdiscula charopa also has an entirely unique gene order.
In the remaining heterobranchs, the clade spanning Acteonoidea, Nudipleura, and the subclade including Runcinida, Cephalaspidea, Umbraculoidea, Aplysiida, Siphonariida, Sacoglossa, Amphiboloidea, Pyramidellidae, Hygrophila, Acochlidia, Systellommatophora, Ellobioidea, and Stylommatophora, a relatively stable mitochondrial gene order and orientation exists, referred to here as the common heterobranch gene order [cox1-16SrRNA-nad6-nad5-nad1-nad4L-ctyB-cox2-atp8-atp6-12SrRNA-nad3-nad4-cox3-nad2, with atp8-nad3 and cox3 both reversed in direction from cox1]. All members of Nudipleura examined adhere to this common gene order except Hypselodoris festiva, in which a single gene (nad4) changed position, and all members of Acteonoidea adhere to the same order as well. Variation exists within the Cephalaspidea, with a shared rearrangement of cytB, nad1, nad4L, and cox2 shared among three-fourths of its members, and Sagaminopteron nigropunctatum containing further rearrangements. Aplysiida adheres to the stable arrangement with the exception of Aplysia kurodai, in which the orientation of the 12S rRNA gene is reversed though its position remains the same.
Both representatives of Siphonariida have different internal mitochondrial gene rearrangements: Siphonaria gigas reversed the positions of nad4 and nad3, while Siphonaria pectinata inserts cox2 between nad4L and cytB. All sacoglossans shared a common gene order, as do Pyramidella dolabrata and Acochlidium fijiensis. The majority of Ellobioidea are consistent, excepting Myosotella myosotis and Pedipes pedipes, which have different single-gene transpositions than one another. Additionally, the mt genome of P. pedipes is expanded, with more intergenic space than other closely related taxa. Interestingly, these two taxa are those that fall together in a different part of the phylogeny than the remaining members of Ellobioidea, making this group paraphyletic. The clade comprising Hygrophila was strongly supported, and all members within it share the common heterobranch gene order except Physella acuta, which has a completely unique gene arrangement.
Within Stylommatophora, both members of genus Achatinella shared a single gene (cox2) moved to a different position relative to other members of the clade. Likewise, the taxa Cylindrus obtusus, Cepaea nemoralis, and Cornu aspersum (syn. Helix aspersa) all share a single gene (nad4) inserted at a different location in the

Heterobranch phylogeny
We sequenced mitochondrial genomes from eleven heterobranch gastropods and investigated heterobranch evolutionary relationships using amino acid sequences from the thirteen mitochondrial protein-coding genes as well as the evolution of heterobranch mitochondrial genome organization. Mitochondrial genomes can be useful in molecular phylogenetics because of the functional constraint that should, in theory, lead to a relatively high degree of conservation across evolutionary time. This has been demonstrated in diverse groups of animals where mitochondrial genomes have served as useful markers for molecular phylogenetics [1][2][3]5]. However, in other animal lineages, it has been demonstrated that mitochondrial genome evolutionary rate is too rapid to recover ancient radiations (e.g., [8,17,33]).
Our maximum likelihood-based analysis including all taxa failed to recover most recognized higher-level heterobranch clades but did recover a maximally supported clade of taxa with extremely long branches near the base of Heterobranchia. This clade includes taxa known to have brief lifespans, some of only a few months, which may correlate with a more rapid accumulation of genetic changes [34]. To combat this putative artifact of longbranch attraction, ML analyses of a dataset with longbranched and unstable taxa excluded were performed. Excluding these taxa resulted in a tree that exhibited an apparent mis-rooting within Heterobranchia relative to other studies (e.g., [22,31,32], reviewed by [19]) with Panpulmonata paraphyletic with respect to a clade of opisthobranchs. Support for most higher-level heterobranch clades was weak in both maximum likelihood analyses, although most order-level taxa were recovered monophyletic with strong support.   Fig. 3 Presumed ancestral mitochondrial gene order based on a TreeRex analysis of each major clade of heterobranch gastropods. Grey boxes spanning multiple clades indicate the common Heterobranch gene order shared among most taxa. Empty white boxes represent missing genes from sequenced mitochondrial contigs. Tree topology is taken from the BI tree presented in Fig. 2 Clear long-branch attraction and weak support for deep relationships within Heterobranchia initially led us to conclude that mitochondrial genomes have little to no phylogenetic signal for deep nodes within Heterobranchia. Additional maximum likelihood analyses that attempted to account for differences in evolutionary rates between genes did not improve resolution of these deeper nodes. However, although mostly weakly supported, a number of previously hypothesized relationships were recovered in all of our maximum likelihood analyses. These include Euthyneura (e.g. all heterobranchs except Valvatoidea and Orbitestelloidea), Pyramidelloidea + Amphiboloidea, Nudipleura + Ringicula (Ringipleura), Ringipleura + Acteonoidea (not considering Rissoelloidae), Aplysiida + Umbraculoidea (not considering Gymnosomata and Thecosomata), and Cephalaspidea + Runcinida [22,31,32,[35][36][37][38].
In order to determine if selecting a model that better accounts for site-specific rate heterogeneity could help improve resolution, we conducted a Bayesian inference using the site heterogeneous CAT + GTR + G4 model. This analysis resulted in a topology that is much more consistent with other studies examining heterobranch evolutionary relationships to date. Again, we recovered Euthyneura to the exclusion of Valvatoidea and Orbitestelloidea with maximal support. Whereas our maximum likelihood analyses recovered Valvatoidea as the sister taxon to all other heterobranchs with moderate to weak support, our Bayesian inference recovered these two "lower heterobranchs" in a polytomy with the rest of Heterobranchia. This is in concordance with previous Sanger-sequencing based approaches [20,29] but neither clade was so far sampled by phylogenomics [35] or mitogenomics [15]. Valvatoidea (= Ectobranchia) is a group of minute freshwater and marine snails with discoidal to ovoid shells. Haszprunar et al. regarded Valvatoidea as the earliest-branching heterobranch clade based on their broad, rhipidoglossate radula and unusual ectobranch gill [39]. This phylogenetic position was favored by Brenzinger et al. because a clade of all heterobranchs except Valvatoidea is supported by the presence of ciliated strips in the mantle cavity [40]. Sperm ultrastrucure is also consistent with their placement among the lower heterobranchs [41]. Orbitestelloidea was considered to belong to Valvatoidea in the past. Our Bayesian analysis produced a polytomy containing these taxa, but all our maximum likelihood analyses separated these taxa from one another with Valvatoidea sister to all other heterobranchs, as consistent with the most recent classification [27]. The fossil record also coincides with a greater age of "lower" heterobranchs (possibly present in mid-Paleozoic) vs. Euthyneura (diversifying in the Jurassic) [42][43][44], although unequivocal fossils of Valvatoidea and Orbitestelloidea-with minute, fragile and often inconspicuous shells-are much younger (Cretaceous to Eocene) (see [42,43]). Architectonicoidea is another candidate for an old group judging from the presence of fossils in the Triassic [45]. Unfortunately, most of the other lower heterobranchs we sampled-O. atomus (Omalogyroidea), Psilaxis radiatus (Architectonicoidea), and Rissoella morrocayensis (Rissoelloidae)-exhibited extremely long branches and the Bayesian inference including these taxa (as well as an analysis including these taxa but excluding C. limacina; data not shown) failed to converge.
Our Bayesian inference recovered Pyramidelloidea + Amphiboloidea and Aplysiida + Umbraculoidea with strong support (pp ≥ 0.98). This analysis also recovered a number of other heterobranch clades that have been identified in other studies but were not recovered in the maximum likelihood analysis of this dataset. Most notably among these is Panpulmonata. We recovered Siphonariida as the sister taxon of the remaining panpulmonates followed by Sacoglossa as the sister taxon to all other panpulmonates after that, all with strong support (pp ≥ 0.99). Interestingly, support for the relative placement of these two clades has been weak in most studies with the relevant taxon sampling to date (but see [31]). Our results are inconsistent with most studies to date, which have recovered these two taxa as a clade [22,38] or with Sacoglossa, not Siphonariida, sister to the rest of Panpulmonata [31,32], reviewed by [20,46].
Although Ringicula was previously recovered as the sister taxon of Nudipleura [32], which we recovered here in our maximum likelihood analyses, this relationship was not supported in our Bayesian inference. Instead, Nudipleura was recovered as the sister taxon of Acteonoidea, but this clade was weakly supported. Ringicula was recovered in a polytomy with this weakly supported Nudipleura-Acteonoidea clade (and a strongly supported clade consisting of all other heterobranchs), so although we find no support for the Ringipleura hypothesis in this analysis, our Bayesian inference results are not incompatible with Ringipleura.
All of our analyses failed to recover Ellobioidea as a monophyletic group. A previous analysis that included some of the Ellobioidea mitochondrial genomes analyzed herein, including those of the two taxa that were recovered separately from the rest of Ellobioidea in our analyses (Pedipes pedipes and Myosotella myosotis), also failed to recover a monophyletic Ellobioidea [47]. However, Dayrat [48,49].

Evolution of heterobranch mitochondrial gene organization
Long-branch attraction, as discussed above, is likely responsible for the recovery of C. limacina in a clade with the "lower heterobranchs" O. atomus, P. radiatus, and R. morrocayensis. Often there is a correlation between a high rate of genome evolution and genome rearrangements [50]; O. atomus and members of the genus Rissoella are known to have short life cycles [20,51]. C. limacina has a completely unique gene order, so it is possible that some sequence differences may be a result of rearrangement and these in turn contributed to the misplacement of this taxon. Within Ellobioidea, the two members that are consistently recovered apart from the rest (Myosotella myosotis and Pedipes pedipes) both contain single gene transpositions (though differing from one another).
However, this correlation does not hold for other isolated members of clades with extreme gene order rearrangements. For example, Physella acuta has mitochondrial gene reordering so extensive that a minimum of five independent gene rearrangements are necessary to account for the difference between it and the remaining members of Hygrophila [52]. Despite this, P. acuta still forms a clade with the rest of Hygrophila with very high support in all analyses. Likewise, Sagaminopteron nigropunctatus forms a clade with the other cephalaspids with very high support despite differing dramatically in gene arrangement from the other three members included in the analysis, and the relationships among these taxa are supported by recent transcriptome-based analyses [53]. The variable relationship of evolutionary rate of gene sequences and mitochondrial gene rearrangement could be interesting to investigate in future studies.
The shared gene arrangement among the majority of heterobranch taxa suggests this common gene order emerged prior to the common ancestor of Nudipleura and remaining taxa. However, most taxa previously identified as "lower heterobranchs, " as well as the additional taxa recovered at the base of the heterobranch tree in our analyses, have unique mitochondrial gene arrangements relative to all other gastropods. The differences among these taxa and between them and the main clade of heterobranchs cannot be explained with stepwise changes, but instead suggest multiple independent inversions and transpositions and may be due to a combination of long evolutionary trajectories (since the mid-Paleozoic) [54] and derived ecologies and lifestyles in many subgroups, including the commonly observed abbreviation and modification of life cycles by multiple evolution of paedomorphic groups. Our results indicate that mitochondrial genome organization started to deviate considerably from the ancestral molluscan arrangement first at the origin of Heterobranchia and later, even more so, at the origin of Euthyneura.

Conclusions
Here, we sequenced 11 new heterobranch mitochondrial genomes including several "lower heterobranchs". These new data were analyzed in combination with publicly available heterobranch and outgroup mitochondrial genomes using maximum likelihood and Bayesian inference. Results of maximum likelihood analyses with site-homogeneous models indicated that even with the exclusion of exceptionally rapidly evolving taxa, mitochondrial genomes have limited utility for resolving most higher-level heterobranch relationships. However, Bayesian inference using the site-heterogeneous CAT + GTR + G model recovered most recognized higher-level heterobranch clades including Tectipleura, Euopisthobranchia, and Panpulmonata. Unfortunately, most of the lower heterobranch taxa that we aimed to place in a phylogenetic context exhibited extremely fast rates of evolution. Relationships within most heterobranch order-level clades that were broadly sampled (e.g., Nudipleura, Aplysiida, Sacoglossa, Hygrophila, Stylommatophora) were well-resolved and strongly supported. Despite the relatively rapid rate of nucleotide evolution in heterobranch mitochondrial genomes, gene order was found to be largely conserved across the group. Taken together, these results provide support for several hypothesized heterobranch clades and highlight the non-euthyneuran clades Valvata and Orbitestelloidea as interesting and important taxa to study with respect to understanding early heterobranch evolution. However, a lack of resolution and poor support for a number of deeper nodes within Heterobranchia highlight limitations of mitogenomic data for deep phylogeny, especially for rapidly evolving taxa like the long-branched lower heterobranchs, and reveals the surprising degree of heterogeneity within even closely related molluscan taxa that may in part be responsible for these limitations.

DNA extraction, library preparation, and sequencing
DNA was extracted from specimens obtained from various sources (Table 1)

Assembly and annotation
De novo assemblies of reads were initially carried out with Spades 3.14.0 [55]. In the case of O. atomus, the longest mitochondrial genome contig produced by Spades was missing several mitochondrial genes and Ray 2.2.0 [56] was used for assembly. Mitochondrial genomes were identified by creating a BLAST database from each set of assembled scaffolds and querying that database with the complete mitochondrion of Galba pervia (NCBI NC_018536.1) via BLASTN with an e-value cutoff of 1e-4. The longest BLAST hits were annotated with the MITOS2 web server with default parameters and the invertebrate mitochondrial genetic code (5) [57].

Data set construction
Coding sequences of the 13 mitochondrial proteincoding genes (cox1, cox2, cox3, atp6, atp8, nad1, nad2, nad3, nad4, nad4L, nad5, nad6, cob) were extracted from the newly sequenced and annotated mitochondrial genomes, as well as those publicly available on NCBI (see Additional file 1: Table S1). Single-gene alignments were simultaneously produced for DNA and amino acid sequences with MACSE v1.2 [7] using the invertebrate genetic code (5). Alignments were trimmed with trimAL with default settings [58]. Trimmed alignments were checked manually in MEGA 10.0.4 [59] and corrected by hand if translations were initially out of frame. FAScon-CAT [60] was used to assemble the concatenated amino acid supermatrix file. In response to difficulties with long-branched taxa (Additional file 2: Figure S1) and in keeping with recent recommendations to improve phylogenetic analysis [61], the alignment was also trimmed with BMGE [62] trimming with default settings. The BMGE matrix was used for subsequent analyses. All data matrices are available online via FigShare.

Maximum likelihood analyses
An initial maximum likelihood analysis (Supplemental Figure S1) was conducted on the initial TrimAL-trimmed (with default settings), partitioned-by-gene supermatrix using RAxML v8.2.4 [63] with the PROTGAMMAUTO model, which automatically selects the best-fitting model for each partition, rapid bootstrapping, and selection of the best-scoring maximum likelihood tree in one run. The number of bootstrap replicates was determined by the majority-rule consensus criterion (autoMRE).
Leaf stability was assessed with RogueNaRok [64] using the majority rule consensus criterion. Four taxa (R. grandideri, P. pedipes, P. acuta, and M. myosotis) had a leaf stability difference of < 0.75 and were considered to be unstable by RogueNaRok (Additional file 8: Table S2). These taxa, along with the very long-branched taxa C. limacina, P. radiatus, O. atomus, and R. morrocayensis were removed and the remaining sequences for each gene were re-aligned, trimmed, and concatenated before reanalysis in RAxML as described above.
To attempt to combat the apparent long branch attraction among C. limacina, P. radiatus, O. atomus, and R. morrocayensis, trees were also produced for the BMGEtrimmed matrix with a number of different models and/or analysis settings. We performed a series of ML analyses in IQ-TREE 2 [65] with 1000 rapid bootstraps employing different models and partitioning schemes including (1) the BMGE-trimmed dataset partitioned by gene with a partitioned mixed model (LG + C60 + G + F) and the best tree from RAxML provided as a starting tree (Additional file 3: Figure S2); (2) the same BMGEtrimmed dataset partitioned by gene and using Lanfear clustering to select optimal partitioning [66], resulting in 5 partitions with different models (Additional file 4: Figure S3); (3) a fully partitioned analysis of this matrix where PartitionFinder independently selected the best model for each gene with the -GENESITE correction to resample partitions and then sites within partitions [67] (Additional file 5: Figure S4); (4) an analysis of this matrix with an edge-unlinked model to better account for heterotachy (GHOST) [68] (Additional file 6: Figure S5). We also ran a RAxML analysis on the original TrimALtrimmed dataset but with C. limacina removed (Additional file 7: Figure S6. In all IQ-TREE 2 and RAxML trees, the clade of four (or three, in the last analysis) longbranched taxa persisted, and the overall tree topology did not change.

Bayesian analysis
Bayesian trees were generated with PhyloBayes-MPI v1.6 [69] with four chains and the CAT + GTR + Γ4 substitution model. Two analyses were attempted based on the BMGE-trimmed matrix: (1) an analysis sampling all taxa; and (2) an analysis excluding the taxa flagged as unstable in the initial maximum likelihood tree (C. limacina, P. radiatus, O. atomus, and R. morrocayensis).