Assessing the utility of transcriptome data for inferring phylogenetic relationships among coleoid cephalopods
Introduction
The molluscan class Cephalopoda contains some of the most charismatic invertebrates on Earth, yet many questions about their evolutionary history remain unanswered. The approximately 900 extant species of nautiloids, octopods, bobtail squids, cuttlefishes and squids form a group characterized by high morphological diversity, rapid radiation and, for many taxa, a poor fossil record, all of which make their phylogenetic history challenging to infer. Two major factors may have influenced the radiation and diversification of extant cephalopod taxa: the extinction of the ammonites and belemnoids at ∼66 mya, which may have opened up new niches, and the radiation of bony fishes, which are direct competitors with, prey of and predators on cephalopods (Aronson, 1991). Several cephalopod clades likely underwent major Cenozoic radiations, including Oegopsida (∼250 spp., most oceanic ‘open-eye’ squids), Octopodidae (∼150 spp., benthic octopods) and Sepiida (∼100 spp., cuttlefishes), whereas other lineages, such as Vampyroteuthis infernalis, appear to have remained relatively unchanged. The timing and tempo of these radiations are difficult to assess because fossil data are weak for many lineages and the phylogeny of deep nodes is uncertain (see Allcock et al., 2014 for details), despite significant methodological advances (Rabosky et al., 2013, Stadler, 2011). Internal factors that can affect diversification rates, such as genome duplication events, have been identified in vertebrates (Jaillon et al., 2004), where duplicated genes likely gave rise to new functions, such as osmoregulation in salmonids (Norman et al., 2012). Although genome duplication is less well studied in cephalopods, there is evidence for one or more such events in the group (Hallinan and Lindberg, 2011). Lastly, extant cephalopods have undergone several habitat transitions that likely influenced diversification rates and character evolution (e.g., Kröger et al., 2011, Strugnell et al., 2006).
Over the last century, researchers have used a variety of approaches to study phylogenetic relationships within Cephalopoda, with limited success. Despite extensive work using morphological data, traditional multi-gene Sanger sequencing or whole mitochondrial genomes, cephalopod ordinal relationships remain poorly understood (Allcock et al., 2011, Lindgren, 2010, Young and Vecchione, 1996). The largest molecular phylogenetic study in terms of taxon sampling combined publicly available data from six nuclear and four mitochondrial loci for over 400 OTUs to test hypotheses of convergent evolution and of correlation between morphology and habitat, providing new insight into, and support for, some of the major subclades (Lindgren et al., 2012). At present (see Allcock et al., 2014 for a summary), clades that have been largely robust to differences in taxon sampling, data and/or phylogenetic method include Octopodiformes (all octopods and Vampyroteuthis infernalis), Incirrata (octopods lacking fins), Cirrata (finned octopods) and Decapodiformes (open-eye squids, bobtail squids, pygmy squids, loliginids, cuttlefishes and Spirula spirula, the ram’s horn squid).
The problem of poor support and/or inconsistent resolution is best exemplified by Decapodiformes, the major clade containing the orders Oegopsida (most oceanic ‘open-eye’ squids), Bathyteuthoidea (comb-finned squids and their relatives), Idiosepiida (pygmy squids), Sepiida (cuttlefishes), Sepiolida (bobtail squids), Spirulida (Spirula spirula) and Myopsida. Myopsida comprises Loliginidae, a family of mostly large-bodied, muscular, neritic squids, many of major fisheries importance, and Australiteuthidae, a poorly known group of small squids (Lu, 2005). Little progress in resolving relationships among these lineages has been made since the morphological work of Naef (1923), who proposed that extant Decapodiformes be subdivided into two groups: Sepioidea (Idiosepiida, Sepiida, Sepiolida and Spirulida) and Teuthoidea (Myopsida and Oegopsida). However, Naef struggled to place Myopsida because it shares characteristics with both Sepioidea and Oegopsida. Berthold and Engeser (1987) partially supported Naef’s hypothesis but suggested that Spirulida was the sister taxon to sepioids + loliginids, a group they termed “Uniductia.” No molecular study to date has found support for Sepioidea or Teuthoidea sensu Naef, and the position of Myopsida varies substantially with analytical method, data and taxon sampling (Allcock et al., 2014). More generally, inferred decapodiform relationships change with taxon sampling, the type of genetic data used and the analytical method employed (e.g., Carlini and Graves, 1999, Lindgren, 2010, Lindgren et al., 2012, Strugnell et al., 2005, Strugnell and Nishiguchi, 2007), and how the sepioids and loliginids are related to each other and to Oegopsida remains contentious (Allcock et al., 2014).
Next-generation sequencing (NGS) techniques have been highly successful for phylogeny estimation in a variety of taxonomic groups, including mollusks (Kocot et al., 2011, Smith et al., 2011). Some genome- and transcriptome-scale studies have included representatives of multiple cephalopod lineages (Kocot et al., 2011, Smith et al., 2011), but these studies focused on molluscan phylogeny and lacked representatives of major cephalopod lineages (e.g., Oegopsida, Sepiida and Vampyromorpha). Similarly, Albertin et al. (2015) used genome and transcriptome data to infer the phylogenetic position of Octopus bimaculoides within Mollusca, but key questions in coleoid cephalopod phylogeny could not be addressed because Nautiloidea, Oegopsida and Vampyromorpha were not sampled. More recently, Strugnell et al. (2017) used mitochondrial genome data from two new taxa, Spirula spirula and Sepiadarium austrinum, to evaluate higher-level relationships; they found support for a close association among Spirulida, Bathyteuthoidea and Oegopsida (a finding also supported by a multigene phylogeny; Lindgren et al., 2012) and proposed new hypotheses for the placement of Idiosepiida, Sepiida, Sepiolida and Myopsida. Shortly afterwards, Uribe and Zardoya (2017) published another phylogenetic study that included all 39 previously published cephalopod mitochondrial genomes, plus data for four to five mitochondrial protein-coding genes from Spirula and four octopods (including the cirrate Opisthoteuthis massyae). Finally, while the present paper was under review, Tanner et al. (2017) published a study testing hypotheses of cephalopod relationships with a combination of NGS data, including a transcriptome for the cirrate octopod Grimpoteuthis glacialis (now Cirroctopus glacialis; Collins and Villanueva, 2006, O’Shea, 1999; we retain the usage of Tanner et al. for clarity) and a small amount of shotgun genome sequence data for Spirula spirula.
The present study incorporates new and published transcriptome data to test the sensitivity and utility of large-scale datasets for inferring relationships among cephalopod lineages, particularly within Decapodiformes. We generated datasets using a new cephalopod core ortholog assignment pipeline and applied several filtering steps and analytical approaches to assess how sensitive inferences from transcriptome data are to missing data and to artifacts arising from compositional and rate heterogeneity.
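Sensitivity analyses of this kind need per-taxon measures of missing data and compositional heterogeneity. This excerpt does not state which statistics were computed, so the following is only a minimal sketch under stated assumptions: missing data are measured as the fraction of gap ('-'), unknown ('?') or ambiguity ('X') positions per taxon in a protein alignment, and compositional heterogeneity as RCFV (relative composition frequency variability), a statistic reported by tools such as BaCoCa. The function names `missing_fraction`, `filter_taxa` and `rcfv`, and the example threshold, are hypothetical.

```python
from collections import Counter

# Assumed character sets for a protein alignment: the 20 standard amino acids,
# with gaps, unknowns and the ambiguity code X all treated as missing data.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MISSING = set("-?X")


def missing_fraction(seq: str) -> float:
    """Fraction of aligned positions in one taxon that are missing data."""
    if not seq:
        return 1.0
    return sum(c in MISSING for c in seq.upper()) / len(seq)


def filter_taxa(alignment: dict[str, str], max_missing: float) -> dict[str, str]:
    """Hypothetical filter: keep taxa whose missing fraction is <= max_missing."""
    return {taxon: seq for taxon, seq in alignment.items()
            if missing_fraction(seq) <= max_missing}


def rcfv(alignment: dict[str, str]) -> float:
    """Relative composition frequency variability of an amino-acid alignment.

    RCFV = sum_i sum_j |f_ij - mean_j| / n, where f_ij is the frequency of
    residue j in taxon i (missing characters ignored), mean_j is the mean of
    f_ij over taxa, and n is the number of taxa with data. 0 = homogeneous.
    """
    freqs = []
    for seq in alignment.values():
        counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
        total = sum(counts.values())
        if total:  # skip taxa with no residues at this locus
            freqs.append({aa: counts[aa] / total for aa in AMINO_ACIDS})
    n = len(freqs)
    if n == 0:
        return 0.0
    means = {aa: sum(f[aa] for f in freqs) / n for aa in AMINO_ACIDS}
    return sum(abs(f[aa] - means[aa]) for f in freqs for aa in AMINO_ACIDS) / n


if __name__ == "__main__":
    toy = {"taxon_a": "ACDEFG", "taxon_b": "ACDEF-", "taxon_c": "A-----"}
    kept = filter_taxa(toy, max_missing=0.5)  # drops the sparse taxon_c
    print(sorted(kept), round(rcfv(kept), 3))
```

In a sensitivity analysis, loci or taxa with RCFV well above the rest of the dataset would be candidates for exclusion before re-running tree inference; where to set such cut-offs is a judgment call not specified in this excerpt.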
Section snippets
Taxon sampling
For our initial analyses, all cephalopod transcriptome data publicly available as of 9 February 2016 (47 datasets in total) were downloaded as FASTQ or FASTA files from the Sequence Read Archive (SRA; http://www.ncbi.nlm.nih.gov/sra/) and the GenBank EST database (https://www.ncbi.nlm.nih.gov/nucest). Predicted proteins from the Octopus bimaculoides genome (Albertin et al., 2015) were also downloaded from https://www.ncbi.nlm.nih.gov/genome/41501.
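The excerpt does not say how the SRA records were identified. As a hedged illustration only, a date-limited search of this kind could be expressed as a query to NCBI's Entrez E-utilities `esearch` endpoint. The search-field tags used below (`[Organism]`, `[Strategy]`, `[PDAT]`) are assumptions about SRA's Entrez query syntax that should be checked against NCBI's documentation, and `build_sra_query` is a hypothetical helper. The sketch only builds the URL; it does not contact NCBI.

```python
from urllib.parse import urlencode

# NCBI Entrez E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_sra_query(taxon: str, cutoff: str, retmax: int = 500) -> str:
    """Build (but do not send) an esearch URL for RNA-seq records of `taxon`
    published on or before `cutoff` (YYYY/MM/DD).

    Assumption: SRA accepts the [Organism], [Strategy] and [PDAT] field tags
    used here; verify against current NCBI documentation before relying on it.
    """
    term = (
        f'"{taxon}"[Organism] AND "rna seq"[Strategy] '
        f'AND ("1900/01/01"[PDAT] : "{cutoff}"[PDAT])'
    )
    params = {"db": "sra", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Cutoff date taken from the text: 9 February 2016.
    print(build_sra_query("Cephalopoda", "2016/02/09"))
```

Fetching the resulting URL (for example with `urllib.request.urlopen`) would return JSON listing the UIDs of matching SRA records; the GenBank EST data mentioned in the text would require a separate search.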
Novel transcriptome data were also generated for
Transcriptomes, datasets and matrices
Novel transcriptomes from four cephalopods (Table 1) were used in this study; these data are available from the NCBI Sequence Read Archive under accession number SRP119608. The initial 30 transcriptomes incorporated into this study varied widely in the number of contigs assembled by Trinity and in the number of orthologs recovered by HaMStR (Table 1). Several transcriptomes contained contigs representing ∼2000 orthologs in the cephalopod core ortholog set (e.g., Doryteuthis
The utility of phylotranscriptomics for cephalopod phylogenetics
Regardless of the dataset construction method, degree of missing data or filtering method, several uncontroversial relationships were consistently recovered in this study: Decapodiformes, Octopodiformes and every family represented by multiple members formed well-supported clades. Surprisingly, when we took several of the transcriptomes used in our best1 dataset and incorporated data for several additional species used by Tanner et al. and not previously available to us (
Conclusion
Although genomic data have proven useful for inferring deep-level relationships in many groups, such as Mollusca (e.g., Kocot et al., 2011, Smith et al., 2011), this requires a strong phylogenetic signal. Extant cephalopods (particularly Decapodiformes) appear to be the product of ancient rapid radiations, which may confound our ability to resolve deep nodes even when a large number of loci is available. In these cases, taxon sampling (e.g., Pyron, 2015), orthology inference
Availability of data and material
The datasets generated and analyzed during this study, the custom scripts used in the phylogenomics pipeline, the cephalopod core ortholog set for use in HaMStR and the phylogenetic trees are available in the Dryad repository (Supplementary Material).
Competing interests
We have no competing interests.
Authors’ contributions
ARL and FEA designed the study; ARL collected the new transcriptome data; FEA implemented the bioinformatics pipelines; ARL and FEA performed the phylogenetic analyses; ARL and FEA wrote and edited the manuscript.
Acknowledgement
We are grateful to Dick Young, Michael Vecchione and two anonymous reviewers for providing feedback on this manuscript. Thanks also to Alistair Tanner and Rute de Fonseca, who provided us with access to their assemblies as well as guidance on how their dataset was constructed. All new transcriptome sequence data were obtained in the laboratory of Todd Oakley at UCSB. Thanks also to Sabrina Pankey, who collected the Vampyroteuthis infernalis and Todarodes pacificus specimens used for this study.
References (82)
Basic local alignment search tool. J. Mol. Biol. (1990).
Phylogeny and historical biogeography of the loliginid squids (Mollusca: Cephalopoda) based on mitochondrial DNA sequence data. Mol. Phylogenet. Evol. (2000).
Phylogenetic relationships among loliginid squids (Cephalopoda: Myopsida) based on analyses of multiple data sets. Zool. J. Linn. Soc. (2000).
Should we be worried about long-branch attraction in real data sets? Investigations using metazoan 18S rDNA. Mol. Phylogenet. Evol. (2004).
The supermatrix approach to systematics. Trends Ecol. Evol. (2007).
FASconCAT: convenient handling of data matrices. Mol. Phylogenet. Evol. (2010).
BaCoCa – a heuristic software tool for the parallel assessment of sequence biases in hundreds of gene and taxon partitions. Mol. Phylogenet. Evol. (2014).
Molecular inference of phylogenetic relationships among Decapodiformes (Mollusca: Cephalopoda) with special focus on the squid order Oegopsida. Mol. Phylogenet. Evol. (2010).
Post-molecular systematics and the future of phylogenetics. Trends Ecol. Evol. (2015).
A large and consistent phylogenomic dataset supports sponges as the sister group to all other animals. Curr. Biol. (2017).
Molecular phylogeny of coleoid cephalopods (Mollusca: Cephalopoda) using a multigene approach; the effect of data partitioning on resolving phylogenies in a Bayesian framework. Mol. Phylogenet. Evol.
Whole mitochondrial genome of the Ram’s Horn Squid shines light on the phylogenetic position of the monotypic order Spirulida (Haeckel, 1896). Mol. Phylogenet. Evol.
Deciphering ancient rapid radiations. Trends Ecol. Evol.
The octopus genome and the evolution of cephalopod neural and morphological novelties. Nature.
Lol-i-go and far away: a consideration of the establishment of the species designation Loligo pealei.
What can the mitochondrial genome reveal about higher-level phylogeny of the molluscan class Cephalopoda? Zool. J. Linn. Soc.
The contribution of molecular data to our understanding of cephalopod evolution and systematics: a review. J. Nat. Hist.
Lights out: the evolution of bacterial bioluminescence in Loliginidae. Hydrobiologia.
Ecology, paleobiology and evolutionary constraint in the octopus. Bull. Mar. Sci.
Phylogenetic analysis of cytochrome c oxidase I sequences to determine higher-level relationships within the coleoid cephalopods. Bull. Mar. Sci.
Selecting question-specific genes to reduce incongruence in phylogenomics: a case study of jawed vertebrate backbone phylogeny. Syst. Biol.
Taxonomy, ecology and behaviour of the cirrate octopods.
Can we identify genes with increased phylogenetic reliability? Syst. Biol.
Another look at the root of the angiosperms reveals a familiar tale. Syst. Biol.
HaMStR: profile hidden Markov model based search for orthologs in ESTs. BMC Evol. Biol.
Accelerated profile HMM searches. PLoS Comput. Biol.
Cases in which parsimony and compatibility methods will be positively misleading. Syst. Zool.
Compositional bias may affect both DNA-based and protein-based phylogenetic reconstructions. J. Mol. Evol.
Spider phylogenomics: untangling the Spider Tree of Life. PeerJ.
The root of flowering plants and total evidence. Syst. Biol.
Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol.
Comparative analysis of chromosome counts infers three paleopolyploidies in the Mollusca. Genome Biol. Evol.
A framework for the quantitative study of evolutionary trees. Syst. Zool.
Is sparse taxon sampling a problem for phylogenetic inference? Syst. Biol.
Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature.
MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res.
Phylogenomics reveals deep molluscan relationships. Nature.
PhyloTreePruner: a phylogenetic tree-based approach for selection of orthologous sequences for phylogenomics. Evol. Bioinform. Online.
Cephalopod origin and evolution: a congruent picture emerging from fossils, development and molecules: extant cephalopods are younger than previously realised and were under major selection to become agile, shell-less predators. BioEssays.
PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics.