Assessing the utility of transcriptome data for inferring phylogenetic relationships among coleoid cephalopods

https://doi.org/10.1016/j.ympev.2017.10.004

Highlights

  • Topological stability is impacted by taxon sampling, gene number and missing data.

  • We recover Sepiida and Myopsida as sister groups, with this pair sister to Oegopsida.

  • Sepioidea (Idiosepiida, Sepiida, Sepiolida and Spirulida) is not monophyletic.

  • Our trees show higher levels of branch support than recovered by multigene datasets.

  • Our topology differs from trees generated in other recent phylogenomic studies.

Abstract

Historically, deep-level relationships within the molluscan class Cephalopoda (squids, cuttlefishes, octopods and their relatives) have remained elusive due in part to the considerable morphological diversity of extant taxa, a limited fossil record for species that lack a calcareous shell and difficulties in sampling open ocean taxa. Many conflicts identified by morphologists in the early 1900s remain unresolved today in spite of advances in morphological, molecular and analytical methods. In this study we assess the utility of transcriptome data for resolving cephalopod phylogeny, with special focus on the orders of Decapodiformes (open-eye squids, bobtail squids, cuttlefishes and relatives). To do so, we took new and previously published transcriptome data and used a unique cephalopod core ortholog set to generate a dataset that was subjected to an array of filtering and analytical methods to assess the impacts of: taxon sampling, ortholog number, compositional and rate heterogeneity and incongruence across loci. Analyses indicated that datasets that maximized taxonomic coverage but included fewer orthologs were less stable than datasets that sacrificed taxon sampling to increase the number of orthologs. Clades recovered irrespective of dataset, filtering or analytical method included Octopodiformes (Vampyroteuthis infernalis + octopods), Decapodiformes (squids, cuttlefishes and their relatives), and orders Oegopsida (open-eyed squids) and Myopsida (e.g., loliginid squids). Ordinal-level relationships within Decapodiformes were the most susceptible to dataset perturbation, further emphasizing the challenges associated with uncovering relationships at deep nodes in the cephalopod tree of life.

Introduction

The molluscan class Cephalopoda contains some of the most charismatic invertebrates on Earth, and yet, many questions about their evolutionary history remain. The approximately 900 species of extant nautiloids, octopods, bobtail squids, cuttlefishes and squids comprise a group defined by a high degree of morphological diversity, rapid radiation and a poor fossil record for many taxa, all of which make inferring their phylogenetic history challenging. Two major factors may have influenced radiation and diversification of extant cephalopod taxa: the extinction of the ammonites and belemnoids at ∼66 mya, which may have opened up new niches, and the radiation of bony fishes, which are direct competitors with, prey of and predators on cephalopods (Aronson, 1991). Several cephalopod clades likely have undergone major Cenozoic radiations, including Oegopsida (∼250 sp., most oceanic ‘open-eye’ squids), Octopodidae (∼150 sp., benthic octopods) and Sepiida (∼100 sp., cuttlefishes), while other lineages such as Vampyroteuthis infernalis appear to have remained relatively unchanged. The timing and tempo of these radiations are difficult to assess due to weak fossil data for many lineages and an uncertain phylogeny for deep nodes (see Allcock et al., 2014 for details), even though significant methodological advances have been made (Rabosky et al., 2013, Stadler, 2011). Internal factors that can affect diversification rates such as genome duplication events have been identified in vertebrates (Jaillon et al., 2004) where duplicated genes likely led to new functions, such as osmoregulation in salmonids (Norman et al., 2012). Although less well studied, evidence for one or more genome duplication events in cephalopods exists (Hallinan and Lindberg, 2011). Lastly, extant cephalopods have undergone several habitat transitions that likely influenced diversification rate and character evolution (e.g., Kröger et al., 2011, Strugnell et al., 2006).

Over the last century, researchers have utilized a variety of approaches to study phylogenetic relationships within Cephalopoda, with limited success. Despite extensive work using morphological data, traditional multi-gene Sanger sequencing techniques or whole mitochondrial genomes, a good understanding of cephalopod ordinal relationships remains elusive (Allcock et al., 2011, Lindgren, 2010, Young and Vecchione, 1996). The largest molecular phylogenetic study in terms of taxon sampling incorporated publicly available data from six nuclear and four mitochondrial loci for over 400 OTUs to test hypotheses of convergent evolution and for correlation between morphology and habitat, providing new insight and support for some of the major subclades (Lindgren et al., 2012). At present (see Allcock et al., 2014 for a summary), clades that have been largely robust to differences in taxon sampling, data and/or phylogenetic method include Octopodiformes (all octopods and Vampyroteuthis infernalis), Incirrata (all octopods lacking fins), Cirrata (finned octopods) and Decapodiformes (open-eye squids, bobtail squids, pygmy squids, loliginids, cuttlefishes and Spirula spirula, the ram’s horn squid).

The problem of poor support and/or inconsistent resolution is best exemplified in Decapodiformes, the major clade containing the orders Oegopsida (most oceanic “open eye” squids), Bathyteuthoidea (comb-finned squids and their relatives), Idiosepiida (pygmy squids), Sepiida (cuttlefishes), Sepiolida (bobtail squids), Spirulida (Spirula spirula) and Myopsida (comprising Loliginidae—a family of mostly large-bodied, muscular, neritic squid, many of major fisheries importance—and Australiteuthidae, a poorly known group of small squid; Lu, 2005). Little progress on resolving relationships among these lineages has been made since the morphological research of Naef (1923). He proposed that extant Decapodiformes should be subdivided into two groups: Sepioidea (containing Idiosepiida, Sepiida, Sepiolida and Spirulida) and Teuthoidea (Myopsida and Oegopsida). However, Naef struggled with the position of Myopsida, due to shared characteristics with both Sepioidea and Oegopsida. Berthold and Engeser (1987) partially supported Naef’s hypothesis, but suggested that Spirulida was a sister taxon to sepioids+loliginids, a group they termed “Uniductia.” More recently, no molecular study to date has found support for Sepioidea or Teuthoidea sensu Naef, and the position of Myopsida varies significantly depending on analytical method, data and taxon sampling (Allcock et al., 2014). In general, decapodiform relationships vary with differences in taxon sampling, type of genetic data used and analytical method employed (e.g., Carlini and Graves, 1999, Lindgren, 2010, Lindgren et al., 2012, Strugnell et al., 2005, Strugnell and Nishiguchi, 2007) and the issue of how the sepioids and loliginids are related to each other and to Oegopsida remains contentious (Allcock et al., 2014).

Next-generation sequencing (NGS) techniques have shown a high degree of success in phylogeny estimation for a variety of taxonomic groups, including mollusks (Kocot et al., 2011, Smith et al., 2011). Some genome/transcriptome-scale studies have included representatives of multiple cephalopod lineages (Kocot et al., 2011, Smith et al., 2011), but these studies were focused on molluscan phylogeny and lacked representatives of major cephalopod lineages (e.g., Oegopsida, Sepiida and Vampyromorpha). Similarly, Albertin et al. (2015) used genome and transcriptome data to infer the phylogenetic position of Octopus bimaculoides within Mollusca, but key questions in coleoid cephalopod phylogeny could not be addressed because Nautiloidea, Oegopsida and Vampyromorpha were not sampled. Recently, a study by Strugnell et al. (2017) utilized mitochondrial genome data from two new taxa, Spirula spirula and Sepiadarium austrinum, to evaluate higher-level relationships, finding support for a close association between Spirulida, Bathyteuthoidea and Oegopsida (a finding also supported by a multigene phylogeny; Lindgren et al., 2012) and providing new hypotheses regarding the placement of Idiosepiida, Sepiida, Sepiolida and Myopsida (Strugnell et al., 2017). Another recent phylogenetic study that included all 39 previously published cephalopod mitochondrial genomes, plus data for four to five mitochondrial protein-coding genes from Spirula and four octopods (including the cirrate Opisthoteuthis massyae) (Uribe and Zardoya, 2017) was published shortly after Strugnell et al. (2017). Finally, while the present paper was under review, a study by Tanner et al. (2017) was published in which the authors tested hypotheses of cephalopod relationships using a combination of NGS data, including a transcriptome for the cirrate octopod Grimpoteuthis glacialis (now Cirroctopus glacialis; Collins and Villanueva, 2006, O’Shea, 1999; we retain the usage of Tanner et al. for clarity) and a small amount of shotgun genome sequence data for Spirula spirula.

The present study aims to incorporate new and published transcriptome data to test the sensitivity and utility of large-scale datasets for inferring relationships among cephalopod lineages, particularly within Decapodiformes. Here, we evaluate the utility of NGS data for cephalopod phylogeny by generating datasets using a new cephalopod core ortholog assignment pipeline. Additionally, we employed several filtering steps and analytical approaches to assess the sensitivity of transcriptome data to impacts such as missing data and compositional and rate heterogeneity artifacts.
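
The trade-off between taxon sampling and ortholog number that motivates these filtering steps can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the taxon labels, ortholog identifiers, thresholds and function names (filter_by_occupancy, missing_fraction) are hypothetical, and the presence/absence map is toy data chosen only to show how an occupancy threshold interacts with a sparsely sampled taxon.

```python
from typing import Dict, List, Set


def filter_by_occupancy(
    presence: Dict[str, Set[str]],
    taxa: List[str],
    min_fraction: float,
) -> List[str]:
    """Return orthologs found in at least `min_fraction` of `taxa`."""
    kept = []
    for ortholog, taxa_with_gene in sorted(presence.items()):
        n_present = sum(1 for t in taxa if t in taxa_with_gene)
        if n_present / len(taxa) >= min_fraction:
            kept.append(ortholog)
    return kept


def missing_fraction(
    presence: Dict[str, Set[str]],
    taxa: List[str],
    orthologs: List[str],
) -> float:
    """Fraction of empty cells in the taxon-by-ortholog matrix."""
    total = len(taxa) * len(orthologs)
    if total == 0:
        return 0.0
    filled = sum(1 for o in orthologs for t in taxa if t in presence[o])
    return 1.0 - filled / total


# Toy data (hypothetical): ortholog -> set of taxa in which it was recovered.
# Taxon "D" stands in for a small transcriptome that yields few orthologs.
presence = {
    "og1": {"A", "B", "C", "D"},
    "og2": {"A", "B", "C"},
    "og3": {"A", "B", "C"},
    "og4": {"A", "B"},
    "og5": {"A", "C", "D"},
}
all_taxa = ["A", "B", "C", "D"]
reduced_taxa = ["A", "B", "C"]  # sparse taxon "D" removed

# Dataset 1: keep every taxon, require complete occupancy.
complete_all = filter_by_occupancy(presence, all_taxa, 1.0)
# Dataset 2: sacrifice the sparse taxon to gain orthologs.
complete_reduced = filter_by_occupancy(presence, reduced_taxa, 1.0)
# Dataset 3: keep every taxon but tolerate some missing data.
relaxed_all = filter_by_occupancy(presence, all_taxa, 0.75)

print(len(complete_all), len(complete_reduced), len(relaxed_all))
print(missing_fraction(presence, all_taxa, relaxed_all))
```

In this toy example, requiring complete occupancy across all four taxa retains a single ortholog, whereas dropping the sparse taxon retains three. Relaxing the threshold keeps all taxa and more orthologs, but at the cost of empty cells in the matrix.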

Taxon sampling

For our initial analyses, all publicly available cephalopod transcriptome data as of 9 February 2016 (47 total) were downloaded from the Sequence Read Archive (SRA; http://www.ncbi.nlm.nih.gov/sra/) and the GenBank EST database (https://www.ncbi.nlm.nih.gov/nucest) as fastq or fasta files. Predicted proteins from the Octopus bimaculoides genome (Albertin et al., 2015) were also downloaded from https://www.ncbi.nlm.nih.gov/genome/41501.

Novel transcriptome data were also generated for

Transcriptomes, datasets and matrices

Novel transcriptomes from four cephalopods (Table 1) were used in this study; these data are available from the NCBI Sequence Read Archive under BioProject accession number SRP119608. The initial 30 transcriptomes incorporated into this study were highly variable in terms of number of contigs assembled by Trinity and the number of orthologs recovered by HaMStR (Table 1). Several transcriptomes contained contigs representing ∼2000 orthologs in the cephalopod core ortholog set (e.g., Doryteuthis

The utility of phylotranscriptomics for cephalopod phylogenetics

Regardless of the dataset construction method, the degree of missing data or filtering method, several uncontroversial relationships were consistently recovered in this study: Decapodiformes, Octopodiformes and all families for which multiple members were included all formed well-supported clades. Surprisingly, when we took several of the transcriptomes used in our best1 dataset and incorporated data for several additional species used by Tanner et al. and not previously available to us (

Conclusion

While genomic data are known to be useful for inferring deep-level relationships in many cases, such as for Mollusca (e.g., Kocot et al., 2011, Smith et al., 2011), a strong phylogenetic signal must be present. Extant cephalopods (particularly Decapodiformes) seem to be the product of ancient rapid radiations, which may confound our ability to resolve deep-node relationships, even in the presence of a large number of loci. In these cases, taxon sampling (e.g., Pyron, 2015), orthology inference

Availability of data and material

The datasets generated during and analyzed in this study, custom scripts used in the phylogenomics pipeline, the cephalopod core ortholog set for use in HaMStR and phylogenetic trees are available in the Dryad repository (Supplementary Material).

Competing interests

We have no competing interests.

Authors’ contributions

ARL and FEA designed the study; ARL collected the new transcriptome data; FEA implemented the bioinformatics pipelines; ARL and FEA performed the phylogenetic analyses; ARL and FEA wrote and edited the manuscript.

Acknowledgement

We are grateful to Dick Young, Michael Vecchione and two anonymous reviewers for providing feedback on this manuscript. Thanks also to Alistair Tanner and Rute de Fonseca who provided us with access to their assemblies as well as guidance on how their dataset was constructed. All new transcriptome sequence data were obtained in the laboratory of Todd Oakley at UCSB. Thanks also to Sabrina Pankey who collected the Vampyroteuthis infernalis and Todarodes pacificus specimens used for this study.

References (82)

  • J. Strugnell et al.

    Molecular phylogeny of coleoid cephalopods (Mollusca: Cephalopoda) using a multigene approach; the effect of data partitioning on resolving phylogenies in a Bayesian framework

    Mol. Phylogenet. Evol.

    (2005)
  • J.M. Strugnell et al.

    Whole mitochondrial genome of the Ram’s Horn Squid shines light on the phylogenetic position of the monotypic order Spirulida (Haeckel, 1896)

    Mol. Phylogenet. Evol.

    (2017)
  • J.B. Whitfield et al.

    Deciphering ancient rapid radiations

    Trends Ecol. Evol.

    (2007)
  • C.B. Albertin et al.

    The octopus genome and the evolution of cephalopod neural and morphological novelties

    Nature

    (2015)
  • F.A. Aldrich

    Lol-i-go and far away: a consideration of the establishment of the species designation Loligo pealei

  • A.L. Allcock et al.

    What can the mitochondrial genome reveal about higher-level phylogeny of the molluscan class Cephalopoda?

    Zool. J. Linn. Soc.

    (2011)
  • A.L. Allcock et al.

    The contribution of molecular data to our understanding of cephalopod evolution and systematics: a review

    J. Nat. Hist.

    (2014)
  • F.E. Anderson et al.

    Lights out: the evolution of bacterial bioluminescence in Loliginidae

    Hydrobiologia

    (2014)
  • R.B. Aronson

    Ecology, paleobiology and evolutionary constraint in the octopus

    Bull. Mar. Sci.

    (1991)
  • Berthold, T., Engeser, T., 1987. Phylogenetic analysis and systematization of the Cephalopoda (Mollusca). Verhandlungen...
  • D.B. Carlini et al.

    Phylogenetic analysis of cytochrome c oxidase I sequences to determine higher-level relationships within the coleoid cephalopods

    Bull. Mar. Sci.

    (1999)
  • M.-Y. Chen et al.

    Selecting question-specific genes to reduce incongruence in phylogenomics: a case study of jawed vertebrate backbone phylogeny

    Syst. Biol.

    (2015)
  • M.A. Collins et al.

    Taxonomy, ecology and behaviour of the cirrate octopods

  • V.P. Doyle et al.

    Can we identify genes with increased phylogenetic reliability?

    Syst. Biol.

    (2015)
  • B.T. Drew et al.

    Another look at the root of the angiosperms reveals a familiar tale

    Syst. Biol.

    (2014)
  • I. Ebersberger et al.

    HaMStR: profile hidden markov model based search for orthologs in ESTs

    BMC Evol. Biol.

    (2009)
  • S.R. Eddy et al.

    Accelerated profile HMM searches

    PLoS Comput. Biol.

    (2011)
  • J. Felsenstein

    Cases in which parsimony and compatibility methods will be positively misleading

    Syst. Zool.

    (1978)
  • P.G. Foster et al.

    Compositional bias may affect both DNA-based and protein-based phylogenetic reconstructions

    J. Mol. Evol.

    (1999)
  • N.L. Garrison et al.

    Spider phylogenomics: untangling the Spider Tree of Life

    PeerJ

    (2016)
  • V.V. Goremykin et al.

    The root of flowering plants and total evidence

    Syst. Biol.

    (2015)
  • M.G. Grabherr et al.

    Full-length transcriptome assembly from RNA-Seq data without a reference genome

    Nat. Biotechnol.

    (2011)
  • N.M. Hallinan et al.

    Comparative analysis of chromosome counts infers three paleopolyploidies in the Mollusca

    Genome Biol. Evol.

    (2011)
  • M.D. Hendy et al.

    A framework for the quantitative study of evolutionary trees

    Syst. Zool.

    (1989)
  • D.M. Hillis et al.

    Is sparse taxon sampling a problem for phylogenetic inference?

    Syst. Biol.

    (2003)
  • O. Jaillon et al.

    Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype

    Nature

    (2004)
  • K. Katoh et al.

    MAFFT version 5: improvement in accuracy of multiple sequence alignment

    Nucleic Acids Res.

    (2005)
  • K.M. Kocot et al.

    Phylogenomics reveals deep molluscan relationships

    Nature

    (2011)
  • K.M. Kocot et al.

    PhyloTreePruner: a phylogenetic tree-based approach for selection of orthologous sequences for phylogenomics

    Evol. Bioinform. Online

    (2013)
  • B. Kröger et al.

    Cephalopod origin and evolution: a congruent picture emerging from fossils, development and molecules: Extant cephalopods are younger than previously realised and were under major selection to become agile, shell-less predators

    BioEssays

    (2011)
  • N. Lartillot et al.

    PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating

    Bioinformatics

    (2009)