Genome-wide SNPs resolve phylogenetic relationships in the North American spruce budworm (Choristoneura fumiferana) species complex
Introduction
Incongruence between gene and species trees has long been recognized in molecular systematics and phylogenetics (e.g. Fitch, 1970, Avise et al., 1983, Maddison, 1997). Conceptually, this incongruence can be accommodated by considering the modal gene phylogeny (the most frequently occurring gene phylogeny) as the species phylogeny (Pamilo and Nei, 1988, Maddison, 1997, Sperling, 2003). Ten to twenty years ago, the main limitation of this conceptual solution was methodological: the prohibitive cost of gathering sufficient data to estimate a modal gene phylogeny, especially for many taxa. This limitation has been substantially reduced by the rapid growth of high-throughput next-generation sequencing (NGS) (Metzker, 2009). However, systematists now face another suite of methodological challenges regarding how to appropriately use these genomic data to construct phylogenies. Genotyping-by-sequencing (GBS: Elshire et al., 2011, Poland et al., 2012) has garnered particular interest for phylogenetic reconstruction (Miller et al., 2007, Baird et al., 2008, Davey et al., 2011), especially in relatively shallow phylogenetic contexts (see Rubin et al., 2012) (note: although we focus on GBS here, the same methodological considerations apply to other restriction-site-associated DNA sequencing (RAD-seq) techniques; for simplicity in referring to methodological considerations we use “GBS” exclusively). Flexible requirements for a priori genomic resources and cost-effectiveness at the scales of loci and individuals (Peterson et al., 2012) make GBS ideal for systematics in the current era, where the distinction between traditional phylogenetics and population genetics is fading (Edwards, 2009). Although a growing number of studies have illustrated the utility of GBS for reconstructing phylogenies (e.g. Rubin et al., 2012, Wagner et al., 2012, Eaton and Ree, 2013, Jones et al., 2013, Nadeau et al., 2013, Cruaud et al., 2014, Hipp et al., 2014, Gohli et al., 2015, DaCosta and Sorenson, 2016, Díaz-Arce et al., 2016, Stervander et al., 2016, Rivers et al., 2016), critical and comprehensive evaluations of the phylogenetic methods used for these datasets are still relatively rare (e.g. DaCosta and Sorenson, 2016).
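The idea of a modal gene phylogeny can be illustrated with a minimal sketch. It assumes that each gene tree is already available as a rooted topology written as nested tuples of taxon labels; the labels "A", "B" and "C" and the function names are hypothetical and do not correspond to any taxa or software used in this study. The sketch simply tallies topologies and reports the most frequent one.

```python
from collections import Counter


def canonical(tree):
    """Return an order-independent form of a rooted tree given as nested tuples."""
    if isinstance(tree, str):
        return tree  # a leaf is just its taxon label
    # Child order carries no phylogenetic meaning, so store children as a set.
    return frozenset(canonical(child) for child in tree)


def modal_topology(gene_trees):
    """Return the most frequent topology and its share of all gene trees.

    Ties are broken by first occurrence, which is an arbitrary choice.
    """
    counts = Counter(canonical(tree) for tree in gene_trees)
    topology, n = counts.most_common(1)[0]
    return topology, n / len(gene_trees)


# Toy gene trees for three hypothetical taxa.
gene_trees = [
    (("A", "B"), "C"),
    ("C", ("B", "A")),  # same topology as the first tree, written differently
    (("A", "C"), "B"),
    (("A", "B"), "C"),
    (("B", "C"), "A"),
]

modal, freq = modal_topology(gene_trees)
print(f"modal topology supported by {freq:.0%} of gene trees")
```

In this toy input, three of the five gene trees group A with B, so that topology is the modal one. Real gene trees would also carry branch lengths and estimation error, which this tally ignores.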
Most empirical studies using GBS for phylogenomics have relied on concatenation (or supermatrix) methods, under the assumption that the volume of phylogenetic signal in thousands of loci outweighs any potential gene tree/species tree discordance (e.g. Nadeau et al., 2013, Wagner et al., 2012, Cruaud et al., 2014). Concatenation methods simplify model selection for genomic data and have been shown to be robust for estimating species trees from GBS data (Rivers et al., 2016). However, many studies have also shown that such treatment of phylogenomic data can be misleading for both determination of species relationships and evaluation of tree support (e.g. strong bootstrap support for incorrect relationships) (Kubatko and Degnan, 2007, Weisrock et al., 2012, McVay and Carstens, 2013, Wielstra et al., 2014, Giarla and Esselstyn, 2015). Alternatively, coalescent methods (such as BEAST: Drummond and Rambaut, 2007), which may be more appropriate when genealogical discordance exists, present computational challenges for GBS data, and may become infeasible with large datasets (e.g. Zimmerman et al., 2014). GBS loci are also relatively short (generally <100 bp) and have few variable sites per locus, which can lead to unresolved relationships when analyzed individually to construct “gene” trees (following “traditional” methods to determine species trees: Knowles and Kubatko, 2010). Likewise, emerging methods that use single nucleotide polymorphisms (SNPs) rather than sequence-based data (e.g. SNAPP: Bryant et al., 2012) can also be computationally demanding with large datasets. Given the methodological uncertainty surrounding GBS data and phylogenomics, and the apparent impracticality of some approaches, critical evaluation of the leading methods is vital to establishing best practices for the application of these data in phylogenetic contexts.
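As an illustration of what concatenation involves, the following minimal sketch joins per-locus alignments into a single supermatrix. It assumes each locus is a mapping from sample name to an aligned sequence, and that samples missing from a locus are padded with "N"; the sample names, sequences and the function name are hypothetical and are not taken from the pipeline used in this study.

```python
def concatenate_loci(loci, missing="N"):
    """Join per-locus alignments into one supermatrix, padding absent samples."""
    samples = sorted({sample for locus in loci for sample in locus})
    parts = {sample: [] for sample in samples}
    for locus in loci:
        lengths = {len(seq) for seq in locus.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a locus must share one aligned length")
        (length,) = lengths
        for sample in samples:
            # A sample with no read at this locus contributes only missing data.
            parts[sample].append(locus.get(sample, missing * length))
    return {sample: "".join(chunks) for sample, chunks in parts.items()}


# Two toy loci; "ind2" was not recovered at the second locus.
loci = [
    {"ind1": "ACGT", "ind2": "ACGA", "ind3": "ACGT"},
    {"ind1": "GGC", "ind3": "GAC"},
]

matrix = concatenate_loci(loci)
for sample, sequence in matrix.items():
    print(sample, sequence)
```

Because every locus is forced onto one shared tree in downstream analysis, this representation discards information about which sites belong to which locus, which is the property that makes concatenation vulnerable to gene tree/species tree discordance.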
Here, we assessed phylogenetic relationships among lineages of the spruce budworm (Choristoneura fumiferana (Clemens, 1865)) species complex in North America using GBS data. Species in this group (most notably C. fumiferana, considered North America’s most destructive insect defoliator of living conifers: Volney and Fleming, 2007) exhibit wide population oscillations, with outbreak densities causing high tree mortality (Gray and MacKinnon, 2006) and serious economic losses. The complex is composed of eight or nine species (Lumley and Sperling, 2011a, Gilligan et al., 2014, Brunet et al., 2016) that differ primarily on the basis of larval host preferences (Stehr, 1967, Harvey, 1985), female sex-pheromone chemistry (Sanders et al., 1977), diapause characteristics (Harvey, 1967) and geography (Stehr, 1967). However, phylogenetic relationships in the complex remain uncertain, especially the placement of the eastern pine-feeding species, Choristoneura pinus Freeman 1953 (Fig. 1). While most marker systems place C. fumiferana as the sister taxon to the rest of the complex, a recent analysis employing a genome-wide set of SNPs has cast doubt on this hypothesis by placing C. pinus basal to the group (Bird, 2013). However, these findings were confounded by the omission of several vital taxa within the group, including other pine feeders (e.g. C. lambertiana (Busck 1915)), and limited by the sole use of concatenation methods for phylogeny reconstruction.
In this study, we include all consistently recognized species in the complex and assess phylogenetic relationships of the group using multiple methods and datasets. First, we implement traditional phylogenetic methods with a concatenated (supermatrix) dataset. Given the common use of GBS datasets for population genomics, we compare the results of concatenated phylogenetic analyses to individual-based Bayesian clustering and ordinations using SNPs. Finally, we implement several approaches to assess gene tree/species tree incongruence, with and without a coalescent framework using both SNPs and sequence data. We find strong support for a single set of species-level relationships, and use this phylogeny to interpret mechanisms for speciation in the complex. This study represents the most taxonomically comprehensive phylogenomic evaluation of the spruce budworm species complex to date.
Sample collection
A total of 127 specimens were selected to maximize sampling of geographic range, host associations, and taxonomic diversity across the spruce budworm species complex. Larvae were collected by hand and reared to adult stage on host clippings from the plants they were collected on. Adults were collected using ultraviolet and mercury vapor light traps and sheets, as well as pheromone traps baited with species-specific pheromone lures (for pheromone compositions, see Silk and Eveleigh, 2012). We
Genotyping-by-sequencing
Unambiguous barcodes were found in a total of 285 million reads (average of 2.2 million reads per individual) and 233 million reads were retained after initial quality control (average per individual: 1.8 million). Eighty-four million reads were mapped to the C. fumiferana reference genome (average per individual: 0.6 million), and 72,570 SNPs were obtained from the populations portion of Stacks. When outgroup specimens were omitted (for population genetic analyses), this number was reduced to
Methodological considerations for GBS-based phylogenies
The use of GBS datasets in phylogenetics is increasingly common, yet critical evaluations of alternative phylogenetic methods have accumulated more slowly (e.g. DaCosta and Sorenson, 2016, Gohli et al., 2015, Stervander et al., 2015, Razkin et al., 2016, Rivers et al., 2016, Stervander et al., 2016). Despite healthy discussion concerning the merits of various phylogenetic methods (most often concatenation vs. coalescent methods: Weisrock et al., 2012, Wielstra et al., 2014, Giarla and
Conclusion
We present a comprehensive comparison of several phylogenetic methods using GBS data and highlight the importance of testing multiple methods and parameters for creating phylogenies from these genomic datasets. In the process, we have also resolved basal relationships in the spruce budworm species complex in North America. With few exceptions, all methods agreed on the same topology, placing C. pinus as basal to the remainder of the group. Choristoneura fumiferana, C. pinus, and C. retiniana
Data accessibility
Data files and alignments are available from the Dryad data repository: http://dx.doi.org/10.5061/dryad.00715.
Acknowledgements
We thank Alberta Environment and Sustainable Resource Development, G. Anweiler, S. Brunet, Canadian Forest Service, J. De Benedictis, J. Dombroskie, J. Doucette, A. Hundsdoerfer, J.F. Landry, L. Nolan, B. Proshek, A. Roe, D. Rubinoff, Saskatchewan Environment, and C. Whitehouse for assistance in specimen collection and two anonymous reviewers for comments. Funding was provided by an Alberta Innovates Bio Solutions grant (# VCS-11-034) and Natural Sciences and Engineering Research Council
References
- et al. (1995). Is mitochondrial DNA a strictly neutral marker? Trends Ecol. Evol.
- et al. (2016). DdRAD-seq phylogenetics based on nucleotide, indel, and presence-absence polymorphisms: analyses of two avian genera with contrasting histories. Mol. Phyl. Evol.
- et al. (2016). RAD-seq derived genome-wide nuclear markers resolve the phylogeny of tunas. Mol. Phyl. Evol.
- et al. (2014). Phylogenetic analyses at deep timescales: unreliable gene trees, bypassed hidden support, and the coalescence/concatalescence conundrum. Mol. Phyl. Evol.
- et al. (2011). Utility of microsatellites and mitochondrial DNA for species delimitation in the spruce budworm (Choristoneura fumiferana) species complex (Lepidoptera: Tortricidae). Mol. Phyl. Evol.
- et al. (2016). Species limits, interspecific hybridization and phylogeny in the cryptic land snail complex Pyramidula: the power of RADseq data. Mol. Phyl. Evol.
- et al. (2015). Coalescence vs. concatenation: sophisticated analyses vs. first principles applied to rooting the angiosperms. Mol. Phyl. Evol.
- et al. (1983). Mitochondrial DNA differentiation during the speciation process in Peromyscus. Mol. Biol. Evol.
- et al. (2008). Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS ONE
- et al. (2012). Evolutionary analyses of non-genealogical bonds produced by introgressive descent. Proc. Natl. Acad. Sci. USA
- Phylogenomics of the Choristoneura fumiferana species complex (Lepidoptera: Tortricidae)
- Inferring species trees directly from biallelic genetic markers: bypassing gene trees in a full coalescent analysis. Mol. Biol. Evol.
- BEAST 2: a software platform for Bayesian evolutionary analysis. PLOS Comput. Biol.
- Interspecific and intraspecific genetic comparisons of North American spruce budworms (Choristoneura spp.)
- Is RAD-seq suitable for phylogenetic inference? An in silico assessment and optimization. Ecol. Evol.
- Stacks: building and genotyping loci de novo from short-read sequences. G3: Genes, Genomes, Genet.
- Stacks: an analysis tool set for population genomics. Mol. Ecol.
- Leaky prezygotic isolation and porous genomes: rapid introgression of maternally inherited DNA. Evolution
- Empirical assessment of RAD sequencing for interspecific phylogeny. Mol. Biol. Evol.
- The variant call format and VCFtools. Bioinformatics
- Morphological study of male genitalia with phylogenetic inference of Choristoneura Lederer (Lepidoptera: Tortricidae). Can. Entomol.
- Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nat. Rev. Genet.
- Phylogeny of the tribe Archipini (Lepidoptera: Tortricidae: Tortricinae) and evolutionary correlates of novel secondary structures. Zootaxa
- BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol.
- Multi-locus species delimitation in closely related animals and fungi: one marker is not enough. Mol. Ecol.
- PyRAD: assembly of de novo RADseq loci for phylogenetic analyses. Bioinformatics
- Inferring phylogeny and introgression using RADseq data: an example from flowering plants (Pedicularis: Orobanchaceae). Syst. Biol.
- Is a new and general theory of molecular systematics emerging? Evolution
- A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS ONE
- Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol. Ecol.
- Inference of population structure: extensions to linked loci and correlated allele frequencies. Genetics
- Confidence limits on phylogenies: an approach using the bootstrap. Evolution
- Distinguishing homologous from analogous proteins. Syst. Zool.
- Toward defining the course of evolution: minimum change for a specific tree topology. Syst. Zool.
- On coniferophagous species of Choristoneura (Lepidoptera: Tortricidae) in North America, I. Some new forms of Choristoneura allied to C. fumiferana. Can. Entomol.
- Species-level paraphyly and polyphyly: frequency, causes, and consequences, with insight from animal mitochondrial DNA. Annu. Rev. Ecol. Evol. Syst.
- The challenges of resolving a rapid, recent radiation: empirical and simulated phylogenomics of Philippine shrews. Syst. Biol.
- The evolutionary history of Afrocanarian blue tits inferred from genomewide SNPs. Mol. Ecol.
- New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol.
- Outbreak patterns of the spruce budworm and their impacts in Canada. For. Chron.
- On coniferophagous Choristoneura (Lepidoptera: Tortricidae) in North America. V. Second diapause as a species character. Can. Entomol.
- The taxonomy of the coniferophagous Choristoneura (Lepidoptera: Tortricidae): a review
- Genetic relationships among Choristoneura species (Lepidoptera: Tortricidae) in North America as revealed by isozyme studies. Can. Entomol.
- Interspecific crosses and fertile hybrids among the coniferophagous Choristoneura (Lepidoptera: Tortricidae). Can. Entomol.
- Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol.
- A framework phylogeny of the American oak clade based on sequenced RAD data. PLoS ONE