Multi-gene analysis of the symbiotic and free-living dinoflagellate genus Symbiodinium

Symbiodinium, a large group of dinoflagellates, live in symbiosis with marine protists, invertebrate metazoans, and free-living in the environment. Symbiodinium are functionally variable and play critical energetic roles in symbiosis. Our knowledge of Symbiodinium has been historically constrained by the limited number of molecular markers available to study evolution in the genus. Here we compare six functional genes, representing three cellular compartments, in the nine known Symbiodinium lineages. Despite striking similarities among the single gene phylogenies from distinct organelles, none were evolutionarily identical. A fully concatenated reconstruction, however, yielded a well-resolved topology identical to the current benchmark nr28S gene. Evolutionary rates differed among cellular compartments and clades, a pattern largely driven by higher rates of evolution in the chloroplast genes of Symbiodinium clades D2 and I. The rapid rates of evolution observed amongst these relatively uncommon Symbiodinium lineages in the functionally critical chloroplast may translate into potential innovation for the symbiosis. The multi-gene analysis highlights the potential power of assessing genome-wide evolutionary patterns using recent advances in sequencing technology and emphasizes the importance of integrating ecological data with more comprehensive sampling of free-living and symbiotic Symbiodinium in assessing the evolutionary adaptation of this enigmatic dinoflagellate.

Dinoflagellates in the genus Symbiodinium are essential components of coral reef ecosystems in their role as photosynthetic endosymbionts of a myriad of marine organisms belonging to at least five distinct phyla: Foraminifera, Porifera, Cnidaria, Mollusca, and Platyhelminthes (Trench, 1993). Perhaps best known for their relationship with scleractinian corals, Symbiodinium underpin the productivity and calcification that creates coral skeletons and the structures known as coral reefs that serve as habitat for the immense biodiversity these coastal ecosystems support.
Research conducted during the last two decades has allowed extensive genotyping of endosymbiotic Symbiodinium in both the Western Atlantic and Indo-Pacific Oceans and across host taxa at a variety of spatial and temporal scales (reviewed in Coffroth & Santos, 2005;Franklin et al., 2012;Stat, Carter & Hoegh-Guldberg, 2006). Several recent studies have also begun to describe Symbiodinium diversity in free-living environments, including the water column (Manning & Gates, 2008;Takabayashi et al., 2012), sediments Porto et al., 2008;Takabayashi et al., 2012), coral sand , coral rubble , on the surface of macroalgal beds Venera-Ponton et al., 2010), and in fish feces (Castro et al., 2012;Porto et al., 2008). These studies have collectively led to the molecular classification of Symbiodinium into nine lineages, clades A through I (Table 1), most commonly delineated phylogenetically using the nuclear large subunit ribosomal D1-D3 region (nr28S) and the chloroplast large subunit ribosomal DNA domain V (cp23S). Clades D, F, and G have been further divided into sub-clades D1-D2, F2-F5, and G1-G2 using the same molecules, respectively (Hill et al., 2011;Pochon, LaJeunesse & Pawlowski, 2004;Pochon et al., 2006).
Here, we present a multi-gene analysis of Symbiodinium comparing: 1) individual and concatenated phylogenies of six markers that include the nr28s, a benchmark gene for clade analyses, and 2) the rates of evolution of two selected genes from three organelles (nucleus, mitochondria and chloroplast) across all known clades and sub-clades (Table 2). Individual and concatenated phylogenies were analyzed to test the hypothesis that organelles have evolved differently among clades and that a six-gene concatenated tree increases the resolution of the current nr28S tree. We then applied pair-wise relative substitution rate analyses in each marker to characterize compartment-specific differences in evolutionary rates among Symbiodinium clade and gene organelle.

Genes Selection, DNA extraction and Sequencing
Six genes from three organelles were chosen for phylogenetic analyses. These include two  , nine DNA samples were extracted and partially analyzed in other studies  and further sequenced here to cover all genes using the primers and PCR cycling conditions described in Pochon et al. (2012), and two DNA samples were extracted from sponge tissues of the genus Cliona (courtesy of C. Schoenberg) and sequenced for all genes following  (see Table 2). The psbA gene was not reported in Pochon et al. (2012) and was PCR amplified in this study using the forward primer psbA_1.0 (5'-CWGTAGATATTGATGGWATAAGAGA-3') located at the 5' end of the coding region and the reverse primer psbA_3.0 (5'-TTGAAAGCCATTGTYCTTACTCC-3') located approximately 700 bp downstream from the 5' end and using standard thermocycling conditions with an annealing temperature of 52C. All sequences were obtained by direct sequencing, except for nr28S and cp23S sequences, which were cloned prior to sequencing in Pochon et al. (2012), and a single sequence per sample included in the present study. In all cases, the variability between cloned sequences of any given sample was minimal (e.g., see Figure S1 of Pochon et al., 2012), ranging between 0 and 4bp difference (data not shown). However, sequences showing the shortest branch length in each sample were selected (data not shown). In cases where several sequences showed the same short branch length, one sequence was randomly chosen among them and included in the analysis.

Phylogenetic analyses
DNA sequences were inspected and assembled using Sequencher v4.7 (Gene Codes Corporation, Ann Arbor, MI, USA) and manually aligned with BioEdit v5.0.9 sequence alignment software (Hall, 1999). Thirteen distinct DNA alignments were generated: six alignments corresponding to individual gene alignments, one fully concatenated alignment of all six genes (ALL Concat), and six partially concatenated alignments including all possibilities of five genes each (i.e., each alignment excluded one of the six gene candidates). Concatenated alignments were created using the 'join sequence files' option in TREEFINDER v12.2.0 (Jobb, von Haeseler & Strimmer, 2004). elf2 was included in these analyses despite two missing samples (see samples #27 and #30; Table 2), which were coded as missing data in all concatenated alignments. GenBank accession numbers for all investigated sequences are shown in Table 2.
Each DNA alignment was analyzed independently under both Maximum-likelihood (ML) and Bayesian environments. Best-fit models of evolution were estimated for each alignment (see Table S1) using Modeltest v3.7 (Posada & Crandall, 1998). ML analyses were carried out using PhyML v3.0 (Guindon et al., 2009), and the reliability of internal branches was assessed using 100 bootstraps with subtree pruning-regrafting branch swapping. Bayesian tree reconstructions with posterior probabilities were inferred using MrBayes v3.2 (Ronquist et al., 2012), using the same model of DNA evolution as for the ML analyses. Four simultaneous Markov chains were run for 1,000,000 generations with trees sampled every 10 generations, with 50,000 initial trees discarded as ''burn-in", based on visual inspections. Concatenated alignments were run under ML and Bayesian environments as described above, with the alignments partitioned so that the specific model of evolution corresponded to each gene fragment.

Topological tests, rate calculations, and statistical analyses.
To compare the topology of the various trees, approximately unbiased (AU) topological congruency tests (Shimodaira, 2002) were performed using site likelihood calculation in RaxML v7.2.5 (Stamatakis, 2006), followed by AU tests using CONSEL (Shimodaira & Hasegawa, 2001) with default scaling and replicate values. elf2 was excluded from the single gene analyses due to missing data (samples #27 and #30; Table 2), but was included in the concatenated analyses (see above).
In order to determine evolutionary rates among Symbiodinium lineages for each of the six investigated genes, relative-rate tests (RRT) were performed using the program RRTREE v1.1 (Robinson-Rechavi & Huchon, 2000). Clades and sub-clades were compared in a pair-wise fashion with G. simplex as the outgroup. Relative rates of evolution (K-scores from RRTREE analysis above) were compared among clades and among cellular organelles using a two way ANOVA, followed by post hoc analysis with Tukey's Honestly Significant Difference (THSD) test.

Results
DNA alignments for the six investigated genes ranged between 473 (elf2) and 1,057 bp (coI). Individual phylogenies were generated (Figure 1), and each was compared to the topology obtained with the nr28S gene, which is the current molecular taxonomic benchmark for the cladelevel classification of Symbiodinium (Hill et al., 2011;Pochon et al., 2012). Overall, the cladal relationships were remarkably similar among the genes investigated, particularly the basal positions of clades A, D, E and G, and the derived positions of clades B, C, F, H, and I. Symbiodinium clades were relatively well resolved in the nuclear and chloroplastic genes, but not the mitochondrial genes, which placed clades C, F, and H in completely unresolved monophyletic groups (see Figure 1E-1F). However, with the exception of nr28S, the relationships amongst clades were weakly supported for all markers, especially in the higher parts of the trees, and this was particularly evident for psbA where relationships between clades B, C, D, F, G, H, and I were completely unresolved ( Figure 1D). Furthermore, the relationships between sub-clades within clades D, F, and G showed contrasting results. Well-supported monophyly of all subclades was only observed in the nr28S gene ( Figure 1A). Notably however, clade G sub-clades (G1 and G2) formed a monophyletic group across all genes. In contrast, the monophyly of clade F and clade D sub-clades was only resolved with nr28S ( Figure 1A) and nr28S and cob ( Figure   1A, 1F), respectively. All Symbiodinium strains belonging to the same sub-clade grouped together across all genes, with two noteworthy exceptions. First, the four samples of sub-clade F5 (#14-16) separated into two groups in cob ( Figure 1F). Second, sample #24 (Table 2) of sub-clade D2 diverged significantly to the root of the tree in cp23S ( Figure 1C).
In order to increase the phylogenetic signal and assess which of the individual markers best reflects the most well resolved evolutionary history of Symbiodinium, a series of gene concatenation analyses were conducted. In total, seven distinct concatenated alignments were analyzed, including one fully concatenated alignment of all six genes (ALL Concat) consisting of a total length of 4,703 bp, and six partially concatenated alignments ranging in length from 3,646 bp (ALL except coI) and 4,230 bp (ALL except elf2), and including all possibilities of five genes each (see Methods). Phylogenetic analysis of the fully concatenated dataset (ALL Concat, Figure   S1) resulted in a highly resolved Symbiodinium tree with identical topology to nr28S gene, but with much stronger phylogenetic signal as evidenced by a significant increase in statistical support at all nodes ( Figure S1). Other concatenated alignments yielded weaker nodes support and unstable cladal relationships globally (data not shown).
Approximately unbiased (AU) topological congruency tests (Shimodaira, 2002) were used to verify whether any of the distinct phylogenies resulted in statistically identical topologies.
First, pair-wise comparisons of single gene phylogenies (Figure 1) resulted in significant p-values (p<0.05) in all cases, indicating that the different genes have not followed identical evolutionary trajectories (see Table S2A). Second, concatenated topologies tested against single gene topologies, also resulted in significant p-values in all instances (data not shown). Third, pair-wise comparisons of single gene phylogenies to the concatenated topologies, revealed that the two longest genes, coI and nr28S, resulted in 5 and 6 significant topological comparisons, respectively (see Table S2B). Despite the relatively smaller size of nr28S (920bp) compared to coI (1057bp), nr28S was the only marker yielding a statistically identical topology to the fully concatenated topology (ALL Concat). The nr28S topology, however, was not identical to the best topology of the concatenated alignment excluding the nr28S gene fragment (see ALL except nr28S in Table S2B). Similarly, pair-wise comparisons of concatenated topologies revealed that significant p-values (p<0.05) were only observed against the 'ALL except nr28S' topology (Table   S2B).
The variable branch lengths observed in the six phylograms (Figure 1) are directly proportional to the amount of character change; hence the longest branches are indicative of increased evolutionary rates of any given Symbiodinium strain. In most cases, increased rates of Symbiodinium clades/sub-clades appeared to be gene-specific rather than a character state maintained across all markers. K-scores from relative rate tests were coupled with ANOVA to compare the relative rates of evolution among the clades and organelles (Fig. 2) examining all clades across the three makers. There was no significant interaction of clade and organelle (F 16,175 =1.57, p=0.081), indicating that the pattern of changes in rates of evolution among clades were similar across organelles. However, organelles differed in their relative rates of evolution (F 2,175 =248.9, p=0.0001), driven by rapid rates in the chloroplastic and nuclear compartments in comparison to the mitochondrial compartment ( Fig. 2A), with the most rapid rates found in the chloroplastic markers due the high evolutionary rates of clade I and sub-clade D2 (see Figure 1C and 1D). Additionally, there was a significant differences between Clades (F 8,175 =3.87, p=0.0003) driven by the slow rates of clade A, and the rapid rates of Clade I (Fig. 2B)  Our knowledge of Symbiodinium evolution has historically been constrained by the limited number of phylogenetic markers that have been applied to this group. To date, less than 15 DNA loci have been used to examine Symbiodinium diversity in a phylogenetic context (LaJeunesse & Thornhill, 2011;Pochon et al., 2012;Rowan & Powers, 1992;Sampayo, Dove & LaJeunesse, 2009;Takabayashi, Santos & Cook, 2004;Takishita et al., 2003;van Oppen et al., 2001), and evolutionary relationships among all existing Symbiodinium lineages have never been inferred using more than two concatenated genes . This study is the first to perform a multi-gene analysis using six markers representing three cellular organelles and integrating biological samples from all known clades and selected sub-clades that encompass the genus Symbiodinium. In spite of the overall similarity among the trees for each nuclear, chloroplastic and mitochondrial gene (Figure 1), their topologies were statistically different (Table S2). This reflects within and among clade differences inherent to the individual markers.

Multi-gene analysis supports nr28S as a benchmark lineage marker
Most notably being the unstable positions of clades D, E, F5 and H, as well as weak support for among clade relationships observed in most markers investigated. Long-branch attraction artifacts (Felsenstein, 1985) most likely accounted for the placement of sub-clade D2 (sample #24) at the root of the tree in the chloroplast 23S topology, and for the monophyly of samples #7, 8, 13, and 14 in the cob topology. While the markers investigated here are conserved genes that have a priori limited utility for finer scale (i.e., within clade) analysis, each contains a unique set of characteristics, including variable cladal resolution and/or evolutionary rates (e.g., see samples #2 and #3 in coI or samples #7, 8, 13, 14 in cob), hence each marker has the potential to address different questions. These differences thus support our previous conclusion that no one gene fits all of the taxonomic questions being asked in the genus Symbiodinium .
Our fully concatenated analysis, incorporating all investigated genes and totaling 4,703 bp, resulted in a highly resolved phylogeny that was statistically identical to the nr28S gene, a gene used as the benchmark for assigning Symbiodinium lineages ( Figure S1; Table S2). The fact that the concatenated nuclear, chloroplastic, and mitochondrial genes display overall similar evolutionary histories, suggests that the molecular taxonomy of the currently recognized Symbiodinium clades using nr28S is robust , and that the points of clade differentiation are ancient, allowing for a concerted evolution of these conserved genes across genomes. These new results support a sequential evolution of Symbiodinium clades A/E/G1-G2/D1-D2/I/B/F2-F5/H/C, from most ancestral to most derived, respectively. It appears that there is a level of constraint in the system, with recombination likely being a rare event (Santos & Coffroth, 2003), a feature that maintains separation among lineages.

Compartment specific evolution and link to environmental preference/prevalence
Dinoflagellates are characterized by several genetic distinguishing features, including large genome size, and complex structure and gene regulation (Barbrook et al., 2010;Hackett et al., 2004;Howe, Nisbet & Barbrook, 2008). One prominent feature is the large number of genes that have relocated from the ancestral organellar genome to the nucleus, resulting in a significant reduction in plastid and mitochondrondrial genomes. For example, the few genes that remain in the plastid of peridinin-containing dinoflagellates are primarily the core subunits of the photosystem (including cp23S), and the cytochrome b6f and ATP synthase complex (about 16 genes including psbA) (Hackett et al., 2004). Similarly, the mitochondrial genome of dinoflagellates has been reduced to three protein-coding genes (coI, coIII, and cob), but also contains a large number of non-functional fragments separated by repetitive non-coding DNA (Barbrook et al., 2010;Waller & Jackson, 2009). Despite the fact that the six Symbiodinium genes investigated here are only a very small subset of the Symbiodinium genome, they are physically separated in three cellular compartments, each with distinct evolutionary constraints and potential. For example, our comparisons of evolutionary rates between markers revealed that the differences among cellular compartments was primarily driven by the dissimilarity in the rates of evolution in cp23S and psbA in Symbiodinium lineages D2 and I ( Fig.1; Fig. 2).
A possible explanation is that the increased evolutionary rates reflect rarity and adaptation to marginal habitats. It has been posited that rare taxa are important in driving evolutionary trajectories and innovations (Holt, 1997). Rarity in terms of small population size and isolation can drive high rates of adaptation and speciation (e.g. peripheral speciation; Mayr, 1963), as mutations in rare species are more likely to accumulate in the periphery of the founding population's habitat where rare species may be subjected to persistent directional selection in the absence of gene flow, as they colonize new areas (Garcia-Ramos & Kirkpatrick, 1997). Such a scenario is supported by the fact that lineages D2 and I have only been documented on few occasions (Carlos et al., 1999;Pochon et al., 2007;, despite numerous Symbiodinium surveys conducted over the last 20 years in both the Western Atlantic and Indo-Pacific Oceans targeting a diversity of host taxa, as well as free-living communities, and crossing a variety of spatial and temporal scales (reviewed in Coffroth & Santos, 2005;Stat, Carter & Hoegh-Guldberg, 2006). In addition, Symbiodinium D2 and I have only been detected in the Hawaiian Archipelago and Micronesia (Guam and Palau), some of the most isolated island groups in the world and areas known for harboring high levels of endemism in marine biodiversity (Hughes, Bellwood & Connolly, 2002;Pauley, 2003). Both lineages have been suspected to either be free-living because of the manner in which the sample was isolated (Carlos et al., 1999), or recently ingested free-living strains due to their apparent rarity in nature .
The high rates of evolution in chloroplastic genes in Symbiodinium sub-clade D2 and clade I might also reflect a relatively recent transition from free-living to symbiotic lifestyles.
These habitats are extremely different in nature and composition, with free-living environments exhibiting high levels of environmental variability and unpredictability, while symbiotic habitats are relatively more predictable being spatially constrained and influenced by the biology of the host. These environmental differences undoubtedly drive the very different morphologies of Symbiodinium found in these two habitats, with free-living Symbiodinium flagellated and motile, and symbiotic Symbiodinium encysted and immotile. In terms of evolutionary trajectories, such differences in environment must exert a profound influence. Symbiodinium strains evolving predominantly in symbiosis must have adapted particular biochemistry and chloroplastic functions in an environment that bears little or no resemblance to a free-living setting. Previous studies on the transition between symbiotic and free-living habitat show that changes in evolutionary rate occur in bacteria that have transitioned from free-living to a symbiotic lifestyle and mutualism (Lutzoni & Pagel, 1997;Moran, 1996). In addition, in some ectomycorrhizal assemblages, changes in evolutionary rate correspond to reversing from symbiotic to free-living lifestyle (Hibbett, Gilbert & Donoghue, 2000). Further, rapid and extreme environmental changes may favor the survival of rare and transitioning species, as their existing phenotypic diversity may contain traits pre-adapted to a changing environment (Holt, 1997).
Additional work is needed to further explore the implications of transitions between the symbiotic and free-living state, with a goal of gaining a more comprehensive understanding of the dynamics and mechanisms behind the different evolutionary trajectories observed in the chloroplastic compartment of the rare Symbiodinium strains highlighted here. Additionally, the increasing use of next-generation sequencing for characterizing entire Symbiodinium genomes (e.g., Barbrook et al., 2014) is an exciting avenue that provides unprecedented opportunities for the investigation of novel markers and paves the way for much more comprehensive phylogenomics studies to come.