Comparative analysis of full-length mitochondrial genomes of five Skeletonema species reveals conserved genome organization and recent speciation

Skeletonema species are prominent primary producers, some of which can also cause massive harmful algal blooms (HABs) in coastal waters under specific environmental conditions. Nevertheless, genomic information of Skeletonema species is currently limited, hindering advanced research on their role as primary producers and as HAB species. Mitochondrial genome (mtDNA) has been extensively used as “super barcode” in the phylogenetic analyses and comparative genomic analyses. However, of the 21 accepted Skeletonema species, full-length mtDNAs are currently available only for a single species, S. marinoi. In this study, we constructed full-length mtDNAs for six strains of five Skeletonema species, including S. marinoi, S. tropicum, S. grevillei, S. pseudocostatum and S. costatum (with two strains), which were isolated from coastal waters in China. The mtDNAs of all of these Skeletonema species were compact with short intergenic regions, no introns, and no repeat regions. Comparative analyses of these Skeletonema mtDNAs revealed high conservation, with a few discrete regions of high variations, some of which could be used as molecular markers for distinguishing Skeletonema species and for tracking the biogeographic distribution of these species with high resolution and specificity. We estimated divergence times among these Skeletonema species using 34 mtDNAs genes with fossil data as calibration point in PAML, which revealed that the Skeletonema species formed the independent clade diverging from Thalassiosira species approximately 48.30 Mya. The availability of mtDNAs of five Skeletonema species provided valuable reference sequences for further evolutionary studies including speciation time estimation and comparative genomic analysis among diatom species. Divergent regions could be used as molecular markers for tracking different Skeletonema species in the fields of coastal regions.


Background
Skeletonema is a species-rich genus of centric planktonic diatoms that belongs to the order Thalassiosirales. However, before the recognition/delimitation of eight Skeletonema species based on morphological features using light microscope, scanning and transmission electron microscope, and some molecular features, Skeletonema strains were usually regarded as a single species, S. costatum [1]. Since then, 21 Skeletonema species have been identified and taxonomically accepted [2]. Most Skeletonema species are cosmopolitan distributions, and are important primary producers especially in estuarine and marine waters [3]. For example, S. marinoi is particularly dominant in temperate coastal regions, including the Bohai Sea and the Yellow Sea [4,5], West Sea of Korea [6], Gullmar Fjord, Sweden [7], Baltic Sea [8], and Adriatic Sea [9]. S. grevillei has wide geographical distributions but low abundance [3], even though it has been recorded in Adriatic Sea [10], Arabian Sea [3], Mediterranean [9], Xiamen Harbour in China [3,11] and Hong Kong Bay in China [12]. S. tropicum were found at tropical locations including Mediterranean Sea [3], the East China Sea [3], the South China Sea [6], the Jiaozhou Bay of the Yellow Sea [13], and the Lagoa dos Patos in Brazil during summer and autumn [3]. S. pseudocostatum is often observed with unicellular or short chained forms, and is also associated with long threads, has been recorded in Australian, South African, Brazilian and Chinese waters as well as the Mediterranean, Baltic Seas and Sontecomapan lagoon, Mexico [3,14,15]. S. pseudocostatum has more restricted geographical ranges because the species was not found along American coasts [3]. S. potamos grows in freshwaters and brackish waters with salinities 2-34 [16], such as the River Danube, Hungary [17] and Patos Lagoon, Brazil [18].
Because of their worldwide distribution and critical importance in ecology and aquaculture, many Skeletonema species have been extensively studied [19,20]. For example, research found that Skeletonema species are important regulators of global climate by playing an important role in the biochemical cycling of carbon and oxygen and various nutrients [21]. Under certain circumstances, many Skeletonema species can induce harmful algal blooms (HABs) [1], some of which can cause severe economic losses among aquaculture, fisheries and tourism industries, as well as ecological structure throughout the world [22]. Skeletonema HABs, which are usually incorrectly described as S. costatum HABs, are frequent with negative consequences [23][24][25]. For example, the most dominant Skeletonema species in the Jiaozhou Bay, China was generally described as S. costatum [5]. Most Skeletonema strains isolated from the Jiaozhaou Bay were identified as S. marinoi by molecular markers [26]. Skeletonema HABs species could cause hypoxia or anoxia after reaching dense concentrations leading to indiscriminate mortalities in marine life [27]. Skeletonema HABs can also cause severe loss in the aquaculture. For example, the Skeletonema HABs utilize nutrients necessary for the growth of the red algae Porphyra in winter in Japanese aquaculture [20]. Skeletonema HABs have been described to occur worldwide. For example, in one of the best studied ocean region the Jiaozhou Bay, Skeletonema species was found to be the most dominant phytoplankton, and Skeletonema HABs were found to occur with the highest frequency over the past 84 years (1936-2019) [5]. Similarly, Skeletonema was one of the most frequently occurring diatom genus along Guangdong coast of the South China Sea during 1980-2016 [25].
In addition to their important role in ecology, some Skeletonema species are commonly used in aquaculture for feeding bivalves and crustaceans due to their good nutrient composition and absence of toxins [28]. As Skeletonema species are easily cultured, they are also often used as model systems for studying phytoplankton species [1]. For example,   [29] proposed S. marinoi as a genetic model for various attributes of marine diatoms, including grazer effects [30], sexual reproduction [31], programmed cell death [32], herbicide effects [33] and temperature acclimation [34]. S. marinoi has also been used in micro-evolutionary studies of temporal changes in diatom ecology and genetics, because it has a benthic resting stage that can survive in sediments for more than 100 years [35,36].
Despite their ecological importance and aquacultural values, the biodiversity of Skeletonema species have not been well characterized due primarily to their high morphological similarities, especially for closely related species pairs such as S. marinoi and S. dohrnii [37], and S. menzelii and S. tropicum [3]. The application of molecular analysis using molecular markers, including full-length 18S rDNA and partial-length 28S rDNA (D1-D3), along with comparative analysis of morphological features, enabled the identification of eight Skeletonema species including S. menzelii, S. pseudocostatum, S. subsalsum, S. tropicum, S. dohrnii, S. grethae, S. japonicum and S. marinoi [1,38,39]. Nevertheless, common molecular markers such as partial 18S rDNA (V4 region) in environmental metabarcoding analyses using the second generation sequencing technology cannot adequately distinguish Skeletonema species [26]. cox1 was shown to be the most useful marker compared with 18S rDNA and 28S rDNA D1-D3 in Skeletonema species identification [40], while the insufficient sequences of full length cox1 gene in the NCBI database limited their application.
Because of their small sizes, conserved gene components, low level of recombination and maternal inheritance, mitochondrial genomes (mtDNAs) have been extensively used in the phylogenetic analysis, population genetics and biogeographic distribution [41]. mtDNAs have been used as "super barcode" for comparative genomics analysis [42]. Furthermore, mtDNAs usually show relatively higher evolutionary rates than nuclear genomes, which make them an appropriate platform for developing molecular markers with high resolution [43,44]. By now, mtDNA of only a single Skeletonema species, S. marinoi, has been constructed [45].
In this study, we constructed mtDNAs of six Skeletonema strains (from five Skeletonema species), including S. marinoi, S. tropicum, S. grevillei, S. pseudocostatum and S. costatum (two strains), which were isolated from coastal waters in China. Comparative analysis of these mtDNAs revealed high gene and synteny conservation, as well as segments of high divergence, which could be used to develop molecular markers for distinguishing Skeletonema species. We also explored divergence times for Skeletonema species and other species in the class Mediophyceae based on their shared protein-coding genes (PCGs) of mtDNAs.

Results
Morphological and molecular identification of six Skeletonema strains All candidate Skeletonema strains, which were readily identified by its cylindrical cells, usually formed long or short chains (Fig. 1A). S. pseudocostatum was observed with unicellular or two-three cells chains. Cell sizes of the candidate Skeletonema species varied substantially, which were not surprising because diatom cell size decreases as a result of the mechanism of cell division [31]. Thus, cell sizes alone could not be used as defining features for identifying Skeletonema species. Molecular information and more elaborate morphological characteristics are needed to facilitate species identification.

Phylogenetic analysis of Skeletonema mtDNAs
Among the 36 Bacillariophyta species with reported mtDNAs, the five Skeletonema species belonged to the class Mediophyceae (Additional file 3; Table 1; Fig. 2), including S. marinoi, which is the only Skeletonema species whose full-length mtDNA has been constructed. To explore the evolutionary relationship between Skeletonema species and other diatom species, we constructed a phylogenetic tree using 31 PCGs that were shared by mtDNAs of Bacillariophyta and Oomycota, which were used as outgroup taxa (Fig. 3). All species in Bacillariophyta fell into three clades corresponding to three classes including Mediophyceae, Bacillariophyceae, and Coscinodiscophyceae (Fig. 3). In the class Mediophyceae, Skeletonema species clustered closely with Thalassiosira pseudonana, which was consistent with previous studies [17,45]. Both genera belong to the same Order Thalassiosirales [2]. Within Skeletonema, the S. marinoi mtDNA constructed in this study (MW438984) clustered closely with that of the other S. marinoi strain (JK029 strain from Korean seawater, NC_028615) as expected, so were the mtDNAs of two S. costatum strains (CNS00243 and CNS00303). S. tropicum clustered closely with S. pseudocostatum, consistent with previous report [3,24]. The mtDNA of S. grevillei, which was isolated from the South China Sea, was independent from other clades.

Synteny conservation with divergent regions in mitochondrial genome
The mtDNA sizes of these Skeletonema species showed some variations with the difference between the longest mtDNA (S. costatum, CNS00243 strain, 40,676 bp) and the shortest mtDNA (S. pseudocostatum, 36,199 bp) being 4477 bp. However, comparative analysis found that there was nearly perfect synteny conservation among Skeletonema species (Fig. 4A). This is expected because previous study found high synteny conservation between the mtDNAs of S. marinoi and T. pseudonana [43], which is diatom species of a different genus in the order Thalassiosirales. At the single gene level, the genome organization was also well conserved among the five Skeletonema species with only minor differences (Fig. 4B).
The numbers of open reading frames (orfs) and trnM genes were different among these Skeletonema mtDNAs, which were primarily responsible for the different lengths of mtDNAs. S. pseudocostatum and S. grevillei each had a single orf, S. marinoi and S. tropicum each had two orfs, while S. costatum each had three orfs. S. costatum each had an additional trnM. Compared with S. pseudocostatum, mtDNA of S. costatum had three more orfs (orf119, orf516 and orf248) and one more trnM, which in total accounted for 2730 bp. Another main reason for the big size difference between the two species was the length of intergenic regions. For example, the length of the combined intergenic regions in S. costatum (CNS00243 strain) was roughly 1745 bp longer than that of S. pseudocostatum. Interestingly, despite the deep conservation of these mtDNAs, comparative sequence analysis of these mtDNAs also revealed seven discrete regions with enhanced divergence, which were indicated using I, II, III, IV, V, VI and VII (Fig. 4B). Within these regions, DNA sequences were essentially unalignable (Additional file 5 and Additional file 6). Region I was located between trnW and trnM (Additional file 5A). The S. costatum mtDNAs have one big insertion (orf119) compared with other species, which were also validated by designed PCR primers (I-F and I-R in Additional file 7; Additional file 6). Insertions in mtDNAs of S. costatum (trnM and orf248) were also found in region V (located between atp6 and trnI; Additional file 5E and Additional file 6), which were validated by PCR primers (V-F and V-R in Additional file 7). In region III (located between nad2 and nad6; Additional file 5C and Additional file 6), the mtDNAs of S. marinoi (orf306), and S. costatum (orf516) had big insertions, respectively. In the region IV, only S. tropicum mtDNA had an insertion (orf387 between trnH and atp6) with 1786 bp length compared with other Skeletonema species with the lengths between 439 and 455 bp (Additional file 5D and Additional file 6). Divergent regions II, VI and VII, were located between nad11 and trnD (Additional file 5B and Additional file 6); between trnI and trnV (Fig. 5A-B; Additional file 5F); between trnV and nad5 (Additional file 5G), respectively. These divergent regions (Fig. 4B and Additional file 5) are generally located at the non-coding regions except the region V. The nature regarding the formation of these divergent regions remained unknown. However, these divergent regions may be used as potential molecular markers for distinguishing different Skeletonema species.
Among the seven regions, region VI could be used a molecular marker for distinguishing five Skeletonema species with both high resolution and high specificity. This region was different among five Skeletonema species, thus it could be used to distinguish different Skeletonema species (Fig. 5). To explore the specificity of the region VI as a molecular marker, primers (VI-F: ACTACTGACCTTACGCTTATC, VI-R: CGAAAAAG TTGGTGGTTCGATTCCA) were designed for PCR amplification of the region VI in 11 other phytoplankton species (i.e., Thalassiosira pseudonana, Chaetoceros muelleri, Heterosigma akashiwo, Aureococcus anophagefferens, Chattonella marina, Amphidinium carterae, Alexandrium tamarense, Karenia mikimotoi, Prorocentrum donghaiense, Isochrysis galbana, and Phaeocystis globosa). Agarose gel images of PCR products showed that only the PCR amplification was positive only for Chaetoceros muelleri with one bright band, whose length was much longer than that of Skeletonema species (Additional file 8), suggesting that the region VI can be used as a molecular marker to distinguish Skeletonema species with high specificity.

Divergence time estimation based on PCGs of mtDNAs
The time-calibrated phylogeny of the Mediophyceae and other species was constructed using DNA sequences of 34 PCGs from 15 mtDNAs. The divergence times of Mediophyceae species were estimated using the MCMC tree (Fig. 6). The divergence between the class Mediophyceae and the class Coscinodiscophyceae was inferred to have occurred at 183. 51

Compact Skeletonema mtDNAs
The six Skeletonema mtDNAs constructed in this study had relatively small sizes (Additional file 4), with short intergenic regions, no introns, and no repeat regions. Introns have been found in many diatom mtDNA genes (Additional file 3), which greatly increased the length of mtDNA. For example, the largest mitochondrial genome ever discovered was Halamphora calidilacuna [46], reaching 103,605 bp. The cox1 genes of H. calidilacuna spanned 31,716 bp, with only 1506 bp (4.7% of the total length of cox1 gene) coding sequences. Some repeats were also found in some mtDNAs, which induced larger sizes of mtDNAs. For example, mtDNA of Phaeodactylum tricornutum contained a 35 kb-long repeat [50]. The intergenic regions also contribute significantly to the whole length of mtDNAs, and the larger size of intergenic regions is usually associated with larger size of mtDNAs. Thus another reason for the small size mtDNAs among Skeletonema species was relatively short intergenic regions compared to other diatom species (Additional file 4).

Synteny conservation of Skeletonema mtDNAs
Our analysis found that Skeletonema mtDNAs contained similar numbers of genes containing 35 common PCGs, 2 rRNA genes and 25 tRNA genes (except in the S. costatum, which contained an additional trnM gene), same as the two mtDNAs of Thalassiosira (T. pseudonana and T. profunda) [47,48]. The Nitzschia mtDNAs also include 35 unique PCGs and 2 rRNA genes, but these mtDNAs only have 24 tRNA genes compared that in Thalassiosira with and Skeletonema genus [46,49,51,52]. Interesting, the mtDNAs in Halamphora genus show difference: H. calidilacuna mtDNA includes 34 PCGs and 26 tRNA genes; while H. coffeaeformis mtDNA includes 35 PGCs and 24 tRNA genes [46]. The mtDNAs of the two Halamphora species, which were quite different from that of the above three genera, include 3 rRNA genes and introns. While for the genetic structure, the mtDNAs belonging to the same genus showed highly synteny although the number of genes and genome sizes within the genus varied greatly (Additional file 9 and Fig. 4A). We also found slight variation among the Thalassiosira genus with two inversions (Additional file 9), showing the interspecific variability in the Thalassiosira genus. In brief, the mtDNAs of five Skeletonema species showed high conservation, which was in agreement with other species belonging to the same genus according to the previously reported mtDNAs.

Phylogeny and evolution of Skeletonema species
Diatoms arose in the Triassic period, perhaps as early as 250 Mya according to the molecular clock estimate [53], and provided a main ecological role in carbon cycle about 100 Mya, when the CO 2 level was five times higher than that nowadays. The timing coincided with the divergence of a second major lineage of diatoms, the bipolar and multipolar centrics, including Thalassiosirales clade [54]. Phylogenetic analysis of 31 core PCGs showed that Skeletonema was sister to Thalassiosira, both belonging to the order Thalassiosirales, consistent . We added two Bacillariophyceae species (Synedra acus and Fragilariopsis kerguelensis) to establish appropriate fossil calibration points. The red dots represent calibration points and the 95% highest posterior density interval for node ages are shown with translucent blue bars with the current classification [2,55]. Our results (Fig. 6) show that Skeletonema species diverged from Thalassiosira species approximately 48.30 Mya (95% HPD: 31.52-76.79 Mya) during the Eocene, a period included a major burst of diversification affected by sea level, silica availability and interspecific competition with other planktonic groups [56]. This divergence time was estimated using 34 mtDNA genes, which was slightly earlier than that estimated using the two nuclear genes, seven plastid genes and two mitochondrial genes [55,57], but was later than that estimated using two nuclear genes and two plastid genes [58], and using 18S rDNA [59].
Among the Skeletonema genus, the two HAB species S. marinoi and S. pseudocostatum diverged approximately 16.15 Mya (95% HPD 7.08-29.41 Mya), which was slight early than that results estimated previously using 18S rDNA [53]; and were later than that estimated using the two nuclear genes, seven plastid genes and two mitochondrial genes [55]. The divergence times estimated based on 34 PCGs of mtDNA among Skeletonema species in this study were generally consistent with previous results with minor fluctuations, which facilitating the understanding the evolution of Skeletonema species in the view of mitochondria with uniparental inheritance.

Conclusion
In this study, we reported mtDNAs of four Skeletonema species for the first time, and re-sequenced the S. marinoi mtDNA of a strain isolated in coastal waters in China. Through comparative analysis of the mtDNA structure, base composition and gene order, we found relatively conserved genetic arrangement among the Skeletonema species. We also reported discrete regions with high divergence among the mtDNAs of five Skeletonema species, which could potentially provide high resolution for the phylogenetic studies among the genus. We further reconstructed a phylogenetic tree, which was used to estimate the divergence times among Skeletonema species. However, more taxa and more complete mtDNAs data are needed to verify the evolutionary relationships and primers application.
The mtDNAs of five Skeletonema species showed similar sizes, conserved genetic structure and high syntenic relationships among the genus. While the five Skeletonema species showed different ecological geographical distribution (Additional file 10). For example, S. marinoi was easily found in seawaters of higher latitude with lower temperature (Additional file 10A); While S. tropicum was recorded in seawaters of lower latitude with higher temperature (Additional file 10B). We hypothesize that the nuclear genomes of the Skeletonema species may possess substantial differences, which facilitate the adaptation of different Skeletonema species.
To further explore the specific ecological distribution mechanism, whole genomes of Skeletonema species could help to provide new ideas.

Strain isolation and whole genome sequencing
Skeletonema strains studied in this project were isolated from water samples collected during six expeditions in Chinese coastal waters (Additional file 11), using methods as described previously [43]. The seawater of culture was collected from Jiaozhou Bay, filtered through a 0.22 μm mixed-fiber membrane and sterilized at 121°C for 30 min. The Skeletonema strains were maintained in L1 medium [60]. The culture salinity was 28-30 ‰. The culture temperature was 19 ± 1°C, and the irradiance was from 72 μmol photons /(m 2 ·s) with a 12:12 h lightdark cycle. The strains were maintained in our lab for 3 months to 13 months before sequencing.
The morphological features of the Skeletonema strains were observed using ZEISS Axio Imager 2 (ZEISS, Germany) at the 600× magnification. We selected reprehensive strains based on the preliminary morphological observation for Illumina sequencing. For DNA extraction, healthy cells at the exponential growth phase were harvested via centrifugation at 8000 rpm for 5 min. Total DNAs were extracted using DNAsecure Plant Kit (Tiangen Biotech, Beijing, China) using the manufacturers' instructions. DNA samples were quantified using Qubit® DNA Assay Kit in Qubit® 3.0 Flurometer (Invitrogen, USA), 0.2 μg genomic DNA of each sample was fragmented by sonication (Covaris S220, Covaris, USA) to the length of 350 bp. DNA fragments were harvested, end polished, A-tailed, and ligated with adapters for Illumina sequencing, followed by PCR amplification. The PCR products were purified using AMPure XP system (Beckman Coulter, Beverly, USA). Qualified libraries were sequenced using NovaSeq PE150 (Illumina, San Diego, CA, USA) at Novogene (Beijing, China).

Phylogenetic analysis and divergence time estimations
The Phylogenetic trees of cox1 (Fig. 1), full-length 18S rDNA, rbcL and 28S rDNA D1-D3 (Additional file 1; Additional file 2) were generated using the Maximum Likelihood (ML) method with 1000 bootstrap replicates in MEGAX [68]. Appropriate evolutionary models were selected using Model Selection.
Phylogenetic analysis and molecular dating were analyzed using 34 mitochondrial PCGs DNA sequences shared by Mediophyceae species and outgroups Melosira undulata. The 34 PCGs included atp6, 8, 9; cob; cox1-3; nad1-7, 4 L, 9, 11; rpl2, 5, 6, 14, 16; rps 3, 4, 7, 8, 10, 11, 12, 13, 14, 19; and tatA and tatC, which were shared among the 15 species. These PCGs were aligned (MAFF T with Codon mode) and concatenated (Concatenate Sequence). The best-fit evolutionary model and partitioning scheme were selected by PartitionFinder2, which were shown in the Additional file 13. The phylogenetic trees (MrBayes) was constructed using the software Phy-loSuite v1.2.2 [74]. Molecular dating was performed using the PAML package [75] in there three steps: (1) Rough estimation of substitution rate was analyzed using baseml; (2) Estimation of gradient and hessian of branch length; (3) Estimation of divergence times with the approximate likelihood method using the mcmctree. Two calibration points used in this analysis were shown in the Additional file 14. We added two Bacillariophyceae species (Synedra acus and Fragilariopsis kerguelensis) to establish appropriate fossil calibration points. The calibration point of Synedra and Fragilaria was based on fossil record [57]. The divergence times with fossilcalibrated deriving from ribosomal RNA and organelle genes, i.e. T. profunda and T. pseudonana [53,58,76]. The phylogenetic tree was displayed in the FigTree and visualized with 95% HPD interval for each node.
Synteny analysis and molecular marker development of six Skeletonema mtDNAs Synteny analysis of six Skeletonema mtDNAs sequences was carried out with the program Mauve v2.3.1 [77]. The region with great sequence differences were aligned using MUSCLE algorithm in the MEGAX. Six primers (I-F/R to VI-F/R in Additional file 7) for PCR amplification of molecular markers (region I to VI in Fig. 4B) were developed using the Primer Premier 5.0 [78]. PCRs were performed in a final volume of 50 μL containing 2 μL diluted template DNA (about 50 ng), 1 μL forward primer, 1 μL reverse primer, 25 μL mix (Tiangen, China) and 21 μL ddH 2 O. The reaction was denatured at 94°C for 4 min. Then, reactions were run for 32 cycles at 94°C for 1 min, 52°C for 1 min 50s, and 72°C for 2 min and a final extension at 72°C for 10 min. These PCR products were run on 1% agarose gels for checking amplicon lengths.