Recovering sedimentary ancient DNA of harmful dinoflagellates accumulated over the last 9000 years off Eastern Tasmania, Australia

Abstract Harmful algal blooms (HABs) have had significant adverse impacts on the seafood industry along the Tasmanian east coast over the past 4 decades. To investigate the history of regional HABs, we performed analyses of sedimentary ancient DNA (sedaDNA) in coastal sediments up to ~9000 years old collected inshore and offshore of Maria Island, Tasmania. We used metagenomic shotgun sequencing and a hybridisation capture array (“HABbaits1”) to target three harmful dinoflagellate genera, Alexandrium, Gymnodinium, and Noctiluca. Bioinformatic and DNA damage analyses verified the authenticity of the sedaDNA sequences. Our results show that dinoflagellates of Alexandrium genera have been present off eastern Tasmania during the last ~8300 years, and we sporadically detected and unambiguously verified sequences of Gymnodinium catenatum that were present offshore up to ~7600 years ago. We also recovered sedaDNA of the fragile, soft-bodied Noctiluca scintillans with increased relative abundance since 2010, consistent with plankton surveys. This study enabled us to identify challenges of sedaDNA sequence validation (in particular for G. catenatum, a microreticulate gymnodinoid species) and provided guidance for the development of tools to monitor past and present HAB species and improvement of future HAB event predictions.

Supplementary Material Note 2: Sedimentary ancient DNA -Shotgun rarefied Normalising the shotgun data (i.e., rarefying/subsampling to 2.2 Mio reads per sample) reduced the number of reads assigned to Dinophyceae to a total of 148.As expected, rarefication led to the detection of less reads of harmful Dinophyceae taxa, with Gymnodinium only resolved to genus level in MCS3-T2 and GC2S1, and only one Alexandrium sequence detected in MCS3-T2 (Supplementary Material Fig. 2).In GC2S1 the read numbers showed some cyclicity throughout the core, suggesting relatively high Dinophyceae abundance (up to 10 reads per sample) between 189 and 240.5 cmbsf (~7,100 -8,300 years ago), and 50 -56.5 cmbsf (2,300 -2,600 years ago) and lower read numbers in between (Supplementary Material Fig. 2).This coincided with slightly elevated Gymnodinium spp.read numbers at 239 cmbsf (8,300 years ago) and 50 cmbsf (~2,300 years ago), although this should be interpreted with caution due to low read numbers (Supplementary Material Fig. 2).
Supplementary Material Note 3: Alexandrium spp., Gymnodinium spp.and Noctiluca scintillans assignment specificity testing To test the reliability of assignments made to Alexandrium spp., Gymnodinium spp., and N. scintillans, we created three dummy samples (fasta-files) containing SSU, LSU, and ITS-5.G. microreticulatum,G. nolleri,G. trapeziforme,G. inusitatum,A.tamarense Group 1 (A.catenella), A. tamarense Group 2 (A.mediterraneum), A. tamarense Group 3 (A.tamarense), A. tamarense Group 4 (A.pacificum), A. australiense, A. catenella, and N. scintillans using sequences downloaded from NCBI (as available on 16.10.2020for Gymnodinium spp., which we tested first, and as available on 23.07.2021 for Alexandrium spp.and N. scintillans).Each sequence was included with its complete length (Supplementary Material Table 3) and split (cut) into 56 bp fragments (corresponding to the average sequence length of our filtered Shotgun data).If the last fragment was <25 bp it was excluded from the dummy sample (mimicking our minimum cut-off of 25 bp during sample data processing, see main text).We converted from .fasta to .fastq for each Dummyfile separately using the reformat option in BBMap (version 36.62-intel-2017.01-Java-1.8.0_121, command: reformat.shin=DummyFilename.fastaout1=DummyFilename_R1.fastqout2=DummyFilename_R2.fastqqfake=25 fastareadlen=129 qout=64 addcolon=t trimreaddescription=t uniquenames=t), compressed each of the resulting fastq-files (gzip) and aligned against the NCBI Nucleotide database using MALT (Herbig et al., 2016) (see main text).Additionally, we combined the fastq-files of our dummy sample with a filtered Shotgun sample that had no assigned Gymnodinium spp., Alexandrium spp. or N. scintillans (GC2S1 5 -6.5 cm), and re-run the alignment.The latter was performed to assess the accuracy of assignments of the dummy sequences within a complex sequence background.Our results show that all assignments of our Alexandrium spp.dummy sample were made to Alexandrium on genus level (Supplementary Material Fig. 3A).This was independent of the gene used, whereby none of the A. catenella COI sequences (GQ501122.1)were assigned.One sequence each was misassigned to Apicomplexa and Apocrita (Supplementary Material Fig. 3A), and in both cases these were 56 bp fragments of the 18S rRNA gene, partial sequence (fragment 16 and fragment 10 of KF908802.1 A. australiense, respectively).These results show that Alexandrium sequences of the SSU, LSU, ITS1, ITS2, and SxtA could not confidently be assigned to species level, thus metagenomic sequencing data containing short reads for this genus should be interpreted on genus level.

Supplementary Material
One read each was misassigned to Apicomplexa and Apocrita, and in both cases these were short 56 bp 18S rRNA fragments (KF908802.1 A. australiense fragment 16, and KF908802.1 A. australiense fragment 10, respectively).The fragment misassigned to Apocrita was similar to a short 18S rRNA fragment of N. scintillans that was misassigned to the same Apocrita NCBI reference sequence (>AF436016.1|tax|177228|Lepidopa californica 18S ribosomal RNA gene, partial sequence, Length = 1696, see N. scintillans section below).This means that when Apocrita are identified alongside Alexandrium spp. in metagenomic data containing short reads based on 18S rRNA, the presence of Apocrita should be interpreted with caution.

Gymnodinium species
Not all sequences were assigned to species level but instead to higher taxonomic levels (such as Dinophyceae, Gymnodinium, Supplementary Material Fig. 3B).All assignments of Gymnodinium sequences that were achieved at species level were correct, except the 18S rRNA sequence included for G. catenatum (AF022193.1).Neither the complete original sequence nor any of its 56 bp fragments was assigned to G. catenatum, instead, the complete sequence was assigned to "Gymnodiniaceae", while the fragments were distributed across the higher taxonomic levels as well as misassigned to Durinskia baltica, Eimeria and Streptophytina (Supplementary Material Fig. 3B).This indicates that 18S rRNA sequences of G. catenatum cannot confidently be assigned to species level.Two fragments of the 18S rRNA sequence of G. microreticulatum were erroneously assigned to Symbiodinium sp.AW-2009 and Membracoidea, respectively (Supplementary Material Fig. 3B).The sequence that was misassigned to Symbiodinium was misassigned to the exact same NCBI reference sequence as one N. scintillans dummy sequence (FN552048.1|tax|675000|Symbiodinium sp.AW-2009 partial 18S rRNA gene, isolate 1, Length = 872).
Our results suggest that assignments of Gymnodinoids to species level based on 18S rRNA sequences should be interpreted with caution, and, in particular, also the simultaneous detection of Symbiodinium (see N. scintillans below).

Noctiluca scintillans
Most sequences included in our N. scintillans dummy sample were correctly assigned to this species, and mostly correctly back-mapped to the exact NCBI sequence we used to create our dummy sample (GQ380592.1).Some reads were assigned to a different N. scintillans 18S rRNA reference sequence available on NCBI (>AF022200.1|tax|2966|Noctiluca scintillans 18S ribosomal RNA gene, complete sequence), however, the species-level identification was correct (Supplementary Material Fig. 3C).
One read each was misassigned to Symbiodinium and Apocrita (short 56 bp 18S rRNA fragments 9 and 11, respectively).The fragment that was misassigned to Symbiodinium aligned to the same NCBI reference sequence (FN552048.1|tax|675000|Symbiodinium sp.AW-2009 partial 18S rRNA gene, isolate 1, Length = 872) as the fragment misassigned to Symbiodinium in our Gymnodinium dummy sample (see above).The alignments were between position 477 and 422, and position 469 and 414 of the Symbiodinium reference sequence FN552048.1 for the Noctiluca and Gymnodinium dummy sample fragment, respectively (i.e., nearly the same positions).Similarly, the fragment that was misassigned to Apocrita in our Noctiluca dummy sample aligned to the same NCBI Apocrita reference sequence as the fragment misassigned to Apocrita in the Alexandrium dummy run (NCBI reference sequence AF436016.1|tax|177228|Lepidopa californica 18S ribosomal RNA gene, partial sequence, Length = 1696).Again, the alignments of our short Noctiluca and Alexandrium dummy sample fragments were at very similar positions of the Symbiodinium sequence FN552048.1 (between position 1132 and 1077, and 1145 and 1090 for the N. scintillans and Alexandrium dummy fragment, respectively).This shows that N. scintillans can be accurately identified on species level using short (56 bp) sequences, however, short regions within the 18S rRNA can lead to misidentification of Symbiodinium and Aprocrita, meaning that if these latter two taxa are identified in ancient metagenomic data, the results should be interpreted with caution.The same is the case when Symbiodinium is identified alongside Gymnodinium spp., and Apocrita alongside Alexandrium spp.
We then mapped all gymnodinoid sedaDNA sequences, new G. nolleri, G. microreticulatum D3-D10 sequences, and existing near full LSU-rRNA gene sequences of G. impudicum (DQ779993), to G. catenatum (DQ785882) as a reference sequence.Reference alignment and mapping used alignment/assembly algorithms implemented in the package Geneious Prime 2021 (default overlaps and word length, gaps allowed; Supplementary Material Fig. 4).As anticipated from the HABbaits1 design (Armbrecht et al., 2021), the majority (99%) mapped to the rRNA genes; the LSU-rRNA gene (81%), the rDNA-ITS region (14%) and the SSU-rRNA gene (5%) (Supplementary Material Table 3).Three sequences were assigned to cytochrome oxidase (CO1), a target also included in HABbaits1.While correctly assigned to other gymnodinoid species/genera, we excluded these sequences from further comparison.Of the reads assigned to rRNA genes, 68% were within 3 bp of exact match for G. catenatum reference sequence (Supplementary Material Fig. 5; Table 3) More than 97% of reads could be unambiguously assigned to microreticulate Gymnodinium species.More than half (56%) of reads were closest, or equal closest match to G. catenatum reference sequence, with 36% unambiguously assigned to G. catenatum, and 12% to G. microreticulatum (Supplementary Material Table 3).Approx 21% of reads mapped to rRNA regions with insufficient sequence variation to clearly resolve G. catenatum from closest relative G. nolleri; and 30% to regions unable to distinguish among the five microreticulate-group species (Supplementary Material Table 3).

Supplementary Material
Slight differences in Alexandrium, Gymnodinium and Noctiluca sedaDNA read lengths were found when compared to overall Dinophyceae read lengths in the shotgun and HABbaits1 data (Supplementary Material Fig. 7).Gymnodinium sequence length seemed to be slightly longer in comparison to the overall Dinophyceae sequences, which is consistent with this species being a good cyst-former, and which may benefit its sedaDNA preservation over time.However, due to very low number of reads recovered in the shotgun data, and high standard deviations of average fragment lengths per taxon in both shotgun and HAbbaits1 datasets, we do not consider this read-length data statistically robust.It is provided here for completeness.

Figure 4 :
Mapping of Gymnodinium catenatum sedaDNA reads to reference sequences.Example of fragment mapping (middle section of LSU-rRNA shown) of sedaDNA fragments to G. catenatum DQ785882 (reference sequence), related Gymnodinium sequences, and extended LSU-rRNA sequences of Gymnodinium microreticulatum and G. nolleri (Reference alignment, default parameters; Geneious Prime 2021).

Table 3 :
Gymnodinium sedaDNA verification.Gymnodinium species assignments are based on closest sequence match between sedaDNA fragments after alignment to reference rRNA gene sequences of five known microreticulate-group species and G. impudicum.Also provided are the number of base mis-matches of Gymnodinium sedaDNA to G. catenatum reference sequence DQ785882.