Dataset of foraminiferal sedimentary DNA (sedDNA) sequences from Svalbard.

Environmental DNA (eDNA) is usually defined as genetic material obtained directly from environmental samples, such as soil, water, or ice. Coupled with DNA metabarcoding, eDNA is a powerful tool in biodiversity assessments. Results from the eDNA approach have provided valuable insights into past and contemporary biodiversity in terrestrial and aquatic environments. However, the state and fate of eDNA are still being investigated, and knowledge about the form of eDNA (i.e., extracellular vs. intracellular) and about DNA degradation under different environmental conditions remains limited. Here, we tackle this issue by analyzing foraminiferal sedimentary DNA (sedDNA) from different size fractions of marine sediments: >500 µm, 500–100 µm, 100–63 µm, and <63 µm. Surface sediment samples were collected at 15 sampling stations located in the Svalbard archipelago. Sequences of the foraminifera-specific 37f region were generated using Illumina technology. The presented data may be used as a reference for a wide range of eDNA-based studies, including biomonitoring and biodiversity assessments across time and space.



Value of the data
• The data provide the first insight into the genetic diversity of Arctic foraminifera in different sediment size fractions.
• The data also give an overview of the spatial distribution of Arctic foraminifera inferred from sedDNA.
• The data may serve as a reference in a wide range of metagenomics-based studies, including biomonitoring, biodiversity surveys, and environmental impact assessments.
• The data may also be used in studies of past climatic and environmental changes.

Data description
The dataset contains foraminiferal sedDNA sequences from four sediment size fractions: >500 µm, 500–100 µm, 100–63 µm, and <63 µm. Sequences are clustered into Operational Taxonomic Units (OTUs), and the number of sequence reads is given for each OTU. The dataset can be accessed at Mendeley Data (doi: 10.17632/7kjkf8by5d.1). The samples were collected at 16 sampling stations in five localities in the Svalbard archipelago (Fig. 1). Sampling station coordinates and sampling depths can be found in Table 1. The total number of OTUs recorded at each sampling location is presented in Fig. 2. The number and percentage of OTUs and the percentage of DNA sequences found in particular sediment size fractions are shown in Venn diagrams (Fig. 3).
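The kind of summary shown in the Venn diagrams can be sketched as follows. This is a minimal illustration, not the procedure used to produce Fig. 3: the toy OTU table, the fraction labels, and the function name `venn_summary` are all hypothetical, and the real OTU table would have to be loaded from the deposited files first.

```python
from collections import defaultdict

# Hypothetical toy OTU table (not real data): OTU id -> reads per size fraction.
otu_reads = {
    "OTU1": {">500": 120, "500-100": 30, "100-63": 0, "<63": 5},
    "OTU2": {">500": 0, "500-100": 0, "100-63": 0, "<63": 40},
    "OTU3": {">500": 10, "500-100": 0, "100-63": 0, "<63": 0},
    "OTU4": {">500": 0, "500-100": 15, "100-63": 20, "<63": 60},
}

def venn_summary(table):
    """Group OTUs by the exact set of fractions in which they occur.

    Returns, for each Venn region, the number of OTUs, their share of all
    OTUs (%), and the share of all reads (%) belonging to those OTUs.
    """
    regions = defaultdict(lambda: {"otus": 0, "reads": 0})
    for counts in table.values():
        present = frozenset(f for f, n in counts.items() if n > 0)
        if not present:
            continue  # skip OTUs with no reads at all
        regions[present]["otus"] += 1
        regions[present]["reads"] += sum(counts.values())

    total_otus = sum(r["otus"] for r in regions.values())
    total_reads = sum(r["reads"] for r in regions.values())
    if total_otus == 0 or total_reads == 0:
        return {}
    return {
        region: {
            "otus": r["otus"],
            "otu_pct": 100.0 * r["otus"] / total_otus,
            "read_pct": 100.0 * r["reads"] / total_reads,
        }
        for region, r in regions.items()
    }

summary = venn_summary(otu_reads)
```

Each key of `summary` is the set of fractions that defines one region of the diagram, so OTUs found only in the <63 µm fraction are reported under `frozenset({"<63"})`.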

Sampling
Surface sediment samples were collected with a box corer during the cruise of R/V Oceania in August 2016. The upper 2 cm of sediment was sampled from a surface area of approximately 25 cm². Samples for sedimentary DNA (sedDNA) analysis were wet sieved on 500 µm, 100 µm, and 63 µm sieves. The fraction smaller than 63 µm was retained. Samples were transferred to sterile containers and frozen at −20 °C.

sedDNA analysis
DNA from the 500 µm, 100 µm, and 63 µm sediment fractions was extracted from 0.25 g of bulk sediment with the DNeasy PowerSoil Kit (Qiagen). Because of the large amount of sediment in the <63 µm fraction, DNA from that fraction was extracted from 10 g of sediment using the DNeasy PowerMax Soil Kit (Qiagen). The SSU rDNA fragment including the foraminifera-specific 37f hypervariable region [1] was PCR amplified with the s14F1 (5′-XXXXXAAGGGCACCACAAGAACGC-3′) and s15 (5′-XXXXXCCTATCACATAATCATGAAAG-3′) primers, each tagged with a unique sequence of 5 nucleotides (XXXXX) appended at its 5′ end. For each sample, 3 PCR replicates were prepared. Amplicons were quantified with a Qubit 3.0 fluorometer, and the pool was purified with the High Pure PCR Cleanup Micro Kit (Roche). The library was prepared with the TruSeq DNA PCR-Free LT Library Prep Kit (Illumina) and loaded onto a MiSeq instrument for a paired-end HTS run of 2 × 150 cycles.
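How tagged primers allow reads to be traced back to samples can be illustrated with a short sketch. The primer sequences are those given above; everything else is an assumption made for illustration: the tag sequences, the sample names, the function names, exact matching without mismatches, and the simplification that read 1 always starts with the tagged s14F1 primer and read 2 with the tagged s15 primer. It is not the demultiplexing routine actually used for this dataset.

```python
FWD_PRIMER = "AAGGGCACCACAAGAACGC"    # s14F1, as given in the text
REV_PRIMER = "CCTATCACATAATCATGAAAG"  # s15, as given in the text
TAG_LEN = 5                           # 5-nucleotide tag at the 5' end

# Hypothetical (forward tag, reverse tag) -> sample mapping, for illustration only.
TAG_TO_SAMPLE = {
    ("ACGTA", "TGCAT"): "station01_gt500",
    ("CATGC", "GTACG"): "station01_lt63",
}

def split_tag(read, primer):
    """Return (tag, insert) if the read starts with tag + primer, else None."""
    tag, rest = read[:TAG_LEN], read[TAG_LEN:]
    if len(tag) == TAG_LEN and rest.startswith(primer):
        return tag, rest[len(primer):]
    return None

def assign_sample(read1, read2):
    """Assign a read pair to a sample from its tag combination.

    Returns None when a primer is not found or the tag pair is unknown.
    """
    fwd = split_tag(read1, FWD_PRIMER)
    rev = split_tag(read2, REV_PRIMER)
    if fwd is None or rev is None:
        return None
    return TAG_TO_SAMPLE.get((fwd[0], rev[0]))

# Toy read pair built from a known tag combination.
r1 = "ACGTA" + FWD_PRIMER + "TTGACC"
r2 = "TGCAT" + REV_PRIMER + "GGATCA"
sample = assign_sample(r1, r2)
```

In a PCR-free library the amplicons can be ligated in either orientation, so a real implementation would also need to test the reversed primer arrangement; that step is omitted here.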

Post-sequencing data processing
Raw sequence data were processed according to [2,3]. The post-sequencing data processing was performed using the SLIM web app [4] and included demultiplexing the libraries, joining the paired-end reads, chimera removal, Operational Taxonomic Unit (OTU) clustering, and taxonomic assignment. Sequences were clustered into OTUs using the Swarm module [5], and each OTU was assigned to the highest possible taxonomic level using vsearch [6] against a local database of foraminiferal SSU rDNA sequences. The results were presented as OTU-to-sample tables.
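The final step, turning per-read results into an OTU-to-sample table, can be sketched as below. This is only an illustration of the table layout, assuming each read has already been demultiplexed and clustered; the input format, the sample and OTU names, and the function name `otu_table` are hypothetical and do not reflect the internals of SLIM.

```python
from collections import Counter

def otu_table(assignments):
    """Build an OTU-to-sample read-count table.

    `assignments` is an iterable of (sample, otu) pairs, one per read.
    Returns (samples, otus, rows), where rows[i][j] is the number of reads
    of otus[i] found in samples[j].
    """
    counts = Counter(assignments)
    samples = sorted({sample for sample, _ in counts})
    otus = sorted({otu for _, otu in counts})
    # A Counter returns 0 for combinations that were never observed.
    rows = [[counts[(sample, otu)] for sample in samples] for otu in otus]
    return samples, otus, rows

# Toy input: four reads from two hypothetical samples.
reads = [("A", "OTU1"), ("A", "OTU1"), ("B", "OTU1"), ("B", "OTU2")]
samples, otus, rows = otu_table(reads)
```

Rows correspond to OTUs and columns to samples, which matches the description of the deposited tables as giving the number of sequence reads per OTU.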