Influence of storage time on the stability of diatom assemblages using DNA from riverine biofilm samples

DNA sequencing of diatom assemblages from biofilms has already been used to assess the ecological status of freshwater in the UK. However, recent work using DNA data from these biofilms suggests that alternate metrics that capture the broader taxonomic and functional information to demonstrate importance of microbial biofilms could be useful. Exploring this potential requires large numbers of samples over time and space to be ana - lysed. Sample archives could be used to meet this need, but the compositional stability of microbial communities in stored biofilm samples for more than one year is uncertain. This study compared changes in diatom assemblage structure using metabarcoding analysis of river biofilm samples before and after storage at -20 °C in an RNAlater-based nucleic acid preservative. We found minimal changes in the diatom assemblages in the samples when stored for up to three years. Slight differences in certain groups observed resulted in four samples changing ecological status. However, the overall differences were not significant across replicates, suggesting any genuine differences in assemblages are likely masked by sub-sampling, PCR, or primer biases. These findings are similar to those observed in other studies looking at variations between analysts and sequencing instruments. This indicates that the diatom assemblages in the archived biofilm samples are stable. This will give greater confidence that archived samples can be used for further research, including exploring broader microbial taxa and their responses to environmental change, potentially leading to the development of reliable microbial metrics for integration into biomonitoring programs


Introduction
In the United Kingdom, environmental regulators, including the Environment Agency of England, employ various methods using biological indicators to assess the ecological status of lakes, rivers, and estuaries.One method involves the use of diatoms from biofilms in rivers and lakes as proxies for wider phytobenthos (Kelly et al. 2008).This assessment utilises Ecological Quality Ratio (EQR) metrics (European Parliment 2000) based on the Trophic Diatom Index (TDI) to assess ecological status.Classifications are determined based on the evaluation of the diatom assemblage composition using either light microscopy or high throughput sequencing (Kelly et al. 2008;Kelly et al. 2020).
In addition to the diatom assemblage, biofilms support a diverse microbial community embedded within a slimy matrix, which promotes their growth and survival (Neu and Lawrence 1997).Biofilm communities include bacteria, fungi, a diversity of algae, and protozoa that all contribute to important ecosystem processes, such as primary productivity, decomposition, biogeochemical cycling, and pollutant degradation (Falkowski et al. 2008;McGuire and Treseder 2010;Mishra et al. 2021).In many European countries, microbial bioindicators have largely been restricted to diatoms, phytoplankton, and targeted bacterial groups, such as faecal indicator bacteria, for routine ecological assessments under the Water Framework Directive (European Parliment 2000) and the Bathing Water Directive (European Parliment 2006).Whilst there is recognition that a much more diverse microbial community exists within river biofilms, we are not fully exploiting this information to build new indices and metrics (Environment Agency 2023; Kelly et al. 2024).
The integration of microbial diversity and functional indicators into biomonitoring for the assessment of anthropogenic pressures has been advocated for many years, and has recently gained momentum (Jackson et al. 2016;Cordier et al. 2019;Sagova-Mareckova et al. 2021;Warnasuriya et al. 2023).There is a good foundation for the exploration of new diagnostic microbial metrics and indices (Sagova-Mareckova et al. 2021), enabled by advances in high-throughput sequencing and computational approaches, such as machine learning (Cordier et al. 2019;McElhinney et al. 2022) and network analyses (Codello et al. 2022;Guseva et al. 2022).Such approaches allow the interrogation of relationships between microbial communities, their attributes, and anthropogenic pressures, facilitating a greater understanding of microbiomes and their responses to environmental change (e.g., Deutschmann et al. 2021;Eastwood et al. 2023).
Currently, data and models are needed to identify reliable microbial bioindicators (Fontaine et al. 2023).Large spatial and temporally relevant microbial datasets are required to mine and identify candidate 'features' of microbial communities that have the potential to be developed and upscaled more widely as bioindicators of ecosystem function and predictors of change (Astudillo-García et al. 2019).Generating such datasets is costly due to the expense associated with extensive field sampling and analysis.Thousands of biofilm samples are collected across England's river network as part of routine ecological assessments using diatoms.This presents a valuable archived resource to further explore wider microbial bioindicators, as samples are stored in a nucleic acid preservative and frozen at -20 °C (Kelly et al. 2020).A previous study by Baricevic et al. (2022) showed that preserved biofilm samples stored at -20 °C remained stable for 12 months.No significant impact was observed on DNA quality, yield, and the overall composition of phytobenthic diatom assemblages that reflected the site origin.However, the stability of archived biofilm samples over longer timescales remains unknown.With multiple years now in storage across thousands of sites river biofilm samples archived by the Environment Agency offer a unique opportunity to explore the spatiotemporal variability of the river microbiome across England in a cost-effective manner.
This study aimed to assess the stability of river biofilm samples for up to three years of freezer storage in preservative and used diatom assemblages as a proxy for the integrity of the overall microbial species in the biofilms.We assessed the suitability of reusing existing samples for the generation of large DNA datasets, which could enable the characterisation of the microbial response to environmental change.

Methods
To assess the impact of storage time on diatom assemblages, samples collected and analysed in 2019 and 2020 for routine analysis that had been stored concentrated in preservative were re-analysed in December 2022.The reanalysis was performed twice so that differences due to storage could be disentangled from the expected stochastic differences due to sub-sampling.

Sample collection and selection
River biofilm samples collected in 2019 (n=50) and 2020 (n=14) as part of the Environment Agency's routine monitoring of diatoms in rivers for ecological status assessments were used in this study.Samples were collected between April-November and analysed between September-January of the corresponding year.Samples were selected for reanalysis based on the total number of reads regardless of sequence taxonomic assignment passing quality control for routine analysis (>50,000 reads), and to cover a broad spread of geographic regions across England.A total of 64 frozen biofilm samples were selected for re-analysis in late 2022, following storage in preservative for 2 or 3 years after the initial analysis (Fig. 1).
Sample collection was performed as previously described in Kelly et al. (2020).Briefly, biofilm-covered stones were collected in a tray and scrubbed with deionised or tap water using a clean toothbrush.Using a pipette, 5 ml of the biofilm suspension was transferred to a 15 mL tube containing RNAlater-based preservative (3.5 M ammonium sulphate, 17 mM sodium citrate, and 13 mM Ethylenediaminetetraacetic acid), transported to the laboratory via an overnight courier at 5±3 °C, and stored frozen at -20±5 °C prior to DNA extraction.

DNA extraction and diatom assemblage metabarcoding
All samples were processed, according to Kelly et al. (2020).Samples were thawed and then centrifuged at 3000× g for 15 ± 2 min at 5±2 °C to form a concentrated biofilm pellet.The pellet was resuspended in 0.5 mL of the supernatant and the rest discarded.DNA from 0.1 mL of the resuspended pellet was extracted using the Qiagen blood tissue kit (#69506) with an extended overnight lysis step in a rocking incubator at 56±4 °C.The target rbcL region was amplified by PCR using rbcL-646F (5'-ATGCGTTGGAGAGARCGTTTC-3') and rbcL-998R (5'-GATCACCTTCTAATTTACCWACAACTG-3') to generate an amplicon library for each sample.While these primers are used for the routine assessment of diatom assemblages, they also amplify other non-diatom phytobenthos (Kelly et al. 2024).The amplicon libraries were purified using solid-phase reversible immobilisation (SPRI) beads.After cleaning, the libraries were tagged using Illumina Nextera XT unique dual-indexed adapters (#20091654) and purified.Tagged and purified amplicon libraries for each sample were normalised and pooled using molecular grade water and sequenced on an Illumina MiSeq instrument using a 600 cycle V3 reagents kit (#MS-102-3003).This original analysis (OG) forms the first sub-sample group of this study.The complete PCR conditions and clean-up procedures are detailed in Suppl.material 1.The remaining concentrated biofilm sample was stored at -20±5 °C in RNAlater-based preservative.
After storage, the samples were thawed for reanalysis by aliquoting two 0.1 mL sub-samples (RA and RB; Fig. 2) and processed following the same method described above.In total, 192 samples were sequenced across the three sub-sample groups.

Bioinformatic analysis
Raw sequence reads were imported into the QIIME2 environment (v2022.8.3) (Bolyen et al. 2019).Primer sequences were removed using cutadapt, discarding reads with no matching primer sequences (Martin 2011).DADA2 was used to denoise, dereplicate, and remove chimeras on a per run basis table (Callahan et al. 2016).Data were combined into a single set of amplicon sequence variants (ASVs) and an abundance.Taxonomy was assigned to ASVs using QIIME2's scikit-learn multinomial naïve Bayes classifier against the diat.barcode reference database v11.1 (2022) with a 95% confidence threshold (Pedregosa et al. 2011;Bokulich et al. 2018;Rimet et al. 2019).After taxonomic assignment, data was not filtered so both diatom and other non-target phytobenthos were included in all analyses.

Statistical analysis
ASV abundance and taxonomy data were imported into R (v4.1.2) using qiime2R (0.99.6) and phyloseq (v1.38.0), and all data was visualised using ggplot2 (v3.), but sequencing depth and diversity were investigated using rarefaction curves with the rarecurve function in vegan (v2.6-4) (Oksanen et al. 2022).For sub-samples in which the rarefaction curve did not plateau at the final read depth, all replicates of that sample were removed from the rest of the analysis.To compare trends in diversity across the dataset, distances were calculated at the ASV level using the Bray-Curtis distance measure.Bray-Curtis distances were used to account for variation in ASV detection and abundance between and within samples.Subsequently, distances were used to evaluate the influence of sample storage using permutational multivariate analysis of variance (PERMANOVA) with the adonis2 function from vegan (Oksanen et al. 2022).The presence and absence of taxa were compared at both genus and species levels and visualised using ggvenn.The differential abundance of taxa was assessed using DESeq2 (v1.34.0), and relevant differences were visualised using ggplot and gghighlight (Love et al. 2014).The TDI, EQR, and classifications were calculated for each sample using darleq3 (v0.9.8) to assess the impact on ecological status classification (Kelly et al. 2020) and visualised using ggally.
The raw sequence files were deposited at the European Nucleotide Archive under the accession number PRJEB76460.

Sample quality control
Two sub-samples failed quality control due to poor read depth, causing the rarefaction curves not to plateau, and as a result, all sub-samples for that sample were discarded.In total, 186 sub-samples from 62 samples passed quality control and were used for statistical analysis.
Across all samples, 12.9 million reads passed quality control.The number of reads per sample ranged from 4,980 to 258,229, with a median frequency of 64,151.A total of 5,138 unique ASVs were detected, and all were assumed to be phytobenthic algae; of these, 1,403 could be assigned to the species level, accounting for 59.3% of all reads, and a further 2,358 to the genus level, accounting for 80.8% of all reads in total, making these suitable levels with which to assess taxon detection (Table 1).

Impact of storage on diatom assemblages and taxon detection
At the ASV level, differences in diatom assemblages were mostly due to differences in the sample sites (PERMANOVA, R 2 =0.383, p=0.001).Differences in storage conditions between the original and repeated sub-samples accounted for < 1% of the difference in diatom assemblages and were not statistically significant (PERMANOVA, R 2 =0.005, p=0.212, full model output in Suppl.material 2).The abundance of taxa at the genus and species levels was not significantly different between the original and post-storage sub-samples.At a log2fold change no species or genera were detected at a significantly higher abundance post-storage (Fig. 3).
Most genera were detected in all replicate groups (84.0%;Fig. 4).However, seven genera were detected unevenly between the original and post-storage samples.All seven genera were detected only after storage and at low levels of detection (<1000 reads) across very few samples (1-3), whereas the original analysis had no unique genera (Table 2).
Similarly, at the species level 69.7% of species were detected in all replicate groups.However, 11 species were uniquely detected in the original samples and 24 were detected in the post-storage (RA and/or RB Fig. 4).Of the 35 species, all but three belonged to genera that were more widely detected (Suppl.material 4).Surirella solea and Placoneis constans were the only diatom species that did not belong to a more widely detected genus but were detected in the post-storage replicates only.In addition, the non-target species Quercus robur (common name: English Oak) was detected in three post-storage replicates only.For most of the species which were differentially detected, overall reads were low (<1000 reads) and were detected across very few samples (1-3).Pinnularia viridis was the only species detected at higher numbers (>1000), although this was possibly inflated by detection in a sample with an above-average read depth (181,687).Points to the right denote taxa found more abundantly in the original analysis, and points to the left denote taxa more abundant in the repeat analysis.Similar pairwise comparisons were made between all three groups (OG, RA, and RB), which also showed no significant difference (see Suppl.material 3).

Metric differences
TDI values generated from the diatom assemblages representative of the original and post-storage samples were generally similar.The mean difference in TDI before and after storage was 3.18, st.dev=7.33 (OG vs RB: 4.03, st.dev=9.22).The TDI values of all three replicate groups were highly correlated and statistically significant (Pearson's correlation coefficients ranged between 0.892 and 0.980; p <0.001 for all comparisons) (Fig. 5).
When assigning ecological status classifications using TDI scores, 75.8% of samples were assigned to the same class across all replicates (79.0% of samples were the same or one class different).When comparing any two of the replicate groups, the OG and RA groups had the highest agreement, with 90.4% of samples assigned to the same class, whereas RA and RB had the highest agreement at the same class or one class difference at 98.4% (see Table 3).
When comparing the TDI and the derived ecological status classifications, there were four clear outliers: S09, S43, S51, and S55.In one instance, RB (S09) was an outlier, and in the other three instances, the OG replicate was an outlier.
The relative abundance of diatoms between outlier sample replicates was compared using bar plots at the order level (Suppl.material 5).In sample 09, the RB replicate contained a much larger proportion of Amphora pediculus (47.1%; displayed as part of order Thalassiophysales in supplementary materials bar plot) than the OG and RA replicates (22.2% and 17.5%, respectively); otherwise, the communities appeared similar.However, in the other three outlier samples, the differences in community compositions were more apparent compared to non-outliers (See Suppl. materials 5,6).

Discussion
This study building on the work of Baricevic et al. (2022) improves our understanding of the impact of long-term storage on diatom assemblages.We examined samples frozen at -20 °C for up to three years in nucleic acid preservative

B А
to assess the impact of storage on DNA recovery and to understand whether diatom assemblages exhibited any significant change in their composition.Diatom assemblages were used as a proxy for inferring impacts on the wider microbiome because they are the only microbial component of the biofilm currently assessed as part of the Environment Agency's routine monitoring program that has metabarcoding data to benchmark storage impacts.Archived samples collected as part of large routine monitoring programmes provide Table 3. Matrix showing differences in the number of samples assigned to each ecological status class (bad, poor, moderate, good, high) between replicates.The samples assigned to the same class by two replicate tests are highlighted in green.Samples that were different by one class are highlighted in yellow.Samples in which the difference between replicates is greater than one class are highlighted in bold.unique opportunities to generate large comprehensive spatial and temporal datasets cost effectively to facilitate the development of new diagnostic metrics and indices.
The results of this study show that the storage of biofilm samples for up to 3 years in preservative frozen at -20 °C resulted in no significant difference in diatom assemblage diversity at the ASV level (PERMANOVA, R 2 = 0.005, p=0.212), differences in 2 and 3 years of storage time were also compared and were equally insignificant (PERMANOVA, R 2 = 0.011, p=0.216).Similar findings were found by Baricevic et al. (2022), where storage conditions of up to one year were insignificant.In both of these studies, it is likely that some of the small statistically insignificant differences were due to variability arising from analytical repeats; differences in diatom diversity at the ASV level observed in this study were similar to those reported by Kelly et al. (2018) when comparing differences observed between different analysts and different sequencing instruments.
Other studies have compared other microbial groups and storage conditions in similarly dense microbial sample types, reporting varying differences in the impact on the observed community.Tap et al. (2019) observed minimal effects of storage time on bacterial communities in faecal samples.Communities from samples stored for 5 years in preservative at -80 °C were comparable to the reference samples.Other studies have investigated the impact of storage conditions on bacterial communities in faecal material at shorter timescales, with similar findings (Bundgaard-Nielsen et al. 2018;Dully et al. 2021;Kim et al. 2023). However, Delavaux et al. (2020) found that the use of RNAlater significantly affected bacterial communities compared to samples frozen in liquid nitrogen and stored at -80 °C, but only at one of the two sites tested.They also investigated the impact on fungal communities but found no significant differences due to any of the storage conditions tested.
In the present study, when comparing individual taxonomic groups across replicate groups, there were no significant differences.At the species and genus levels, there were no statistically significant differences in the relative abundance.Similarly, although there was uneven detection of some taxa at the species and genus levels between replicates, these were only observed at low levels and in a few samples.No other similar studies compared detection but Tap et al. (2019) found no difference on the relative abundance of dominant bacterial taxa, and analysis using DESeq2 found that there were no taxa at the OTU level that were significantly different.
When comparing ecological status metrics, this study found highly significant and strong correlations in TDI values between replicates of the same original sample (Pearson's correlation coefficients ranged from 0.892 to 0.980 with all comparisons p<0.0001) which is similar to the trends observed in specific pollution-sensitivity index (SPI) values by Baricevic et al. (2022), with small, insignificant differences between samples from the same site regardless of the time stored or preservation technique.Unlike Baricevic et al. (2022) who found no differences in Specific Pollution-sensitivity Index classes at the six sites compared, this study found differences in assigned water quality class.Four samples were assigned a greater than one class difference, although the difference was dependent on which replicates were compared.As no species were differentially abundant between replicate groups across all samples, this suggests that the differences in these outlier replicates are not likely due to fundamental differences caused by sample storage.To elucidate the differences within these outlier samples further replication would be required during the original analysis.
We speculate that the cause of these insignificant but observed differences in sub-sample assemblages and at the taxon levels are due to differences caused by sub-sampling of the original biofilm samples and/or stochastic variation in the communities exacerbated by PCR amplification.Subtle differences caused by sample and PCR variations have been widely reported in the literature and are common caveats of routine monitoring data (Mathieu et al. 2020;Shirazi et al. 2021;Gold et al. 2023).We have come to this conclusion as there was no significant difference in the relative abundance of taxa in samples after storage, and differences in detection of taxa only found in original or repeated analyses were typically in taxa that were detected at low levels and across few samples, and detection of these rare taxa were more likely impacted by sub-sampling.All but two diatom species (Surirella solea and Placoneis constans) with different levels of detection across replicates belonged to genera that were more widely detected across replicate groups, suggesting that the observed differences in composition were unlikely to be caused by the breakdown of cells or DNA or influenced by the evolutionary relationships (phylogenetic differences) among the species.
Overall, our observed (insignificant) findings suggest that any differences in diatom assemblages are likely masked by variations in the assemblages due to sub-sampling, inter-analyst, and inter-instrument biases, all of which existed between the original and replicate analyses.As a result, diatom assemblages from biofilm samples stored frozen in nucleic acid preservative were not affected by storage for up to three years and are suitable for use in further research.We do, however, suggest caution when extrapolating the results to a wider microbial community because research on the stability of bacterial communities by 16S metabarcoding is contradictory and difficult to compare to the samples used in this study due to differences in sample type and storage conditions (Song et al. 2016;Tap et al. 2019;Delavaux et al. 2020).Little research has been conducted on the impact of storage conditions on fungi or protists (Delavaux et al. 2020).The extent to which DNA from other microbial groups may have degraded in biofilms frozen in preservatives is unknown and should be considered when interpreting the analysis of any big data generated using these archived samples.
This study has further evidenced RNAlater-based preservation of freshwater biofilms, as an alternative to ethanol, one of the standard recommended methods (CEN 2018).Whilst recognising the importance of standardisation to ensure high-quality and comparable data, standardised methods still need to be pragmatic and cost effective in order to ensure uptake by regulatory bodies.For logistic, and health and safety reasons the Environment Agency has restrictions on the use of ethanol, and therefore sought alternatives for the sampling and storage of biofilm samples (Kelly et al. 2018).The results of this study extend the utility of historic raw biofilm samples stored in nucleic acid preservative allowing regulators to maximise their value, by reusing samples for other purposes such as monitoring other non-microbial taxonomic groups (Rivera et al. 2023).

Conclusion
This study determined the stability of diatom assemblages in biofilm samples analysed before and after storage in a preservative for up to three years using metabarcoding.Differences in diatom assemblages in samples before and after storage were minimal and likely due to bias in sub-sampling of the original samples, as taxa varied equally before and after storage, and (insignificant) differences in beta-diversity were similar to those previously observed when assessing between-analyst and between-instrument variation.This suggests that the diatom assemblages are well preserved within biofilm samples up to 3 years old if stored as pellets in a preservative at -20 °C.

Figure 1 .
Figure 1.Map showing the geographic spread of biofilm sampling points across England.

Figure 2 .
Figure 2. Experimental design: samples were collected and processed in 2019 and 2020 (OG; n=50,14).Post-storage samples were reanalysed in 2022 on two replicate sub-samples from each archived sample (RA and RB).Note that 2 samples originally taken in 2019 from group RB failed quality control and were removed from the study.

Figure 3 .
Figure3.Volcano plot showing the fold change in abundance of the genus and species between the original and RA sample analyses.The horizontal dotted line denotes significance (p <0.05), and the vertical lines denote the magnitude of difference, where the lines are set at the equivalent of a 4-fold difference.Points to the right denote taxa found more abundantly in the original analysis, and points to the left denote taxa more abundant in the repeat analysis.Similar pairwise comparisons were made between all three groups (OG, RA, and RB), which also showed no significant difference (see Suppl.material 3).

Figure 4 .
Figure 4. Venn diagrams of genera (A) and species (B) common and unique to each sample type.OG is the original, and RA and RB are sub-samples analysed post-storage.

Figure 5 .
Figure 5. Correlogram of TDI scores showing differences between sample types and Pearson's correlation between scores.All Pearson correlations were highly significant (p<0.001).Histograms show the distribution of TDI scores across replicates.

Table 1 .
Number of reads that the taxonomy was assigned to at each rank.

Table 2 .
Uniquely detected genera between sample replicates.Numbers in parentheses represent the number of samples in which the genera were detected.