Composition and distribution of diazotrophs in the Baltic Sea

heterocyst-forming cyanobacteria accounted for almost 80% of the nifH transcripts. Salinity had a minor influence on the composition, but Aphanizomenon and Nodularia showed increased relative nifH gene expression at low and higher salinity, respectively. Pseudanabaena only accounted for up to 5% of the nifH transcripts and nifH gene expression by Candidatus Atelocyanobacterium thalassa (sublineage UCYN-A2) was mainly observed in the most saline western part of the Baltic. The only notable expression by NCDs (up to 15% of nifH transcripts at a given station) coincided with an upwelling event at the southern coast and was largely accounted for by a Pseudomonas -like nifH phylotype, recurrently found in the Baltic Sea. NCD relative abundances were dominant in coastal stations, presumably driven by sediment resuspension as evidenced by higher turbidity and DOC levels and the recovery of sediment diazotrophs in the pelagic zone. This study reveals the heterogeneity of the composition and activity of diazotrophs in the Baltic Sea, and underscores the need for future N 2 fixation studies that include coastal and offshore Baltic waters.


Introduction
The Baltic Sea is among the largest brackish bodies of water on Earth, covering 370,000 km 2 .It spans a salinity gradient from 1 to 2 in the Bothnian Bay in the north to around 9 in the Arkona basin in the south, and is semi-enclosed with a mean depth of 56 m (Kullenberg and Jacobsen, 1981;Stal et al., 2003).In the Baltic Sea, dinitrogen (N 2 )-fixing (or "diazotrophic") cyanobacteria are important for N cycling (Larsson et al., 2001).N 2 fixation rates in late summer are so high that relative to its size the central Baltic Sea Proper is the body of water with the most N 2 fixation per unit of area in the world (Niemistö et al., 1989;Wasmund et al., 2001Wasmund et al., , 2005;;Klawonn et al., 2016).N 2 fixation can sustain up to 90% of the net summer community production with an estimated fixation of 180-430 Gg N yr − 1 (Larsson et al., 2001).
Nonetheless, this approach revealed the presence of the small symbiotic unicellular cyanobacteria UCYN-A (Candidatus Atelocyanobacterium thalassa; Bentzon-Tilia et al., 2015b;Reeder et al., 2022).This organism plays an important role in the oceanic nitrogen cycle (Martínez-Pérez et al., 2016;Zehr and Capone, 2020), but its distribution in the Baltic Sea is unknown.Additionally, nifH gene sequencing at specific sampling sites in the Baltic Sea revealed diverse assemblages of non-cyanobacterial diazotrophs (NCDs; Farnelid et al., 2009;2013).Cultivation of NCDs obtained from the Baltic Sea water column (Boström et al., 2007a;Farnelid et al., 2014;Bentzon-Tilia et al., 2014, 2015a) and associated N 2 fixation rate measurements (Bentzon-Tilia et al., 2015a) indicate that they may contribute to N 2 fixation in situ.Still, these insights are based on few localized studies.Hence, present knowledge about the diazotrophs in the Baltic Sea is mainly based on microscopy.A molecular analysis of the composition and biogeography of diazotrophs could, therefore, provide important new information about the present and active diazotrophs and their environmental regulation in the Baltic Sea.
Since bacterial and cyanobacterial biomass (Legrand et al., 2015) and N 2 fixation rates (Klawonn et al., 2016) differ between coastal and offshore stations, and coastal and offshore stations are kept separate in monitoring (Kownacka et al., 2020), it is essential to also include coastal environments when analyzing the overall composition of diazotrophs in the Baltic Sea.Shallow coastal and estuarine areas generally have a high suspended matter load and are rich in particulate organic matter (Simon et al., 2002), to a large extent originating from sediment resuspension (Turner and Millward, 2002).Particles have been proposed as loci for N 2 fixation by NCDs as they may represent high-substrate and low O 2 conditions (Paerl et al., 1988;Bombar et al., 2016;Chakraborty et al., 2021), which are considered essential for N 2 fixation by NCDs in the oxic water column (Riemann et al., 2022).Interestingly, a major part of the N 2 fixation in a Danish estuary influenced by the Baltic Sea was attributed to NCDs, and resuspension of sediment bacteria was considered a contributing factor (Bentzon-Tilia et al., 2015b).A later experiment in the estuary indicated that pelagic N 2 fixation could be enhanced almost 100-fold by sediment resuspension (Pedersen et al., 2018).Indeed, resuspension of sediments as a source of diazotrophs in the water column has been proposed for diverse coastal or estuarine regions; e.g.Spencer Gulf (Australia; Messer et al., 2021), Chesapeake Bay (USA; Short et al., 2004;Moisander et al., 2007), Monterey Bay (USA; Cabello et al., 2020) and Narragansett Bay (USA; Hallstrøm et al., 2022).Hence, it is likely that the combination of higher productivity (Sigman and Hain, 2012) and concentration of particles might facilitate particle-associated lifestyles in coastal areas.Thus, the composition of diazotrophs, and the relative importance of different size-fractions, is expected to differ between coastal and offshore regions.
In the present study, we aimed to obtain the first comprehensive overview of present and active diazotrophs in the Baltic Sea along the salinity gradient from south to north, including both coastal and offshore areas, as well as surface and deep samples.We analyzed the diazotroph community composition using nifH gene amplicons, characterized which taxa were actively transcribing the nifH gene, as a proxy for N 2 fixation activity, and compared this to a range of environmental parameters.To capture diazotrophs with potentially different diel patterns in nifH expression, sampling encompassed different time points, but also stations sampled at the same time of day for comparison.

Materials and methods
Samples were collected from 'surface' (average 1.8 m) and 'deep' (average 24 m, see Table S1 for max bottom depths) waters from 18 stations as part of the POS488 cruise on the R/V Poseidon from August 17th to September 4th, 2015.Stations were between 2 and 118 km from shore and maximum bottom depths were 15.5-365.8m (Tables S1 and   S2).The stations were chosen to sample the salinity gradient from the Arkona Basin in the south to the Bothnian Sea in the north and were operationally divided into coastal (<10 km offshore, European Environment Agency, 2006) and offshore (>10 km offshore) stations (Fig. 1).The two categories differed in their distance from shore (Wilcoxon rank sum test with continuity correction; Wilcoxon CC), p < 0.001) and the coastal stations had shallower maximum bottom depths (p = 0.0057, Wilcoxon rank sum exact; Wilcoxon ET).

Environmental parameters
Temperature, turbidity, salinity and O 2 were measured using a Sea-Bird SBE 9 CTD.Dissolved organic carbon (DOC) and chlorophyll a (Chl a) samples were filtered on board, frozen and analyzed according to standard protocols at Leibniz Institute for Baltic Sea Research Warnemünde (IOC, 1994;Grasshoff et al., 1999;Lysiak-Pastuszak and Andersens, 2004).Inorganic nutrients (PO 4 were measured by standard colorimetric methods (Grasshoff et al., 1999).

Bacterial enumeration
Samples for flow cytometry were fixed in triplicate with paraformaldehyde and glutaraldehyde (final concentrations 1% and 0.05%, respectively; pH = 7.4) and incubated at 4 • C for 30 min.Thereafter, each triplicate was stained with SYBR™ Green I Nucleic Acid Stain (Thermo Fisher Scientific, USA) at room temperature for 15 min following Marie et al. (1999).Cell counts were obtained via on-board flow cytometry (BD Accuri™ C6 cytometer, BD Biosciences, USA) at 'slow speed' mode (14 μL min − 1 , 10-μm core).Low nucleic acid content (LNA) and high nucleic acid content (HNA) cells were discriminated using BD Accuri C6 software and final counts were averaged from triplicate subsamples.

Sampling and extraction of DNA and RNA
Size-fractionated DNA and RNA samples were obtained from surface and deep samples from all 18 stations.DNA samples were collected using Niskin bottles while RNA samples were collected and immediately fixed using an autonomous in situ fixation sampler (Charvet et al., 2019).
DNA was extracted from half-filters from each sample using the DNeasy® PowerSoil® Kit (Qiagen) with the following modifications: Filters were freeze-thawed for three cycles, and then cut aseptically in small pieces into bead beater tubes prior to extraction.DNA was quantified using Quant-iT™ PicoGreen™ dsDNA Assay Kit (Invitrogen, Oregon, USA).RNA was extracted using the RNeasy Mini kit (Qiagen) with modifications including an on-column DNA digestion (RNase-Free DNase Set, Qiagen), initial bead beating (200 μm Low Binding Zirconium Beads, OPS Diagnostics, LLC), an additional DNA digestion (Invitrogen, TM TURBO DNA-free™ Kit), and additional clean-up using the RNA clean and Concentrator™ -5 (Zymo Research, USA).RiboLock RNase Inhibitor (Thermo Scientific, Vilnius, Lithuania) was added to the final extract.
RNA was quantified (Quant-iT™ RiboGreen™ RNA Assay kit, Invitrogen) and stored at − 80 • C or immediately reverse transcribed to complimentary DNA (cDNA) using SuperScript™ IV (Invitrogen) and the nifH3 primer (Zani et al., 2000) according to the manufacturer's protocol.Reverse transcriptase-free controls were run for each RNA sample by substituting the reverse transcriptase with PCR grade water.
The cDNA product was stored at − 20 • C.
The thermocycling conditions for both reactions were 2 min at 94 • C, followed by 30 cycles of: 1 min at 94 • C, 1 min at 54 • C and 1 min at 72 • C ending with 7 min at 72 • C. Triplicate PCR products were pooled and purified using the Geneclean® Turbo Kit (MP Biomedicals, LLC).A negative PCR control with PCR grade water instead of template was included in all runs and showed no amplification.The reverse transcriptase-free controls were checked for complete DNA digestion by PCR as described above and showed no amplification.The PCR products were indexed using custom designed Illumina indexes with an 8 cycle PCR (20 s at 94 • C, 30 s at 62 • C and 30 s at 72 • C).Finally, samples were purified using magnetic beads (Agencourt, Beckman Coulter, Ca, USA), pooled in equimolar ratios and sequenced using an Illumina MiSeq Platform (GeoGenetics Sequencing Core, Denmark).

Processing of amplicon sequences
Of the 144 total samples, 122 (55 DNA and 67 RNA) were successfully amplified and sequenced for nifH.Most of the samples that failed to amplify, had very low concentrations of DNA/RNA, possibly due to low extraction efficiency.Reads were processed using DADA2 (version 1.20.0;Callahan et al., 2016) and R (version 4.1.3;R Core Team, 2021).Quality profiles for all paired FASTQs were prepared using FastQC (version 0.11.9;Andrews, 2010).Then primers were trimmed using cutadapt (version 3.4; Martin, 2011) with tolerance for 1 mismatch (-m 1) and no indels (-no-indels).Reads from both direction were required to have primers or the pair was rejected (-discard-untrimmed).Each combination of sample type (DNA or RNA) and size fraction was processed separately with DADA2 and R using mainly standard steps: filter and trim, denoise, and merge the reads into amplicon sequence variants (ASVs).An exception was that the sequencing error models used by DADA2 for the function filterAndTrim() were created in advance (with the function learnErrors()) using only the paired reads that appeared to represent nifH amplicons to remove off-target PCR artefacts.These were identified by predicting open reading frames (ORFs) on the forward reads with FragGeneScan (version 1.31; Rho et al., 2010) and then scanning the ORFs for the NifH/frxC family (Fer4_NifH; PF00142) using hmmsearch in HMMER software (version 3.3.2;Finn et al., 2011).Only the paired reads with ORFs that had ≥80 residues and met the trusted cut-offs encoded in PF00142 were used to create the error models that were passed to the filterAndTrim() function.Additional parameters to the filterAndTrim() function specified trim lengths of 220 nt and 180 nt for the forward and reverse reads, respectively, selected based on the FastQC quality reports, and maximum tolerated base call errors (maxEE) of 4 nt and 6 nt, respectively.The parameters of the DADA2 function mergePairs() required sequences to overlap by > 12 nt with at most 1 mismatch.Chimera ASVs were detected and removed first using the DADA2 function isBimeraDenovo() with minFoldParentOverAbundance = 3.5, and then using UCHIME2 (Edgar, 2016) as implemented in vsearch (version 2.18.0;Rognes et al., 2016) through option -uchi-me3_denovo.Non-nifH ASVs were identified and removed using a similar approach to that used for creating the error models.Finally, ASVs that did not meet the criteria of having >10 reads in ≥3 samples or >40 reads in one sample were removed.
The total number of reads after quality control was 15,206,287, yielding 1139-244,590 reads per sample (mean; 129,968 reads).For the downstream analysis, 2656 ASV and 117 samples were analyzed due to removal of samples with less than 1000 reads.Sequence data were not rarefied before analysis as this may lead to loss of information (McMurdie and Holmes, 2014) and most samples exceeded the depth considered acceptable for comparing microbiome composition between samples (e.g. in gut microbiomes; 10,000 reads per sample; Falony et al., 2016).Of the 2656 total ASVs, 866 were represented in the RNA and 1843 in the DNA.BLASTX against a curated database containing phylogenetically characterized nifH sequences (genome879; jzehrlab.com/nifh) was used to categorize the nifH sequences into NCD and cyanobacterial nifH.The ASV data was used to create relative abundances for the respective ASVs in each sample.Relative abundances, as used in the present work, are often used in marine microbiology (e.g.Herlemann et al., 2011;Farnelid et al., 2019), but should be interpreted cautiously as they are compositional in nature and not strictly quantitative.

Clustering of the main cyanobacterial ASVs
To gain insight into the composition of cyanobacterial diazotrophs, the 36 most dominant cyanobacterial ASVs, accounting for 90% of the cyanobacterial relative abundances, were translated into 15 unique amino acid sequences and clustered into seven groups using CD-hit at 97% similarity (Li and Godzik, 2006;Huang et al., 2010;Fu et al., 2012).Reference sequences were selected among known Baltic cyanobacteria (Stal et al., 2003) and nearest relatives using BLASTP (National Center for Biotechnology Information, NCBI).The RefSeq non-redundant proteins with the best match were chosen, then further culled to one reference sequence per cluster.Two reference sequences from isolated strains were chosen as they were specific to the Baltic Sea: Aphanizomenon KAC15 and Nodularia spumigena AV1 (Janson and Granéli, 2002;Vintila et al., 2011).After ASVs were assigned a cluster, the read count for each of the seven clusters were summed for each sample.The groups were named after their reference sequence.The remaining 10% of cyanobacterial relative abundances were represented by 1052 ASVs that were grouped and labeled 'Other cyanobacteria'.

Statistics
Statistics were performed with R-4.1.1 in R studio (R Core Team, 2021;RStudio Team, 2021).The reads for each ASV were transformed into relative abundances per sample with the "microbiome" package (Lahti and Shetty, 2019).The function "knnImputation" from the "DMwR2" package (Torgo, 2016) was used to impute a missing Chl a value (a lost filter).
To check for significant differences in environmental parameters between coastal and offshore stations, Wilcoxon rank tests were applied using the "stats" package.Wilcoxon signed-rank tests were furthermore applied to compare size-fractions.The same package was used for performing T-tests on log-transformed bacterial abundances to check for significant differences between the averages for coastal and offshore stations.
Linear Mixed-Effects Models (LME) and Generalized Linear Mixed-Effects Models (GLMM) were made with the "lme4" package (Bates et al., 2015) to evaluate how much effect certain environmental parameters had on the variation of the relative abundances of cyanobacterial CD-hit clusters and NCD classes.The kind of transformation needed to make the data follow a normal distribution was based on Box Cox transformation from the package "MASS" (Venables and Ripley, 2002).To achieve normal distribution both log and square root transformations were applied, according to the Box Cox lambda value.Deltaproteobacteria and Pseudanabaena-like nifH gene abundance were log transformed and NCD nifH genes were square root transformed.Each model was validated with residual plots and Quantile-Quantile plots.Dolichospermum-like, Aphanizomenon-like and Nodularia AV1-like nifH were transformed into presence/absence since they did not meet the validation criteria.GLMM was used on presence/absence data as an alternative to LME.Fixed effects included: Size-fraction × depth, offshore/coastal and salinity.'Station' was set as a Random effect.The influence of salinity on the variance of total nifH composition was examined by redundancy analysis (RDA) on centered log ratio transformed nifH classes for surface and deep samples separately, with salinity and temperature as the constraint variables, using "phyloseq", and "microViz" (McMurdie and Holmes, 2013;Wickham, 2016;Barnett et al., 2021).

Environmental conditions
Environmental data for all stations are summarized in Tables S1 and  S2.Salinity ranged from 4.3 to 8.6 with samples north of station MP12 being <7 (except S7 at 46.1 m) while samples south of MP12 were >7 (Fig. 1).Temperatures at the surface and deep sample points ranged 11.7-20.7 • C (Fig. S1a) and 4.1-17.5 • C, respectively.All stations were oxic with O 2 concentrations between 174.9 and 375.5 μmol L − 1 .Chl a varied from 1.6 to 9.3 μg L − 1 at the surface and 0.7-1.5 μg L − 1 for the deeper samples (Fig. S1b).Turbidity at the surface was 0.19-1.5 Nephelometric Turbidity Units (NTU) and 0.1-1.8NTU at depth (Fig. S1c).DOC was 320.6-514.3μmol L − 1 at the surface (data from deep samples not available).Total dissolved inorganic nitrogen (DIN) was low at the surface (<0.5 μmol L − 1 ) and higher at depth (0.1-7.5 μmol L − 1 ).Dissolved inorganic phosphorus (DIP) was 0-0.8 μmol L − 1 at the surface and 0.1-1.2μmol L − 1 at depth.At both depths at nearly all stations (except for stations Mo10 and S10 in the Bothnian Sea), the DIN/DIP ratios were less than the Redfield ratio of 16:1 (Fig. S1d; Redfield et al., 1963).The lowest surface temperatures were at the most southern stations (MP6 -8; Fig. S1a), where also surface DIP (0.4, 0.6 and 0.6 μmol L − 1 ) was higher than average, indicating a local upwelling event.
Salinity, temperature, Chl a, DIP and DIN/DIP ratios did not differ Salamon Slater et al. between coastal and offshore stations (Wilcoxon); however, deep O 2 concentrations were lower at coastal stations (p < 0.001; Wilcoxon CC).Furthermore, coastal samples had higher turbidity, both at the surface and at depth (p = 0.003 and p = 0.001; Wilcoxon ET).Additionally, DOC was highest in the coastal surface waters (p = 0.005; Wilcoxon CC) and DIN was highest in the coastal deep (p = 0.009; Wilcoxon CC).

Bacterial abundances
Bacterial abundance ranged 1.45 x 10 6 -5.6 x 10 6 cells ml − 1 and 3.8 x 10 5 -3.7 x 10 6 cells ml − 1 , for surface and deep samples, respectively (Tables S1 and S2).When excluding stations MP6 -8 that showed signs of upwelling (see above), bacterial abundance in surface waters was higher for the coastal stations than offshore (p < 0.001; T-test).The upwelling stations (MP6 -8) showed the lowest abundances at the surface and the lowest ratio between HNA and LNA of all stations (including station S7 at 46 m).
Aphanizomenon-like, Nodularia AV1-like and Dolichospermum-like nifH, accounted for >78% of expression in all sample categories, with Aphanizomenon and Nodularia dominating (Fig. 2b).These were identical to nifH transcripts from a previous Baltic Sea study (Boström et al., 2007b).Pseudanabaena-like nifH accounted for only ca. 5% of the transcripts in the FL size-fractions, and <0.1% in the PA size-fractions (Fig. 2b).UCYN-A2 contributed overall with only ca.1% of the expression (Fig. 2b).UCYN-A2 showed high expression at station Mo8, where it accounted for 19% and 25% of the transcripts in the surface and deep FL size-fraction, respectively.UCYN-A2 was also found in the PA-size fraction, but to a smaller extent (<1%).In addition, at station MP9 UCYN-A2 was responsible for <0.1% of total nifH transcripts.
Deltaproteobacteria are known affiliates of nifH Cluster III, containing mostly anaerobic bacteria (Zehr et al., 2003), which could therefore show preference for the PA size-fraction.Among the 32 dominant NCD ASVs, only three Cluster III ASVs were found in a single size-fraction.A Desulfovibrio (ASV 3530) and a Desulfomicrobium (ASV 3531) were associated with the PA size-fraction while another Desulfovibrio (ASV 1449) was only associated with the FL size-fraction.
Highest nifH transcript relative abundances by NCDs occurred in the deep samples (Fig. 2b), where the relative abundance of NCD nifH genes also was the highest (Fig. 2a).NCD expression accounted for >1% of the relative nifH gene expression in only the coastal stations MP6, MP7, MP19 and the offshore station TF0142 (Fig. 1).None of these samples  were from the surface PA size-fraction.More than half of the NCD nifH gene expression in each sample was represented by Gammaproteobacteria, Deltaproteobacteria and Clostridia, with Gammaprotebacteria as especially prominent (section 3.6).

Environmental influence on nifH gene composition
We used Linear Mixed-Effects Models (LME; relative abundance data) and Generalized Linear Mixed Models (GLMM: absence/presence data) to examine the affiliation of diazotrophs with the type of environment (Tables S3 and S4).Offshore areas had a positive effect on cyanobacterial nifH relative abundances (p < 0.001; LME) and a negative effect on the presence of Dolichospermum-like nifH (p = 0.043; GLMM).Area (offshore/coastal), salinity or size-fraction × depth interaction did not have significant effects on the variation in Pseudanabaena-like nifH relative abundances (LME).For Nodularia AV1-like and Aphanizomenon-like nifH presence/absence data, the interaction between size-fraction and depth had a significant effect on the variance with the FL size-fraction and 'surface' having a combined negative effect (p = 0.043 and p = 0.026, respectively; GLMM).
The relative abundances of NCDs were negatively affected by offshore environments (p < 0.001; LME).Similarly, the relative abundance of Deltaproteobacteria were negatively affected (p = 0.001; LME).Salinity explained only ca.<8% of the variance in composition of nifH classes, at the surface and at depth (Fig. S6, RDA).
We interpreted the higher turbidity and DOC levels at the coastal stations (see above), as indication of sediment resuspension, and examined differences in diazotroph composition between coastal and offshore waters to investigate whether distinct diazotroph communities could be detected.Deltaproteobacteria showed ca. 4 times higher relative abundance in coastal surface samples (18.4%) than offshore (4.6%; Fig. S5), and the benthic cyanobacterium Leptolyngbya (Sivonen et al., 2007), seemed also to have higher relative abundances at the coastal surface stations (1.4%) compared to offshore (0.08%; Fig. S7).Furthermore, 118 NCD ASVs from our study were identical to ASVs previously found in coastal Baltic Sea sediments (Liesirova et al., 2023).The NCD ASVs accounted for 1.7% and 15.6% of the average relative abundances at offshore and coastal surface samples, respectively.This indicated on average 11 times higher relative abundance in the coastal surface samples compared to offshore.

Expression of diazotrophs represented by 5 stations
The expression of nifH genes shows species-specific and diurnal variation (e.g.Evans et al., 2000;Church et al., 2005;Boström et al., 2007a).Therefore, to compare nifH gene expression across the Baltic Sea, stations sampled at roughly the same time of day were selected (afternoon; 15:45 ± 2.15 h; Fig. 4).The samples at each station were pooled to get an overall picture of the expression at the station.Unfortunately, the dataset was too small to statistically establish putative linkages to environmental conditions, however, there was a trend of increased relative abundance of Aphanizomenon-like nifH expression towards the north of the Baltic Sea (Fig. 4a).More than half of this expression was represented by ASV 1, which is identical to a sequence previously recovered from the Baltic (EU916460; Farnelid et al., 2009).Expression by Nodularia AV1 appeared higher towards the south.The highest NCD expression (14.9% of the transcripts relative abundances) was found at station MP6 (Fig. 4a), and largely accounted (>90%) for by the gammaproteobacterial Pseudomonas-like ASV 325, which was identical to nifH gene sequences recovered from Baltic surface water (Farnelid et al., 2014) and sediment (Liesirova et al., 2023).This ASV also showed 99.7% nucleotide similarity to a diazotroph (clade GD11.0) cultivated from the central Baltic Sea (Bentzon-Tilia et al., 2014).Station MP7 was not sampled in the afternoon, but was the only other station than MP6, which showed substantial NCD nifH expression (4.8% of transcript relative abundances).This station had the highest expression by Deltaproteobacteria of any station sampled (1.9% of transcript relative abundances).Deltaproteobacteria accounted for 0.1% of expression on average across all samples.

Discussion
Our study yields insights of critical importance for the understanding of N 2 fixation in the Baltic Sea.In particular, the finding that Pseudanabaena and NCDs dominate the nifH gene pool contrasts with the prevailing view of filamentous heterocyst-forming cyanobacteria as the only relevant N 2 -fixers (e.g.Klawonn et al., 2016).Despite dominating the gene pool, these organisms showed highly variable nifH expression illustrating a need for clarifying their significance for local N cycling.The sampling of offshore and coastal environments revealed a striking difference in nifH gene composition and expression, conceivably linked to the high particle load of coastal waters, influenced by sediment resuspension.

Pseudanabaena-like ASVs dominated the overall composition of cyanobacterial nifH
The dominance of Pseudanabaena was surprising, as most studies using microscopy to investigate cyanobacteria report dominance by Aphanizomenon sp., sometimes followed by Nodularia sp.(e.g., Berner et al., 2018;Eigemann et al., 2019;Klawonn et al., 2016;Kownacka et al., 2020).Pseudanabaena spp.has, however, been reported as a dominant cyanobacterium during summer when counted by microscopy (Riemann et al., 2008), but is reported in other studies as a rather minor constituent of phytoplankton (Eigemann et al., 2019) or cyanobacterial biomass (Berner et al., 2018).Pseudanabaena-like nifH gene sequences have been recovered in Baltic Sea nifH gene libraries (Stal et al., 2003;Acinas et al., 2009;Farnelid et al., 2009Farnelid et al., , 2013)), and accounted for 8% of the relative nifH gene abundance in the southern Baltic Sea (Reeder et al., 2022).Our finding of Pseudanabaena occasionally accounting for more than half of the cyanobacterial nifH genes is, nevertheless, striking.The year of our study, 2015, was the second warmest year in the Baltic Sea since 1990 (Siegel and Gerth, 2016).Since warm temperature increases Pseudanabaena spp.biomass (Berner et al., 2018), we speculate that high temperature caused the exceptionally high prevalence of Pseudanabaena-like nifH.Importantly, the recurrent findings of Pseudanabaena spp. in Baltic Sea plankton highlights a need for mapping its importance for N 2 fixation, in space and over time.In particular, considering the future elevated temperatures predicted for the Baltic Sea (Meier, 2006).

Resuspension impacts the coastal diazotrophic community
Consistent with an earlier study suggesting complex environmental regulation of diazotrophs in the Baltic Sea (Farnelid et al., 2009), no clear effect of discrete environmental parameters on the nifH gene composition was found.The negligible impact of salinity on composition was particularly surprising, given its reported effects on nifH gene distribution (Reeder et al., 2022), cyanobacterial community composition (Dupont et al., 2014;Olofsson et al., 2020), and bacterial composition in the Baltic Sea (Herlemann et al., 2011;Dupont et al., 2014).We, therefore, speculate that the salinity gradient sampled (4.3-8.6) is not a strong driver of diazotroph community structure, like it has been concluded for general bacterioplankton community structure (Herlemann et al., 2011).
We hypothesized that the anticipated higher productivity and load of particles in coastal areas would cause differences in composition of diazotrophs between coastal and offshore areas, and between sizefractions in the plankton.Our sampling confirmed that the coastal stations differed from offshore stations by having higher DOC content at the surface and higher turbidity at both sampled depths.Turbidity can be considered a proxy for suspended sediment in the water column (Davies-Colley and Smith, 2001), and the elevated levels of turbidity and DOC may, therefore, suggest sediment resuspension and/or terrestrial inputs at the coastal stations.Such input may promote diazotrophic diversity and N 2 fixation in estuarine and coastal waters (Mulholland et al., 2012;Bentzon-Tilia et al., 2015b;Pedersen et al., 2018;Liesirova et al., 2023), and diazotrophs, particularly putative anaerobic NCDs, likely of sediment origin have previously been observed in estuarine waters (Moisander et al., 2007;Cabello et al., 2020;Hallstrøm et al., 2022).Indeed, we found higher relative abundances of NCDs at the coastal stations, whereas highest relative abundance of cyanobacterial nifH was found offshore.Many diazotrophs previously found in sediment samples of coastal origin in the Baltic Sea were found at our coastal stations.For instance, nifH genes affiliated with Leptolyngbya, a cyanobacterium known to be associated with Baltic Sea sediments and mats (Sivonen et al., 2007;Włodarska-Kowalczuk et al., 2014), were found at most of the coastal stations, accounting for on average 2.4% of the nifH reads from the PA surface size-fraction.In contrast, Leptolyngbya accounted for <0.1% of the nifH reads from all sample categories from the offshore stations, and was detected at <40% of the offshore stations.We also recovered 118 NCD ASVs identical to nifH phylotypes from coastal Baltic sediments (Liesirova et al., 2023).These ASVs showed on average 11 times higher relative abundance at the coast compared to offshore.These observations point to sediment resuspension as a strong driver of diazotroph composition at the coastal stations.
Furthermore, the coastal stations in our study showed a roughly 4fold higher relative abundance of Deltaproteobacteria than the offshore stations.They have previously been found in Baltic sediments (Bertics et al., 2013;Liesirova et al., 2023) and belong mainly to the nifH Cluster III, which contains primarily strict anaerobes (Zehr et al., 2003).This makes their presence in the water-column puzzling and has led to discussions about their possible association with pelagic particles (Braun et al., 1999;Church et al., 2005;Debeljak et al., 2021).In our study there was, however, no significant difference in the prevalence of Deltaproteobacteria in the free-living and particulate size-fractions, and many ASVs were found associated with both size-fractions.This highlights the need for investigating adaptations by these putative anaerobic diazotrophs, pertaining to their survival in the oxygenated water-column (Farnelid et al., 2013), but also to the fact that their status as strict anaerobes potentially needs re-evaluation (Man-Aharonovich et al., 2007).
Our present survey of nifH genes reveals large differences in the composition of putative diazotrophs between coastal and offshore stations in the Baltic Sea.This difference conceivably pertains partially to coastal upwelling and sediment resuspension, and preponderance of groups of diazotrophs (e.g.Pseudanabaena spp.and NCDs) for which the ecology is poorly understood.Strikingly, the composition of the putative diazotrophs (DNA) were only to some extent mirrored in the composition of the active nifH expressing taxa (RNA; see below).

Heterocyst-forming cyanobacteria account for a majority of nifH expression
The heterocyst-forming Aphanizomenon-like, Nodularia AV1-like and Dolichospermum-like ASVs accounted for almost 80% of our nifH transcripts across all stations.This may even be an underestimate since nifH gene expression in Nodularia has been found to peak at noon (Boström et al., 2007b), at which time only few stations were sampled.They were found in both size-fractions, as observed earlier (Wasmund et al., 2001;Farnelid et al., 2009), likely due to variable cell sizes (Wojtasiewicz and Stoń-Egiert, 2016) or break-up of filaments during filtration.The prominence of these taxa among nifH gene transcripts is consistent with earlier studies (Boström et al., 2007b;Klawonn et al., 2016) and likely also reflect a main contribution to overall N 2 fixation (Klawonn et al., 2016).
While Pseudanabaena spp.dominated the cyanobacterial nifH genes, it accounted for a relatively minor proportion of the nifH transcripts.The reason for this apparent discrepancy is unknown.The nifH gene expression by Pseudanabaena falls in line with earlier data from the Baltic Sea Proper (Farnelid et al., 2013) and documented low cell-specific N 2 fixation by Pseudanabaena from the same station and year/month, as sampled by us (station S7; Eigemann et al., 2019).This apparently contrasts with labeling it as a non-N 2 -fixer (Berner et al., 2018) and an earlier study from the station (station S7) where N 2 fixation by Pseudanabaena spp. was not measurable (Klawonn et al., 2016).However, these observations may originate from the fact that only some strains of Baltic Pseudanabaena have nifH and thereby the ability to carry out N 2 fixation (Acinas et al., 2009).Interestingly, the cell-specific N 2 fixation by Pseudanabaena was measured during a late-stage cyanobacterial bloom (Eigemann et al., 2019), hinting at Pseudanabaena spp. as an opportunistic N 2 -fixer, becoming active only under specific conditions.Such strategy could potentially explain our observation of Pseudanabaena spp.dominating the nifH gene pool, while accounting for only a minor fraction of the nifH gene transcripts.Moreover, it underlines the need to unravel spatio-temporal dynamics of N 2 fixation in Pseudanabaena spp.for determining its importance for nitrogen cycling in the Baltic Sea.
The UCYN-A/haptophyte symbiosis has been recovered from coastal areas worldwide (e.g.Short and Zehr, 2007;Mulholland et al., 2019;Messer et al., 2015;Shiozaki et al., 2015;Cabello et al., 2020;Hallstrøm et al., 2022).The sublineage UCYN-A2 is thought to have a coastal association (Turk-Kubo et al., 2017, 2021) and has previously been detected in the southern Baltic Sea (Reeder et al., 2022) and connected areas west of the Baltic Sea (Bentzon-Tilia et al., 2015b;Scavotto et al., 2015).Consistent with these finding, we only found high UCYN-A2 nifH expression (19-25% of the transcripts) at the relatively saline (7.93) and westernmost station (Mo8).We, therefore, speculate that UCYN-A2 makes a hitherto unaccounted for contribution to N 2 fixation in the westernmost part of the Baltic Sea but that it does not thrive in the low salinities of the central and northern parts.Deltaproteobacteria was the dominant class of NCDs (on average 20.6% of relative abundance) but accounted on average for only 0.1% of the nifH gene transcripts.However, the only NCD to show substantial nifH expression was Pseudomonas-like ASV 325.Pseudomonas accounted on average for 5% of the overall nifH genes, and was found at all stations at varying depth and size-fractions, except at the most western station (Mo8).Almost identical (99.7%) sequences have been found in other waters (Turk et al., 2011;JN097381.1 GenBank).Identical or similar (99.7%) sequences have been recovered from the Baltic Sea (Farnelid et al., 2013(Farnelid et al., , 2014;;Bentzon-Tilia et al., 2014).In addition, a Baltic Pseudomonas stutzeri strain (BAL361, identical to ASV 325) can sustain growth by N 2 fixation (Bentzon-Tilia et al., 2015a).The Pseudomonas nifH gene expression coincided with an upwelling event (as reported elsewhere; Siegel and Gerth, 2016), in our dataset suggested by elevated DIP, lowest surface temperature, lowest surface bacterial abundances and at depth the smallest HNA to LNA ratio.Indeed, cool and nutrient rich conditions have been suggested to select for gammaproteobacterial NCDs in the Baltic (Farnelid et al., 2009) and may, therefore, have stimulated the high Pseudomonas nifH gene expression at these specific coastal sites.Since upwelling is a reoccurring phenomenon along the southern coast of the Baltic Sea (Lehmann and Myrberg, 2008;Zhang et al., 2022), we speculate that Pseudomonas regularly contributes to N 2 fixation in this region.
To gain insight into patterns of nifH expression across the Baltic Sea, we compared five stations sampled at the same time of day (afternoon).Aphanizomenon accounted for between 3 and 88% of the transcripts and showed increased relative expression at lower salinity, consistent with its known preference for low salinity (Lehtimäki et al., 1997) and prevalence in the low saline northern Baltic Sea (Stal et al., 2003;Kownacka et al., 2020).In contrast, Nodularia is known to be more common in the south and central Baltic Sea (Kownacka et al., 2020), which corresponded to Nodularia (ASV1) showing higher relative expression in the south.These data should, however, be interpreted with caution as diurnal patterns in nifH expression may differ between taxa.

Concluding remarks
Our survey of nitrogenase genes and their expression revealed presence of several important groups, beyond the filamentous heterocyst-forming taxa, that are well-documented by microscopy.NCDs accounted for 43% of the diazotroph community composition, but only 1% of the total nifH gene transcripts.Furthermore, Pseudanabaena was the dominating cyanobacteria, but was not among the main nifH gene expressing taxa.This highlights that composition and expression of nifH can be remarkably different, something that has been repeatedly observed (Short and Zehr, 2007;Bentzon-Tilia et al., 2015a;Yang et al., 2019;Hallstrøm et al., 2022).This discrepancy, as well as the differences in diazotroph composition between the offshore and coastal environments, emphasize the importance of including both composition and expression, as well as distinguishing between coastal and offshore environments, when monitoring diazotrophs and their activity in the Baltic Sea.

Fig. 1 .
Fig. 1.Map of the Baltic Sea depicting surface salinity at the 18 sampled stations.The white, dashed line represents the division of salinities >7 and <7.Red squares and blue triangles define coastal (<10 km from shore) and offshore (>10 km from shore) stations, respectively.(For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

Fig. 3 .
Fig. 3. Relative abundance of non-cyanobacterial diazotrophs (NCD) in all samples.Surface' (avg.1.8 m) and 'Deep' (avg.24 m), for the free-living (0.2-3.0 μm) and the particle associated (>3 μm) size-fractions.Stations Mo8 -Mo10 are offshore (>10 km from shore, blue triangles) and stations MP6 -MP25 are coastal (<10 km from shore, red squares).Offshore (blue triangles) and costal (red squares) stations are ordered from south west going north and down to south east (see Fig. 1).Stations with no available data (n/a).(For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

Fig. 4 .
Fig. 4. Composition of present and active diazotrophs from stations sampled at the same time of day (afternoon).Average relative abundance for nifH gene expression (a) and nifH genes (b; for reference).nifH genes from surface, deep, and both size-fractions were pooled for each station.Note that at station Mo9 composition is only represented by the surface particle-associated size-fraction.