Large-scale differences in diversity and functional adaptations of prokaryotic communities from conserved and anthropogenically impacted mangrove sediments in a tropical estuary

Mangroves are tropical ecosystems with strategic importance for climate change mitigation on local and global scales. They are also under considerable threat due to fragmentation degradation and urbanization. However, a complete understanding of how anthropogenic actions can affect microbial biodiversity and functional adaptations is still lacking. In this study, we carried out 16S rRNA gene sequencing analysis using sediment samples from two distinct mangrove areas located within the Serinhaém Estuary, Brazil. The first sampling area was located around the urban area of Ituberá, impacted by domestic sewage and urban runoff, while the second was an environmentally conserved site. Our results show significant changes in the structure of the communities between impacted and conserved sites. Biodiversity, along with functional potentials for the cycling of carbon, nitrogen, phosphorus and sulfur, were significantly increased in the urban area. We found that the environmental factors of organic matter, temperature and copper were significantly correlated with the observed shifts in the communities. Contributions of specific taxa to the functional potentials were negatively correlated with biodiversity, such that fewer numbers of taxa in the conserved area contributed to the majority of the metabolic potential. The results suggest that the contamination by urban runoff may have generated a different environment that led to the extinction of some taxa observed at the conserved site. In their place we found that the impacted site is enriched in prokaryotic families that are known human and animal pathogens, a clear negative effect of the urbanization process.


Read-pairing and de-noising
For conserved site reads we used a tandem pairing process as QIIME2 uses an implementation of vsearch for read pair joining ('vsearch join-pairs') which has a lower criteria of a minimum of 10 base-pairs of overlap. To extend vsearch with accurate read-pair joining below this cut-off we used a custom script (jantar.py) that would attempt to combine read-pairs that had failed to combined with vsearch. In order to be combined a read pair must have a "perfect" (starting from tails of reads, with no mismatches, in the correct orientation) 7bp overlap or 10bp with a single mismatch. Reads were denoised using 'dada2 denoise-single', default settings using '--p-trimleft 3 and --p-trunc-len 0'. DADA2 as implemented by QIIME2 was used for read-pairing and denoising of the Impacted site data.

QIIME2
Phylogeny was determined using 'phylogeny align-to-tree-mafft-fasttree' with default settings. Minimum (48638) and maximum (106724) read depths were determined using 'table_dada2.qzv'. Alpha-rarefaction was then calculated using the rooted MAFFT tree and the maximum read depth.
Alpha and Beta diversities were calculated using 'diversity core-metrics-phylogenetic' using the minimum read depth ('--p-sampling-depth') and the rooted MAFFT tree. Alpha-diversity was evaluated using the Kruskal-Wallis test, Beta-diversity evaluated using PERMANOVA. Rarefied data did not violate Kruskal-Wallis homoscedasticity assumption using Levene's test, and betadisper as implemented in the vegan R package was used to test for PERMANNOVA's multivariate homoscedasticity assumption.
Finally, gneiss was used to calculate the hierarchical correlation-clustering of unnormalized taxa counts constrained to the levels of Family and greater using 'gneiss correlation-clustering'. A pseudo-count of 1 was used for handling log-transforms. This was plotted using 'dendrogramheatmap' for the meta-data column 'Site'.

PICRUSt2
PICRUSt2 was run twice using '--stratified' and '--per_sequence_contrib' options. One run was set to limit NSTI thresholds to 0.15, while the other (and data set subsequently used) was ran with the default NSTI threshold of 2.0.

Taxonomic abundance analysis
Novembro.py normalizes the ASV read abundances generated by QIIME2 into their corresponding taxa. This requires feature-table.tsv and taxonomy.tsv files. While the taxonomy.tsv file can be found by unzipping the taxonomy.qza file the feature-table.tsv needs to be generated using a biom conversion command (eg. 'biom convert -i feature-table.biom -o feature-table.biom.txt --to-tsv') Novembro.py iterates from the lowest to the highest taxonomic level combining ASV read abundances into their corresponding taxa. Replicate normalizations are downsampled to match the lowest ASV abundant replicate. We use novembro.py to calculate zone specific taxonomic enrichment. This is accomplished by first using chi2 to compare the replicates of each zone against the others. To be deemed significant a zone must have an greater than a 5% effect size and p-value less than, or equal to, 0.05 relative to both other sites.
Taxa that are found to be significantly enriched are saved to a tab-delimited file along with their log10 transformed abundances.

Potential functional abundance analysis
Sigilo.py performs several functions to aid in the visualization and analysis of PICRUSt2 and ALDEx2 generated KO data.
Generate heatmap: KO enrichment heatmaps can be generated using '--generate_heatmap', this function combines the predicted functional abundances of KOs identified by ALDEx2 as significant into their corresponding KEGG Ortholog pathways, performs a log10 transform and plots them as heatmaps.
Correlate ASV with Functional Abundance: Using the '--asv2fa' command we first combine all ASVs into their corresponding taxa at a given level (for Family, level = 4) and then calculate the total functional abundance for those combined ASVs to derive the predicted functional abundance of a given taxa.
Correlate ASV with NSTI: Using the '--asv2nsti' command we first combine all ASVs into their corresponding taxa at a given level (for Family, level = 4) and then combine the associated NSTI values for those ASVs for each taxa. We can then use these to identify the mean, median, and standard deviation of NSTI scores for each given taxa. Here we show the mean measures of environmental variables associated with the conserved (C1, C2, C3) and impacted (I1, I2, I3) sites used in this study as well as the ratio of impacted over conserved (I/C). These were tested using Krusakl-Wallis (df=1, H-statistic and p-value).  Here we show the results of PCoA constrained ordination for OTUs for the environmental variables from the two sites.

Figure 1. Sediment sampling strategy
Here we show the sampling schematic at the conserved site. Originally sampling was performed in triplicate at three different tidal zones (A, B) [1]. For this study we recombined samples such that each conserved site replicate had one sample from each tidal zone (C).

Figure 2. Schematics representing straight-line distances between sediment sampling sites and their respective heavy metal sampling sites.
Here we show the straight-line distances between the sediment sampling sites and their respective heavy metal sampling sites [2]. The average distance for the conserved set is 6.98 km (A) and 1.64 km for the Impacted site (B). In both cases the heavy metal sampling sites were selected based on being the closest sites downstream of the sediment sampling sites.

Figure 3. Taxa overlap between sites.
Here we show the overlap of taxa constrained to Species level and higher (A.) and Family level and higher (B.) Taxa were required to have a minimum of 20 unnormalized reads in at least one site to be included.

Figure 4. Alpha-rarefaction plot as calculated using QIIME2
Rarefaction curves for Impacted and Conserved sites. For rarefaction we selected Conserved 1 with 48,600 reads.

Figure 5. Additional alpha-diversity tests
In addition to Shannon's diversity, both Pielou (A) and Faith (B) were calculated using rarefied data and found to be significant using pairwise Kruskal-Wallis.

Figure 6. Additional beta-diversity tests
Distance measures for Jaccard (A), Unweighted UniFrac (B), and Weighted UniFrac (C). All pvalues were calculated using the pairwise PERMANOVA test on rarefied data.

Figure 7. Percentage of reads belonging to unassigned ASVs at each taxonomic level.
Here we show the percent total of unnormalized abundance of reads for each ASV unassigned at that level. For example, a taxonomic assignment of 'Bacteria' would be counted as assigned at the Kingdom level but not for the Phylum level. No ASVs or OTUs were filtered from this calculation.

Figure 8. Analysis of Cumulative Sum Scaling on Differential Analysis.
Cumulative Sum Scaling (CSS) is a normalization method intended to correct for overamplification of sample specific sequences using an adaptive scaling term calculated around deviation from the reference sample [3].
Applied to our data we find that it significantly alters the distribution of feature scores, increasing weight of the lower abundance taxa relative to the abundance (A, B). The mean Pearson R^2 of unnormalized and CSS normalized ASV abundances: 0.503 Comparing performance of the methods in our differential analysis method (novembro.py) we find that they do have substantial overlap (~58%) (C). Normalization specific taxa do seem to have trends that describe them. The 6 TSS specific taxa have substantially higher unnormalized read abundance in the conserved site (minimum read count ~400) while also having one replicate closer in abundance to the impact replicates (D). As there are several dozen CSS specific taxa we randomly selected ten to show here. Of these we see that, with the exception of Desulfobacteraceae, these taxa are often not observed in the conserved samples and with low abundance in the impacted samples (maximum read count ~120) (E). Desulfobacteraceae, is an interesting case where the abundances in both samples are high. While CSS normalization increases this difference leading to it to be called significant, TSS decreases this difference leading to it being deemed not significant.
Because of this performance [4] we have decided to use TSS for normalization of our samples. Figure 9. Differential subcommunity analysis using Gneiss. Gneiss is a hierarchical clustering approach that uses balance calculations to extend differential abundance analysis beyond species and instead identify niche specific subcommunities [5]. Here (Figure 3), we show the gneiss dendrogram, top ten balances, and heat plot of the our taxonomically assigned family's unnormalized read counts.
We find two highly ranked clusters that correspond to enrichment in the Conserved samples (blue box, numerator of y0, entirety of y1) and enrichment in the Impacted samples (orange box, numerator of y2, entirety of y5).

Figure 10
. Overlap between Gneiss clusters and our differential abundance analysis method. We find 169 taxa are present in the Conserved site enriched cluster and 204 taxa are present in the Impacted site enriched cluster. Of these we find all (17) of our predicted Conserved enriched taxa are also present in the Conserved enriched cluster and all but 3 of our predicted Impact enriched taxa (43) are present in the Impact enriched cluster. Notably, those taxa (Bacteroidetes vadinHA17, Calditrichaceae, and Spirochaetaceae, Figure 4) which are absent from the Impacted enriched cluster are all taxa which also exhibit substantial abundances in the Conserved site, suggesting that they did not sort with this cluster due to that reason. Figure 11. Results of ANCOMI applied to Family level constrained taxa unnormalized abundances.