Functional analysis of (pristine) estuarine marine sediments

Traditional environmental monitoring techniques are well suited to resolving acute exposure effects but lack resolution in determining subtle shifts in ecosystem functions resulting from chronic exposure(s). Surveillance with sensitive omics-based technologies could bridge this gap but, to date, most omics-based environmental studies have focused on previously degraded environments, identifying key metabolic differences resulting from anthropogenic perturbations. Here, we apply ‘omic based approaches to pristine environments to establish blueprints of microbial functionality within healthy estuarine sediment communities. We collected surface sediments (n=50) from four pristine estuaries along the Western Cape York Peninsula of Far North Queensland, Australia. Sediment microbiomes were analyzed for 16S rRNA amplicon sequences, central carbon metabolism metabolites and associated secondary metabolites via untargeted metabolic proling methods. Multivariate statistical analyses indicated heterogeneity amongst the sampled estuaries, however, taxa-function relationships could be established and predicted community metabolism potential. Twenty-four correlated gene-metabolite pathways were identied and used to establish sediment microbial blueprints of carbon metabolism and amino acid biosynthesis. Our results establish a baseline microbial blueprint for the pristine sediment microbiome, one that drives important ecosystem services and to which future ecosurveillance monitoring can be compared.


Introduction
Traditional environmental and organism monitoring techniques (e.g., chemical monitoring and bioassays) are highly suited to assessing acute disruptive events [1] but are less able to detect subtle shifts in ecosystem function caused by chronic and/or sub-lethal drivers. In this regard, novel surveillance techniques and monitoring approaches that combine 'omics-based technologies with traditional methodologies are being developed [2][3][4][5]. This has predominantly been driven by research in the health and nutrition domain [6] but has found application within environmental science. For example, omicsbased techniques have advanced our understanding of microbial physiology [7], organism-environment interactions and organism function/health [4] based on the premise that metabolomics is effectively one step removed from an organism's phenotype [8]. This metabolomic-phenotypic link has further expanded our knowledge of speci c organismal responses to abiotic stressors [5] and for exploring both physical and anthropogenic contaminants [9][10][11]. Technologically, we are now entering an era with the potential for developing ecosystem-scale, multi-omic enabled blueprints for sensitive ecosurveillance strategies, strategies that could determine critical drivers of ecosystem functions within pristine and impacted environments. Critically, due to the substantial range of key ecosystem services that microbial communities provide globally, determining system wide impacts upon this component of the biosphere could e ciently provision key monitoring approaches for assessing ecosystem status and the impacts of environmental perturbation.
Microbial communities provide vital goods and services with far reaching consequences for critical ecosystem functions, with the ecological modulators of the rate of delivery of some of these services now being resolved (e.g., denitri cation) [12]. Further, microbial communities are highly dynamic and diverse assemblages, where members can become more or less important for the provision of these services over time [13] and with increasing community diversity thought to build resistance, resilience and functional redundancy into the services [12,14,15]. As such, the application of environmental microbial community metabolomics is one speci c approach being explored to assess large scale ecosystem function questions [16][17][18][19]. However, the consistency and predictability of un-perturbed microbiomes suggest they are under the regulation of the (meta)genetic blueprint [20], and thus, understanding the contributions of genetic factors to measured metabolic phenotypes, how these interact and are subsequently modulated by stressors, is also a key component to resolve. As stated by Bundy et al. [4], it is di cult to measure and interpret differing degrees of in uence that combine genetics and the impact the environment has on metabolism.
Despite some preliminary studies comparing genetic and site-speci c effects on the metabolic pro le of environmentally disturbed systems [16,17,21,22], there has yet to be a systematic analysis of how microbial community structure in uences the community metabolome of large ecosystems and a variety of metabolic functions. To date, the majority of environmental metabolomics research has focused on model and non-model organisms, typically the response of these organisms to an already perturbed ecosystem, with the detection of subtle shifts in ecosystem-wide functions remaining a challenge [23]. Further, it is yet to be resolved if the community metabolome can be used in routine monitoring programs as a measure of ecosystem function and health. This is typically due to limited metabolite data for integration with community composition (rRNA or metagenome) datasets, a lack of temporal application or, relying on statistical associations and anecdotal linkages to generate ecosystem function assessments. In many cases, 16S rRNA gene sequencing is principally used to pro le population diversity and the microbial functional capacity is then inferred as a secondary consideration from 16S rRNA gene sequencing data, with a degree of uncertainty [24]. There are surprisingly few studies that have validated this potential, or perceived, function with detailed metabolism data (i.e., metabolomics), nor overlaid and assessed how metabolomic pro les compare to phylogenetic differences and similarities (i.e., exploring what is different, but also what is the same) in non-perturbed systems [25].
In this study, bacterial 16S rRNA gene sequencing and a combination of targeted central carbon metabolism and untargeted secondary metabolite discovery approaches were integrated. Taxa-function relationships were established to assess the variance and similarities of metabolisms and bacterial community structures in four pristine estuaries during the dry season (winter) of the Western Cape York region, in northern tropical Australia. These integrated datasets enabled a better understanding of the underlying conditions that de ne the ecosystem health and resilience within these estuaries, through the construction of integrated 'microbial blueprints'. This information subsequently provisioned the baseline 'microbial blueprint' assessment of a highly valued and pristine environment, against which future developments and impacts could be measured.

Materials And Methods
Page 5/21 2.1 Study area -the Western Cape York Peninsula, Australia The Western Cape York Peninsula, Far North Queensland, Australia, is considered 'World Heritage Quality' [26]. The peninsula is a remote natural wilderness area in the monsoon tropical zone of northernmost Queensland, Australia ( Figure 1 and Supplementary Figure S1). The Queensland Government's State of the Environment report card considers the region to be in very good condition in terms of water quality and contaminants [27]. A detailed description of the study location is provided in the supplementary information, as required by the Metabolomics Standard Initiative reporting guidelines [28].

Sediment sample collection
Sediment samples were collected during the dry season (13 th -26 th of July, 2018), with no rainfall occurring during the month prior, and during the collection period [29]. At each site (n=50) a single sur cial sediment sample (top 2 cm) was obtained using a clean polycarbonate corer (diameter 10 cm).
Samples intended for 16S rRNA gene sequencing and metabolomics were then transferred into DNA-free, sterile 50 mL Greiner tubes and placed on ice immediately upon collection. The tubes were then placed into sterile snap-lock bags and stored at −20°C until transported to the laboratory. Upon arrival at the laboratory, samples were then stored at −80°C. All materials used for the collection and storage of DNA and metabolite samples were soaked for at least 24 h in 5% sodium hypochlorite and rinsed thoroughly (n=5) with Milli-Q water. The core sediment sample was subsampled for nutrients (0.5g), grain size (5.0g), pesticide (0.5g), and metal analysis (0.5g).

Physicochemical analyses
Sediment samples were analyzed for heavy metals and pesticides and characterized in terms of particle size. Heavy metals were extracted from a 0.5 g sub-sample of sediment, oven-dried, and extracted as previously described [30]. Pesticides were screened as per the Agilent Pesticide Personal Compound Database and Library (PCDL) method (G3878CA, Agilent Technologies, Santa Clara, CA, USA), without modi cations, using a sediment-based QuEChERS approach described by Bragança et al. [31]. Pesticides were analyzed using an Agilent 6546 liquid chromatography quadrupole time-of-ight mass spectrometer (LC-QToF) with an Agilent Jet Stream source coupled to an Agilent In nity II UHPLC system (Agilent Technologies, Santa Clara, CA, USA). Particle size analysis of the sediments was undertaken as previously described , with slight modi cations, using 5.0 g of sediment and per and AS 4816.1-2002 (R2016) [32] standards. All chemicals were of analytical grade or higher and were purchased from Sigma-Aldrich within Australia unless otherwise stated. FASTQ paired-end reads were merged using FLASH v1.2.1 [36], (min-overlap=30, max-overlap=250). Merged sequences that were deemed too short or too long (minlength=400, maxlength=520), had ambiguous base calls or homopolymer runs >8 were removed using 'screen.seqs' in Mothur v1.41.1 [37] and dereplicated using the fastx_uniques function in USEARCH v11 [38]. Dereplicated sequences were denoised and resolved into amplicon sequence variants (ASV's) using the UNOISE3 algorithm in USEARCH v11 [39] and default arguments. Merged and un ltered sequences were then mapped to these ASV's at 97% similarity to produce an ASV abundance table using the "otu_tab" function in USEARCH v11. Sequences were classi ed using the RDP naïve Bayesian classi er [40] using MOTHUR v1.41.1 [37] against the MOTHUR distribution of the Silva132 databases [41] respectively and a probability cut-off of 60%. Sequences not classi ed as "Bacteria", not classi ed at Phylum level, classi ed as "chloroplast" or "Mitochondria" were removed.
ASV data were further clustered at 97% sequence similarity, using USEARCH -cluster_fast on size (abundance) sorted ASVs, for prediction of the metagenomic function using PICRUSt2 [42]. The PICRUSt2 work ow was used with default arguments to make strati ed and non-strati ed metagenomic predictions. Samples were then bead-milled (Precellys Evolution, Thermo Fisher Scienti c, Australia) at 6,000 rpm, 3 x samples were centrifuged at 4,750 g for 30 min at 4˚C. The ACN fraction was then removed to a fresh tube and the extraction repeated with a further 2 mL cold ACN. The extracts were combined and ltered via Captiva EMR-Lipid cartridges to remove excess lipids (3 mL, 300 mg, Agilent, Mulgrave, Australia). Biological blanks were prepared in the same way, without biological material. Pooled biological quality control (PBQC) samples were prepared with 200 µL aliquots from each biological sample. Two 1 ml aliquots of each sample extract was then transferred into 1.5 mL high recovery glass vials and samples concentrated sequentially via rotational vacuum centrifuge (2,000 rpm, 38˚C, 24 h). Dried extracts were then reconstituted in 50 µL methanol/water (20:80, v/v) and vortexed for 5 min at room temperature before metabolomics analysis.

Central carbon metabolism metabolites (LC-QqQ-MS)
Central carbon metabolism metabolites were analyzed using an Agilent 6470 liquid chromatography triple quadrupole mass spectrometer (LC-QqQ-MS) coupled with an Agilent In nity II Flex ultra-highperformance liquid chromatography (UHPLC) system (Agilent Technologies, Santa Clara, CA, USA). The instrument was operated using the Agilent Metabolomics dMRM Database and Method . Collected data were processed using MassHunter Quantitative Analysis (for QQQ) software (Version 10.0, Agilent Technologies, USA), normalized to IS in preparation for downstream analyses.

Discovery polar metabolites (LC-QToF-MS)
Untargeted polar metabolites were analyzed using an Agilent 6546 liquid chromatography time-of-ight mass spectrometer (LC-QToF) with an Agilent Jet Stream source coupled to an Agilent In nity II UHPLC system (Agilent Technologies, USA). Chromatographic separation was achieved by injection (3 µL) of sample onto an Agilent In nityLab Proroshell 120 HILIC-Z Peek lined column (2.1 mm x 150 mm, 2.7 µm). Each sample was analyzed in positive and negative ionization mode. The mobile phase for the positive ion mode chromatography was (A) 10 mM ammonium formate in water with 0.1% formic acid and (B) 10 mM ammonium formate in acetonitrile/water (90:10, v:v) with 0.1 % formic acid operated for 20 minutes with a nonlinear gradient starting at 98% B. The column temperature was set at 25°C. The mobile phase for the negative ion mode chromatography was (A) 10 mM ammonium acetate in water with 2.5 µM medronic acid and (B) 10 mM ammonium acetate in acetonitrile/water (85:15, v:v) with 2.5 µM medronic acid operated for 26 minutes with a nonlinear gradient starting at 96% B. The column temperature was set at 50°C. The detector gas temperature was 225°C with a drying gas rate of 6 and 13 L min -1 for the positive and negative ion modes of operation. The sheath gas temperature and ow were 225°C and 10 L min -1 for positive ions, and 350°C and 12 L min -1 for negative ions. The nebulizer pressure was also 40 psi and 35 psi for positive and negative ions respectively. The acquisition range was 50 to 1600 m/z, at 3 spectrum per second. Reference mass ions were 922.009198 m/z (positive ion) and 68.9957 m/z and 980.02163 m/z (negative ion). Collected data were processed using MassHunter Pro nder software (Version 10.0, Agilent Technologies, USA), normisled to IS and putatively identi ed against the Agilent METLIN Metabolite PCDL (G6825-90008, Agilent Technologies, Santa Clara, CA, USA) based on MSMS spectra and library threshold score of 0.8.

Heavy metals in sediments (ICP-MS)
Trace heavy metals were analyzed using an Agilent 7700× quadrupole-type ICP-MS (Agilent Technologies, Mulgrave, Australia), equipped with an Agilent ASX-520 autosampler. Marine sediment samples (0.5 g) from each site were used for the analysis of trace heavy metals following the EPA Method 3051A [30]. Trace heavy metals were extracted, in triplicate, using concentrated nitric acid (9 mL) by microwaveassisted digestion (Multiwave 3000, PerkinElmer Inc., Melbourne, VIC, Australia). The instrument was operated in He-mode. The integration time was 0.3 s per mass, 1 point per mass, 3 replicates, and 100 sweeps per replicate. The Agilent Environmental standard for ICP-MS was used for quanti cation.

Statistical analysis and data integration
The trace metal and physicochemical data were used to assess the baseline condition of the sampled estuaries against the current Australian and New Zealand Guidelines for Fresh and Marine Water Quality in sediments. The grain size data were analyzed using GRADISTAT [43]. This was to account for any bias in the subsequent bacterial 16S sequencing and metabolomics analysis arising from differences in sediment physical characteristics. The bacterial 16S sequencing and metabolomics data were subjected to further statistical analysis using multivariate statistics. The data were rst imported, matched by sample location identi ers (metadata), and log-transformed to normalize the data using SIMCA 16 (MKS Data Analytics Solutions, Uméa, Sweden). Partial Least Square-Discriminant Analysis (PLS-DA) was performed by nding successive orthogonal components from the estuary-speci c datasets with maximum squared covariance and was subsequently used to identify the common relationships among the multiple datasets.
MetaboAnalyst 4.0 (Xia Lab, McGill University, Quebec, Canada) was used for differential metabolite abundances and pathway enrichment analysis and metabolites with Benjamini-Hochberg adjusted pvalue of ≤ 0.05 and, Fold Changes (FC) of < 0.5 (downward regulation) or > 2.0 (upward regulation), were considered to be statistically signi cant [44]. Chemical clusters based on structural similarity were created for metabolic examination using the ChemRICH analysis [45]. The identi ed metabolites were used to build metabolic pathways using the KEGG Mapper.
Automated taxonomic-to-phenotypic mapping was performed using Burrito [46] and MIMOSA2 [47], two metagenomics systems biology web tools. These tools integrate the predicted metagenomic function and metabolic data with the taxonomic composition of each sample. Alpha-diversity indices such as Shannon-Weiner's H', Simpson's D 1 , Simpson's dominance D 2 , Simpson's evenness E, and Pielou's evenness J were used to calculate the taxonomic diversity and functional diversity within each community. For diversity between communities, beta-diversity indices such as Jaccard's C J and Sorensen's C S were calculated. For multiple group comparisons (≥ 3 groups), signi cant features were identi ed using one-way ANOVA statistical analyses (p < 0.05) using a posthoc Dunn's signi cance test. Venn diagrams, presented in the supplementary section, were created using the InteractiVenn online web tool (http://www.interactivenn.net/) [48].

Results And Discussion
The sampled estuaries were assessed against the Australian and New Zealand Guidelines for Fresh and Marine Water Quality in sediments [49], using the collected physicochemical data (See supplementary information). Overall, the four estuaries sampled are considered healthy and below the trigger values for concern stated in the guidelines for tropical estuaries in Australia. This supports the most recent aquatic ecosystem report card published by the Queensland Government that stated the region as being "good quality" in terms of water quality, pollution, and introduced ora and fauna species.
It is noted that some water quality and sediment parameters were at, or above, the guideline trigger values, including chlorophyll-a, total nitrogen (TN), nitric oxide (NOx), total phosphorus (TP), and turbidity (Supplementary Tables S1-S2). Furthermore, measured concentrations of iron and aluminium were also high compared to the marine sediment guideline trigger values across the four estuaries (Supplementary  Table S1). However, this was to be expected based on the geological composition of the surrounding soil/substrate and, for the region, is considered naturally occurring. It should be noted that bauxite mining does occur out of the Embley-Hey estuary. Again, the concentrations of iron and aluminum were observed consistently high in all estuaries irrespective of localized mining activities. Sediment particle size analysis was also consistent for the region (Supplementary Figure S2 and Table S3). The distinct wet-dry seasonality of the semi-arid Cape York estuaries has been shown to trap and redistribute sediments over tidal and seasonal cycles, leading to an extreme, but highly variable, turbidity [50,51]. As such, turbidity above 20 NTU for all estuary sites was expected and may account for elevated chlorophyll-a and primary nutrients. The analyzed sediments were free from any pesticide residues screened as part of the Agilent Pesticide PCDL (data not shown).

Community metabolomics analysis
In total, 1681 discovery polar metabolite features were detected, of which 727 metabolite features were identi ed in the sediment samples collected from across four estuaries. Of all the identi ed metabolites, 53 central carbon metabolism metabolites and 674 discovery polar metabolites were identi ed against the METLIN metabolites and Agilent PCDL. Supplementary Figure S3 presents the PCA plots for the metabolomics data and provides a preliminary overview of sample/site clustering. The results indicate no clear discrimination between estuaries. The PCA model was found to be signi cant, with good linearity R 2 X (cum) = 0.639 and predictability Q 2 (cum) = 0.484. Outliers were examined using a distance of observation (DModX) analysis ; no metabolites were considered outliers at the two times DCrit threshold (p = 0.05) (Supplementary Figure S3). To explore potential for metabolic pathway activity levels in each estuarine system, the metabolomic data were subjected to pathway enrichment analysis. Pathway enrichment was used to decipher the metabolic pathways to which the metabolites are associated and resulted in 445 identi ed metabolites being mapped to the KEGG database. Pathway impact measured the relative signi cance of all metabolites of a single pathway for the overall spectrum of metabolites. Supplementary Figure S3 summarises the pathways mapped. We then performed a ChemRICH class annotation of the KEGG identi ed metabolites and found that 70 chemical clusters were represented.
These chemical groups can be associated with genes and phenotypes. The analysis highlighted saturated and unsaturated fatty acids (FA), glycosides, aminoglycosides, deoxyadenosines, glutamates, indole alkaloids, ketones, monoterpenes, pregnanediones, and purine nucleosides as the most abundant chemical clusters (Supplementary Figure S4). Figure 2 illustrates the chemical clusters of signi cance (p < 0.05) contributing to metabolic differences between the estuaries (Supplementary Figure S4).
Metabolites that differentiated individual estuaries were harmine (KEGG ID C06538; an indole alkaloid found in fruit and seeds , which was less abundant in the Skardon estuary), deoxyadenosine (C00559, a deoxyadenosines linked to marine microbial natural products [52], which was elevated in Skardon), deoxyguanosine (C00330; a purine nucleosides found in plants, and elevated in Skardon), undecylenic acid (C13910; an unsaturated FA elevated in Skardon that is a known antifungal drug and eukaryotic metabolite produced in plants), and a glycerol 1-phosphate derivative (C04590; a ketone found in marine sediments [53], and less abundant in Skardon). The Wenlock samples were found to have decreased levels of biphenyl (C06588; a deoxyadenosine found in some plants and animal tissues). Other signi cant key chemicals were fumigaclavine C (C20438; an indole alkaloid derived from marine-based fungus [54]), itaconic acid (C00490; a succinate), and L-serine (C00065; an unsaturated FA), all of which were slightly decreased. Deoxyloganin (C06071; a monoterpene) and oxoglutaric acid (C00026; a glutarate) were lower in the Embley-Hey catchment, likewise 4-hydroxy-L-glutamic acid (C0449; a glutamate) was lower in the Archer-Watson estuary. It should be noted that all the signi cant metabolites here were identi ed based on p-values alone and no metabolites were observed with changes greater than 1.5-fold across all the estuary sites. This is indicative of ecosystems that are considered similar in terms of their metabolic function at the time of sampling. The speci c pathways that are the same across all the estuaries are discussed in more detail below and form the basis for establishing microbial blueprints of the sampled estuaries.

Bacterial community analysis
Bacterial communities (with relative abundance ≥ 1%) from the four estuaries were taxonomically diverse, comprising 15 bacterial phyla. A total of 16,066 phylotypes (97% OTUs) across all four estuaries were obtained. The four estuaries shared 3,582 common phylotypes (Supplementary Figure S5).
Taxonomic pro ling demonstrated that the most abundant phyla (70.5 -71.6% relative abundance) were, Proteobacteria (mostly classes g-proteobacteria and a-proteobacteria; Embley-Hey > Skardon > Wenlock > Archer-Watson), Bacteroidota (mostly class Bacteroidia; Archer-Watson > Wenlock > Skardon > Embley-Hey), Chloro exi (mostly class Anaerolineae; Wenlock > Archer-Watson > Skardon > Embley-Hey) and Desulfobacterota (mostly classes Desulfobulbia and Desulfobacteria; Skardon > Embley-Hey > Archer-Watson > Wenlock) (Supplementary Figure S5). Members of Planctomycetota (4.5 -6.1%) and Acidobacteriota (3.9 -4.5%) were also found to have high relative abundances in all four estuaries. While the four estuaries share a largely similar community composition at the phylum level, a distinctive yellow microbial mat on the intertidal area in the upper and mid regions of the Wenlock and the Archer-Watson estuaries was observed. This is most likely due to an increased relative abundance of cyanobacteria in these estuaries. It is worth noting that elevated turbidity was found in these estuaries (Supplementary  Table S1), which might offer some competitive advantage to benthic cyanobacteria that are exposed to direct sunlight at low tide [55]. The presence of cyanobacteria, albeit in low relative abundances (2.3% in Wenlock and 3.5% in Archer-Watson), could explain the higher chlorophyll-a content in these samples.

A signi cant difference in OTU diversity indices (Shannon's diversity (H'), Simpson's diversity (D 1 ), and
Simpson's dominance (D 2 ); Supplementary Table S4) was detected among the four estuaries (Kruskal-Wallis; P < 0.05) (Supplementary Table S4). A signi cant difference in H', D 1 and D 2 , were observed between Archer-Watson and Skardon estuaries (Dunn's Test, P < 0.05). It should be noted that these estuaries are the most distant geographically and may also be the most different in terms of watershed characteristics and wind/tidal forcing. This was also illustrated by lower beta-diversity indices (C J and C s , Supplementary Table S4).

Overall functional gene diversity and structure of microbial communities
The predicted functional pro ling of the estuarine microbiome datasets showed the presence of 7,303 KEGG IDs in the four estuaries. The most active pathways are indicated in Supplementary Figure S6. We observed a highly similar functional repertoire of the estuarine microbiome present across the four estuaries, despite the differences in their overall taxonomic pro le composition. This was also illustrated by the OPLS-DA statistics of the metagenome dataset (Supplementary Figure S7).
We did however observe a signi cant difference in predicted gene diversity indices (H', D 1 and D 2 ) of microbial communities among the four estuaries (Kruskal-Wallis; P < 0.05) (Supplementary Table S5). We also observed a signi cant difference between Archer-Watson and each of the other three estuaries for Shannon's diversity index (H', Dunn's test, P < 0.05). A low Jaccard's and Sorensen's index also indicated that there was only a small dis-similarity between these estuaries (Supplementary Table S5).

Carbon metabolism and biosynthesis of amino acids in estuarine sediments
As a result of the signi cant similarities observed between these geographically distinct estuaries, the estuaries were further analyzed in terms of characterizing the microbial blueprints important for microbially-mediated processes. As such, all identi ed metabolites were integrated with 317 functional genes predicted from the 16S rRNA gene sequencing data that are involved in carbon, nitrogen, and sulphur cycles, amino acid metabolism and metal homeostasis and resistance. Figure 3 maps all the detected and identi ed metabolites with the predicted functional genes. A high relative abundance of enzymes involved in carbon metabolism was also predicted based on the 16S rRNA gene sequences (Figure 4). Central carbon metabolism includes the Embden-Meyerhof-Parnas (EMP) pathway of glycolysis, the pentose phosphate pathway (PPP), the citric acid cycle (or TCA cycle), six known carbon xation pathways, and some pathways of methane metabolism [56].
High relative abundances of TCA associated enzymes were predicted in the sediments based on the predicted functional gene data. Iron (Fe), present at elevated concentrations across all four estuaries (61.4 -95.2 mg kg-1), is known to modulate the expression of critical enzymes of the TCA cycle including aconitase (ACO), citrate synthase (CS), isocitric dehydrogenase (IDH1), and succinate dehydrogenase (SDHB) [54]. Fe also increases the formation of reducing equivalents such as NADH by the TCA cycle and thus increased mitochondrial O 2 consumption and ATP formation via oxidative phosphorylation; NAD+, ADP, and ATP were all detected in the current study and demonstrate the importance of the TCA cycle here for establishing a microbial blueprint. Several genes such as sdhB and frdB that encode enzyme iron-sulphur subunits with a role in oxidative phosphorylation were predicted from the gene sequence data. Furthermore, increased Fe concentrations can also cause repression of glycolysis [57]. This supports the negligible relative abundance of genes involved in glycolysis being predicted in the sampled sediments.
A very high abundance of the genes acnA (encoding aconitate hydratase), sdhB (encoding succinate dehydrogenase) and frdB (encoding fumarate reductase) indicate enhanced reductive citric acid cycle (also called the Arnon-Buchanan cycle) activity ( Figure 3). The reductive citric acid cycle is found in microaerophiles and anaerobes, such as green sulphur bacteria belonging to the Bacteroidota phylum. In the current study, members of the Bacteroidota, phylum were suggested to contribute towards carbon xation in the metagenome of all estuaries (Supplementary Figure S8).
The reductive acetyl-CoA pathway (also called the Wood-Ljungdahl pathway) is found in strictly anaerobic Proteobacteria and Planctomycetes, some of which are methane-forming. The most important bifunctional enzymes, carbon monoxide dehydrogenase (encoded by cooS) and acetyl-CoA synthase (encoded by cdhE), catalyse CO 2 1 CO and CO 2 to a methyl group conversion, respectively. In the current study, members of Proteobacteria phylum were suggested to contribute towards methane metabolism in the metagenome of all estuaries (Supplementary Figure S9).
Noticeably, high relative levels of genes such as accABCD, which encode various subunits of acetyl-CoA carboxylase were predicted in this study (Figure 3). These results indicate that the predicted metagenome xes CO 2 by the 3-hydroxypropionate bicycle pathway commonly found in prokaryotes. This pathway is found in some green non-sulphur bacteria of the phylum Chloro exi. Taxa-functional analysis of the predicted metagenomes also indicated that members of Chloro exi substantially contributed to this pathway (Supplementary Figure S10).
Pyruvate branches out to form alanine, isoleucine, lysine, leucine, and valine. Isoleucine and leucine were identi ed in the sediments while several genes encoding enzymes catalysing these conversions were predicted in the sediments (Figures 3 and 5). Leucine is also produced from acetyl-CoA. Several amino acids including arginine, proline, glutamine, and glutamate are produced from a-ketoglutarate and were identi ed in the sediments. A very high relative proportion of genes involved in the biosynthesis of various amino acids was predicted ( Figure 5). Members of the phyla Bacteriodota, Chloro exi, and Proteobacteria were identi ed as contributing to the biosynthesis of amino acids in the sediment samples ( Supplementary Figures S8-S10).
Community-wide biosynthetic and degradation potential for each metabolite in each sample was predicted through an integrated 16S rRNA gene sequencing-metabolomics approach using the MIMOSA2 webtool. Metabolites whose variation is consistent with expectations based on well-predicted microbial metabolic potential were identi ed. From these data, a community-wide metabolic model was constructed using MIMOSA2 for each sample and a community metabolic potential (CMP) was calculated, representing the relative capacity of the predicted community enzyme content in that sample to synthesize or degrade each metabolite based on metabolic reference information that links gene predictions to their substrates and products (metabolites) [47]. Twenty-four metabolites with predicted CMPs were identi ed; however, none of these metabolites were found to be signi cant using the multivariate statistical approaches described above. Figure 6 illustrates the top 5 positively correlated CMP metabolites. Higher positive CMP scores indicate metabolites for which this agreement is statistically "well-predicted." Negatively predicted metabolites can be interpreted in several different ways. Possible reasons for a negative correlation between a metabolite's levels and its metabolic potential include incorrectly annotated or missing reactions (i.e., missing metabolites within the identi ed gene-metabolite pathway). However, contributors identi ed for negatively correlated metabolites have the potential to represent true relationships but should be interpreted more cautiously than positively correlated metabolites. The top 5 negatively correlated CMP metabolites are presented in Supplementary Figure S11.

Metal homeostasis and metal resistance in estuarine sediments
Our results indicated very high concentrations of Fe (61.4 -95.2 mg kg -1 ) and Al (45.5 -68.2 mg kg -1 ) in all estuarine sediments. Some metals, such as Cu, Fe, Ni, and Zn, are essential nutrients and play important roles in the regulation of gene expression, and the activity of biomolecules, including enzymes and cofactors for essential biochemical reactions [59]. Other metals, such as Al, Cd, Pb, As and Hg, have no biological role and are considered nonessential [60]. Unsurprisingly, 125 KEGG genes related to metal homeostasis and metal resistance. These results are summarized in Supplementary Figure S12 and indicate the presence of a very high proportion of functional genes related to Fe and Zn metabolism, mostly related to membrane transport functions. The role of Fe in transport systems for Al uptake has been previously reported by Auger, Han [61]. Relatively higher proportions of Fe-related genes may reduce the toxic effects of Al found in the sediments. There were no metabolites detected and identi ed that related to these speci c genes, however, additional metallomics-based approaches could be applied to overcome this analytical limitation (i.e., assessing metal e ux in marine sediment microbiomes) [62].

Sulphur metabolism in estuarine sediments
High relative proportions of functional genes such as dsrAB (encoding dissimilatory sulphite reductase), aprAB (encoding adenylylsulphate reductase) and sat (encoding sulphate adenylyl transferase) were predicted in the metagenomes from all four estuaries (Supplementary Figure S13). SO 4 2à S 2is driven by the oxidation of organic carbon, supplemented by the anaerobic oxidation of methane (CH 4 ) at the subsurface SO 4 2--CH 4 transition [63]. Most of the S 2is ultimately re-oxidized back to SO 4 2-, via diverse sulphur intermediates, by geochemical or microbial reactions [63]. The assimilatory pathway produces reduced sulphur compounds for the biosynthesis of S-containing amino acids such as cysteine.
A high proportion of genes (Supplementary Figure S13) involved in sulphur oxidation such as soxAX (encoding L-cysteine S-thiosulfotransferase), soxB (encoding S-sulfosulfanyl-L-cysteine sulfohydrolase), soxC (encoding sulfane dehydrogenase), and soxYZ (encoding sulphur-oxidizing protein) were predicted in the current study. The large amounts of Fe in the estuarine sediments could increase the sulphur oxidation. Fe acts as an oxidant for S 2in the deeper sediment layers where it partly binds to S 2to form iron sulphide (FeS) and pyrite (FeS 2 ).
The sulphur cycle of marine/estuarine sediments is largely microbial-driven, via dissimilatory sulphate (SO 4 2-) reduction to sulphide (S 2-) [64]. Members of the Proteobacteria (mainly classes Alphaproteobacteria and Gammaproteobacteria) and Desulfobacterota (class Desulfobacteria) phyla contribute the most to the sulphur metabolism (Supplementary Figure S13). Members of Proteobacteria have been previously shown to contribute considerably to Fe cycling [65]. While the metabolomics data did not provide any metabolites that directly feed into these pathways, characterizing the relevant proteins that drive these processes may enable greater insight into these functions and curation of microbial blueprints of activity.

Nitrogen metabolism in estuarine sediments
Several key functional genes (Supplementary Figure S14) related to nitrogen metabolism were predicted in all four estuaries, including genes (in decreasing order of relative proportion) involved in dissimilatory nitrate (NO 3 -) reduction to ammonium (NH 4 + ) (DNRA), denitri cation, assimilatory NO 3 reduction to NH 4 + (ANRA), nitri cation, and nitrogen xation. However, the inorganic metabolomics related to these processes is not measurable by the LC-MS metabolomics-based approaches used herein. Nitrogen cycling through the environment is in uenced by microbially-driven processes [66]. The denitri cation and DNRA processes are mediated by the products of several genes illustrated in Supplementary Figure S14. Predicted functional gene analysis indicated the likely presence of these genes in all four estuaries. Taxafunctional analysis indicated that nitrogen metabolism is mediated by a diverse polyphyletic group of bacteria. The highest contribution to nitrogen cycling genes was attributed to Proteobacteria, Desulfobacterota, and Bacteroidota (Supplementary Figures S8-S9, S13).
Denitri cation is one of the major nitrogen loss pathways that uses multiple electron donors such as hydrogen gas, hydrogen sulphide, or other organic compounds. Microorganisms can grow using DNRA, the key reaction of which is NO 3 à NH 4 + , by coupling it to the oxidation of electron donors, such as organic matter, methane, hydrogen, sulphur compounds, and iron. The environmental importance of DNRA is not well established ; however, DNRA appears to be favoured over denitri cation in marine and lake sediments when there is an excess of electron donors relative to NO 3 - [67]. The results of this study agree with this observation (Supplementary Figure S14). High DNRA in estuarine samples indicates that nitrogen is retained in the system. This is evident from amino acid production discussed earlier.
The competition for nitrogen in estuaries is mostly for the inorganic forms of ions, commonly NH 4 + or NO 3 -. NH 4 + is the reduced form of nitrogen and, therefore, requires less energy to build amino acids. As such, NH 4 + is preferred over NO 3 and makes the competition for NH 4 + erce in estuaries.
The assimilation of NH 4 + in prokaryotes occurs mainly via ammonium transporter proteins. Primary products of assimilated ammonia are glutamine and glutamate, which constitute the central reservoir of nitrogen for many biosynthetic pathways. The rst pathway involves a high-a nity ammonium transporter, encoded by amtB, that moves NH 4 + into the cell, where NH 4 + + glutamate à glutamine by glutamine synthetase [68]. The functional gene amtB was predicted in all the four estuaries indicating an assimilatory process in the studied environments. The other mechanism involves a low-a nity transport system in which NH 3 is passively (diffusion) transported into the cell and NH4+ + α-ketoglutarate à glutamate by glutamate dehydrogenase [68]. Glutamate, α-ketoglutarate, and glutamine were identi ed in the estuarine samples. Glutamate is the principal source of nitrogen for the production of N-amine and is involved in transamination reactions at the core of amino acid metabolism.

Conclusion
The use of omics-based surveillance techniques (particularly metabolomics) for environmental science has grown substantially in the last twenty to thirty years. Indeed, environmental toxicology was one of the major applications of the very earliest metabolomics studies [69]. Today scientists are moving away from individual 'omic approaches towards a more holistic, systems-based approach for environmental assessment.
The major limitations of research in environmental metabolomics, and indeed much of environmental science to date, has been the major focus on already perturbed systems and looking at what is different -not what is the same. Indeed, it is very hard to nd an unperturbed system to develop a baseline against which impacts can be measured. In this study, the approaches normally used to look at changes in systems were used, for the rst time, to establish a microbial blueprint for a "pristine" system during the dry season which was combined with a range of metadata. These data not only inform the story of these estuaries today, but also establishes a baseline against which future monitoring can be assessed. This is especially important for the studied estuaries herein, given the planned development of Northern Australia.
The systems-based approach also allowed us to tease out important relationships, for example, the increase in the 125 KEGG genes related to metal homeostasis and metal resistance observed in the estuaries studied. There were no metabolites detected and identi ed that related to these speci c genes so they may have been missed with a metabolomics-based approach. Conversely only using genomics would have omitted metabolites that were positively correlated with community metabolic potential and function (including 2-oxisocapraote, tryptophan, histidine citrulline and succinic acid).
This study provides an important rst step towards the use of a multi-omics microbial blueprint approach, which could be used to monitor more estuaries and sediment communities around the world. Further expansion to include planktonic and sessile bacterial populations, in addition to capturing seasonal variance, would enable a truly dynamic and integrative microbial blueprint to be established.
Such an estuarine and sediment microbial blueprint would be relevant to ecological investigations measuring the impact of a range of factors, such as climate change, contaminants, disease, food restriction, infection, parasite load, and biogeochemical cycles.

Acknowledgement
The 1290 In nity II Flex pump and 6470 LC-QqQ-MS instruments used in this study were provided by Agilent Technologies to CSIRO. All images and gures were created or modi ed using BioRender.com using data outputs from Shiny Burrito and Shiny MIMOSA2.