Physicochemical Drivers of Microbial Community Structure in Sediments of Lake Hazen, Nunavut, Canada

The Arctic is undergoing rapid environmental change, potentially affecting the physicochemical constraints of microbial communities that play a large role in both carbon and nutrient cycling in lacustrine environments. However, the microbial communities in such Arctic environments have seldom been studied, and the drivers of their composition are poorly characterized. To address these gaps, we surveyed the biologically active surface sediments in Lake Hazen, the largest lake by volume north of the Arctic Circle, and a small lake and shoreline pond in its watershed. High-throughput amplicon sequencing of the 16S rRNA gene uncovered a community dominated by Proteobacteria, Bacteroidetes, and Chloroflexi, similar to those found in other cold and oligotrophic lake sediments. We also show that the microbial community structure in this Arctic polar desert is shaped by pH and redox gradients. This study lays the groundwork for predicting how sediment microbial communities in the Arctic could respond as climate change proceeds to alter their physicochemical constraints.


INTRODUCTION
While human-induced climate change is a global reality, its effects are amplified in the Arctic, severely impacting freshwater ecosystems there. Indeed, increases in air temperature and precipitation lead to enhanced glacial melt and runoff (Bliss et al., 2014), permafrost thaw (Mueller et al., 2009), and a reduction in ice-cover duration (Vincent and Laybourn-Parry, 2008). In response to these changes, High Arctic lakes can undergo shifts in their temperature, light and nutrient availability, pH, and salinity (Mueller et al., 2009;Lehnherr et al., 2018). Changes in these abiotic factors can be expected to influence the structure of microbial communities which, in turn, can then affect their physicochemical environment, for example through nitrogen fixation, organic carbon mineralization, or sulfate reduction. However, the microbial communities inhabiting polar lake sediments are still poorly characterized, and what drives community composition is relatively unknown. Although over the past few years several studies taking place in the polar regions have used next-generation sequencing to characterize microbial communities (Stoeva et al., 2014;Emerson et al., 2015;Hauptmann et al., 2016;Schütte et al., 2016;Wang et al., 2016;Mohit et al., 2017;Thaler et al., 2017), data on sediment microbial communities in these environments is still sparse. The available data are also biased toward small lakes and thaw ponds, thus underrepresenting large arctic lakes.
To predict how environmental changes might impact future freshwater quality and productivity in the Arctic, we first need to understand the structure of the microbial communities that are mediating the biogeochemical cycles in these environments. This is usually achieved by PCR amplicon sequencing of the 16S rRNA gene, which is commonly used as a phylogenetic marker gene for bacteria and archaea. To move beyond the structural description of a microbial community, we need to understand (i) how the environment is shaping a community, and (ii) how a community, in turn, shapes its environment. Metrics describing microbial community structure can be correlated with physicochemical variables using multivariate methods, such as Non-metric MultiDimensional Scaling (NMDS), (un-)constrained correspondence analysis, or cluster analysis ( Buttigieg and Ramette, 2014). However, most of these approaches remain descriptive, and assume that the relationships between community composition and abiotic factors are linear. To address these limitations, machine learning methods have been used, for instance to predict disease progression from human gut microbiomes (Pasolli et al., 2016), to determine the factors affecting microbial diversity in soil (Ge et al., 2008), or to show that pH controls microbial diversity in acid mine drainage (Kuang et al., 2013). More recently, Beall et al. (2016) identified Operational Taxonomic Units (OTUs) with different abundances between high and low ice conditions in lakes, while Sun et al. (2017) predicted that very low levels of antimony [Sb(V)] and arsenic [As(V)] increase microbial diversity in soils. However, such machine learning approaches have not yet been used to characterize the drivers of microbial diversity in Arctic freshwater environments. Without a full metagenomics or metatranscriptomics dataset, it is difficult to properly describe a functional link between community structure and function. When such data are unavailable, studies have suggested that amplicon-based sequencing data be used to make limited functional predictions of environmental microbial communities (Louca et al., 2016b). This type of functional prediction relies on the presence of taxa known to participate in well characterized biological processes or functions (e.g., oxygenic photoautotrophy, sulfate reduction, and methanogenesis; Langille et al., 2013;Aßhauer et al., 2015;Louca et al., 2016b) but has yet to be applied to undersampled and / or extreme environments such as high arctic lake sediments.
Here, we characterized over a period of two years the microbial community structure in sediments collected from freshwater systems in the Lake Hazen watershed, located in Quttinirpaaq National Park on northern Ellesmere Island, in Nunavut, Canada (82 • N, 71 • W; Figure S1). Bacterial and archaeal 16S rRNA gene amplicon sequencing from environmental DNA samples allowed us to characterize the microbial communities across space and time. Taking advantage of recent developments in machine learning, we determined the physicochemical drivers of the community structures, and use functional mapping of the community structure (Louca et al., 2016b) to make predictions about the sediment microbial communities.

Collection of Sediment Cores and Associated Chemistry
The Lake Hazen watershed is a polar oasis with temperatures higher than usually found at similar latitudes (Keatley et al., 2007) due to the influence of the Grant Land mountains in the northwest. Sediment cores were collected from three water bodies within the watershed: Lake Hazen itself, Pond1, and Skeleton Lake ( Figure S1). Lake Hazen (74 km long, up to 12 km wide, area 54,200 ha, max. depth of 267 m; Figure S2a) is the world's largest lake by volume north of the Arctic Circle. It is primarily fed by runoff from the outlet glaciers of the Grant Land Ice Cap and drained by the Ruggles River to the northeastern coast of Ellesmere Island. Lake Hazen has a relatively stable year-round water temperature of ∼3 • C (Reist et al., 1995), is fully ice covered in the winter ( Latifovic and Pouliot, 2007), and is ultra-oligotrophic (Keatley et al., 2007). Lake Hazen is monomictic, mixing fully in the summer partially influenced by turbidity currents originating from the glacial inflows (Lehnherr et al., 2018). A slight reverse temperature stratification (i.e., lower temperatures right below the ice) develops during the winter. The surface sediments of Lake Hazen are soft silts, with a total organic carbon content between 3.1 and 8.3%. The bathymetry and geochemistry of Lake Hazen have been thoroughly characterized in Köck et al. (2012). While large lakes like Lake Hazen are rare in the Arctic, small lakes and shallow ponds are a characteristic feature of the Arctic landscape. Skeleton Lake (1.9 ha, max. depth 4.7 m; Figure S2b) is fed by permafrost thaw waters, and subsequently drains through two ponds, a wetland, and a small creek before flowing into Lake Hazen (Emmerton et al., 2016). Pond1 (0.1-0.7 ha, max. depth 0.5-1.3 m; Figures S2c,d) is located along the northwestern shore of Lake Hazen. In high glacial runoff years, Pond1 may become hydrologically connected to Lake Hazen as water levels rise (Emmerton et al., 2016; Figure S2d). The organic carbon content of the sediments in Pond1 ranges from 7.0 to 10.4% and in Skeleton Lake from 13.0 to 35.1%. Skeleton Lake and Pond1 are fairly productive in the summer with photosynthesis by macrophytes, mosses, and algal mats that cover the sediments ( Figure S2c), despite their low chlorophyll a concentration (Keatley et al., 2007;Lehnherr et al., 2012). Some of the productivity in Skeleton Lake and Pond1 might also be driven by carbon and nutrients originating from fecal matter of birds as both sites are important nesting habitats. In the summer, their water temperature can rise to 19 • C, but in the winter, ice cover reaches to the bottom in Pond1 and shallower (<2 m) parts of Skeleton Lake. The water columns of both Skeleton Lake and Pond1 are depleted of O 2 during the winter because of heterotrophic activity.  (Figures S1,S3c,d). In spring, all sites were covered with just less than 2 m of snow-covered ice; in summer, samples were collected during open water (ice-free) conditions. At each site, three intact replicate cores were collected for DNA extraction, and determination of physicochemical profiles and of porewater chemistry. All sediments were collected either with an UWITEC (Mondsee, Austria) gravity corer (deep sites), or manually (shallow sites in Pond1 and Skeleton Lake) into 86 mm inner diameter polyvinyl chloride core tubes. Due to logistical constraints, only a single core was available for DNA extraction from each time and site. Cores for DNA extraction were sectioned in 0.25 cm (spring 2014) or 0.50 cm (summer 2015) intervals immediately after sampling, preserved in Invitrogen TM RNAlater TM (Thermo Fisher Scientific Inc., Waltham, MA, USA), and stored at −18 • C before DNA extraction. Contamination of samples was minimized by cleaning the sectioning equipment between each section and wearing non-powdered latex gloves during sample handling. In spring 2015, whole cores were frozen directly after sampling at −18 • C, transported back to the University of Ottawa, and sectioned at 1 cm intervals while frozen. Surfaces of the sections in contact with the nonsterile sectioning equipment were scraped clean with bleachsterilized tools in a laminar flow hood (HEPA 100) before subsampling from the middle of the sections. Redox potential, pH, [H 2 S], and dissolved [O 2 ] profiles were measured at 100 µm intervals in the field within an hour of collection, using Unisense (Aarhus, Denmark) microsensors connected to the Unisense Field Multimeter (Tables S1, S2). Redox  ] were also measured in sediment porewaters by ion chromatography (Table S2). Cores used for analyses of porewater chemistry were sectioned at 1 cm intervals into 50 ml falcon tubes in the field, followed by flushing of any headspace with UHP N 2 before capping. Tubes were then centrifuged at 4,000 rpm, after which the supernatant was filtered through 0.45 µm cellulose nitrate filters into 15 ml tubes, which were then frozen until analysis at the Elemental Analysis and Stable Isotope Ratio Mass Spectrometry Laboratory (Department of Renewable Resources, University of Alberta). Concentrations for H 2 S were set to 0 where it was not detected with the microsensors. For the three lowest horizons in the Skeleton Lake 2015 core, [H 2 S] was input as the value measured at the deepest sediment depth before the microsensor broke (169.8 mgL −1 ), as a conservative estimate since its oxidation in the completely anaerobic sediments was likely minimal. When several measurements were made over the sectioning depth used for DNA extraction, concentration readings were averaged. Hereafter, "sediment depth" refers to the lower sediment depth of each sample, measured down from the sediment-water interface. Principal Components Analysis (PCA) was employed to visualize physicochemical differences and relatedness between the different coring sites. For this, the autoplot function from the R package ggfortify 0.4.1 (Horikoshi and Tang, 2017) was used.

Sequencing and Data Preprocessing
Upon returning to the University of Ottawa, samples for DNA extraction were homogenized, divided into duplicate 250 mg (WW) subsamples, and washed with a buffer (10 mM EDTA, 50 mM Tris-HCl, 50 mM Na 2 HPO 4 ·7H 2 O at pH 8.0) to remove PCR inhibitors (Zhou et al., 1996;Poulain et al., 2015). DNA was extracted from the duplicate subsamples with PowerSoil R DNA Isolation Kit (MO BIO Laboratories Inc, Carlsbad, CA, USA), and then the duplicate extracts were combined. The 16S rRNA gene fragment was amplified with universal primers in the spring 2014/2015 samples, and primer sets specific to either Bacteria or Archaea in the summer 2015 samples (for details, see SI text). The extraction kit elution buffer was used as a negative control in screening experiments. Sequencing was completed with Illumina MiSeq using paired-end 300 bp reads at Molecular Research LP (Shallowater, TX, USA; for details, see SI text). Sequencing of a single sample per sediment depth per core was deemed sufficient, since no pairwise comparisons of individual samples were conducted in the data analysis. All handling of the samples was conducted in a laminar flow hood (HEPA 100) stainless steel sterile cabinet that was treated with UVC radiation and bleach before each use.

Data Analyses
The number of sequences was tracked throughout each step of the pipeline for quality control (Table S3). The taxonomy of OTUs with >99% sequence identity to the SILVA 128 database was refined to the closest matching entry to facilitate functional mapping. OTUs with ambiguous, mitochondrial, or plastid assignments were removed with phyloseq 1.20.0 ( McMurdie and Holmes, 2013). Negative controls were not sequenced for this study and, as such, we were not able to directly remove possible contamination brought by the DNA extraction kit. Although studies with low microbial biomass (e.g., blood, lungs, dry surfaces) are expected to be more sensitive to contaminants (Salter et al., 2014;Glassing et al., 2016), we tested the impact of possible contaminants by identifying and removing putative contaminating genera from our samples (see SI text; Figure S4). We compared the unmodified data to analyses where we removed 100% of known contaminants from MOBIO PowerSoil DNA extraction kits (Glassing et al., 2016), the kit used in our study. The result of our comparative analyses showed (i) no changes in alpha diversity analyses, (ii) few changes in the clustering analyses ( Figure S5) and (iii) no changes in the ordination analyses ( Figure S6), leaving our conclusions unaffected in all cases. Note that e.g., 5 of the 10 most abundant contaminants (Veillonella, Methylobacterium, Prevotella, Tumebacillus, and Oxalobacter) were not found in our samples. In addition to known contaminants from the MOBIO kit used here and described in the main text, we also tested for known contaminants from four additional DNA extraction kits (Salter et al., 2014) that were not used in our study (see SI text; Figures S4-S6). However, as the putative contaminant genera could plausibly be part of the sediment community and the identity of true contaminants are not known, they were not removed from the data. To visually estimate the sequencing depth in our samples, rarefaction curves were constructed from non-normalized data with singleton OTUs included ( Figure S7). To assess the functional potential of the communities based only on 16S rRNA gene amplicon data, the normalized and curated OTU abundances were mapped to phylogenetically conserved functional groups in a customized database using FAPROTAX 1.0 (see SI text; Figures S8, S9; Louca et al., 2016b). Briefly, the predictions made by FAPROTAX are based on references from the literature, and work by mapping OTUs (at any given taxonomic level) to functional groups. The associations are based solely on cultured strains, so that an association between a taxonomic level and a functional group is only made if all representatives at that taxonomic level display the particular function. The total DNA extracted from sediments does not solely represent the metabolically active part of the community, as DNA from both dormant and dead organisms is usually co-extracted (Klein, 2007;Carini et al., 2017;Lennon et al., in review). Thus, without transcriptomic or proteomic data our functional predictions should be considered hypothetical.
To assess the biological significance of phylogenetic characterizations, samples were analyzed based on two levels of diversity: within and among samples. First, we investigated trends in alpha-diversity (within-sample diversity). Because the contribution of individual taxa to ecosystem processes is likely dependent on their abundance (Nemergut et al., 2013), we chose Simpson's dominance (Morris et al., 2014) as the metric for alpha-diversity. Simpson's dominance is robust to both spurious OTUs and variations in sampling depth between sequencing runs ( Pinto and Raskin, 2012). The sequencing depth in our samples (Good's coverage: 76.0-97.2%; Figure S10) suffices to accurately estimate alpha-diversity (Lundin et al., 2012). To enable comparisons of alpha-diversity and sequencing coverage to other studies, we also calculated Chao1 and Shannon indices, and Good's coverage ( Figure S10). All this was done based on ten randomized rarefactions of the raw OTU counts with the R package phyloseq 1.20.0 ( McMurdie and Holmes, 2013). The relationships between alpha-diversity and its predictors (sample categories or physicochemical variables) were determined with random forests (Breiman, 2001; Liaw and Wiener, 2002). The forests were grown to 5,000 trees, using the R package ranger 0.7.0 ( Wright and Ziegler, 2015). Selection of the most important predictors was based on the Gini index by adding predictors one by one in order of decreasing importance (Menze et al., 2009). The best and most parsimonious model was then selected by minimizing the Model Standard Prediction Error (MSPE) for regression random forests (in the case of continuous predictors), or by maximizing Cohen's Kappa for classification random forests (in the case of categorical predictors). The relationships between the most important predictors and Simpson's dominance were estimated with partial dependence plots of the best models with the R package edarf 1.1.1 ( Jones and Linder, 2016), which display how model prediction changes as a function of each predictor, while other predictors are fixed to their average value. Thus, each variable's effect on the model prediction is considered independently, and each predictor's relative effect size can be estimated from the variability displayed by the model prediction.
Second, in terms of beta-diversity (between-samples diversity), phylogenetic distances between pairs of samples were calculated with a Double Principle Coordinate Analysis (DPCoA; Pavoine et al., 2004), using OTU abundances and patristic distances estimated from the maximum likelihood tree. The phylogenetic data were limited to OTUs with >0.01% overall abundance, because of the quadratic increase in runtime per added OTU in DPCoA (Fukuyama et al., 2012). The Bray-Curtis distances between samples were calculated from group abundances in the functional predictions. A Mantel test was used to test for differences between sample physicochemistry, and either their phylogenetic or functionally predicted group distances. Phylogenetic data and functional predictions from spring 2014/2015 were then clustered using the t-distributed stochastic neighbor embedding (tSNE) algorithm (van der Maaten and Hinton, 2008), implemented in the R package rtsne 0.13 ( Krijthe and van der Maaten, 2017), with "perplexity" set to 5. Clusters were identified with the HDBSCAN algorithm (Campello et al., 2013), in the package dbscan 1.1.1 (Hahsler et al., 2017), with "minPts" set to 3. Regression and classification random forests were used together with partial dependence plots to identify the most important physicochemical and categorical variables for the clustering patterns, as described above. Two different subsets of the phylogenetic data were analyzed: the most abundant OTUs (>0.01% abundance) and the dataset limited to OTUs that matched at least one group in the FAPROTAX database. Summer 2015 data were not included in the tSNE analyses because only a single core per lake was available.
The correlations between categorical and continuous variables to beta-diversity were assessed by unconstrained correspondence analysis with "envfit" from the R package vegan 2.4.3 (Oksanen et al., 2016). The variables were fit on NMDS ordinated distance matrices (described above) for both phylogenetic data and functional predictions, and the statistical significance was assessed with 10,000 permutations. For the continuous physicochemical variables, non-linear relationships were analyzed with "ordisurf, " which is based on surface fits, contra vector fits in envfit (Figure 6, Figures S12-S14). All P-values were Bonferroni-corrected per data set. Random forests (described above) were further used to corroborate these analyses. In the phylogenetic data, OTU abundances were grouped at phylum, class, and order levels for the random forest models, which were all screened for the best and most parsimonious model. Lower (i.e., more exclusive) taxonomic levels were disregarded to increase the ecological meaningfulness of the results (Xu et al., 2014). Partial dependence plots were again generated with the R package edarf to examine the relationships between the most important OTUs and their functionally predicted group abundances to each sample category, spatial, and environmental variable. All these data analyses were done with R 3.4.0 (R Core Team, 2017); the corresponding scripts can be accessed through GitHub (https://github.com/Begia/Hazen16S), the sequencing data can be retrieved from the NCBI Sequence Read Archive (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA430127), and the geochemical data from the National Centers for Environmental Information online repository (http://accession. nodc.noaa.gov/0171496).

Sediment Microbial Communities Are Similar to Other Arctic Lakes
Microbial community structure of Lake Hazen and Skeleton Lake sediments in spring 2014/2015 exhibited similarities to other lake sediments in polar (Tang et al., 2013;Wang et al., 2016;Mohit et al., 2017) and high-altitude regions (Zhang et al., 2014b) that have comparable ranges of temperature, nutrient and light availability. The most abundant bacterial phyla at our sampling sites were Proteobacteria, Bacteroidetes, Chloroflexi, Actinobacteria, Acidobacteria, Planctomycetes, and Verrucomicrobia (see SI text; Figure 1). The dominant archaeal phylum at all sites was Woesearchaeota, similarly to water columns of oligotrophic high-altitude lakes (Ortiz- Alvarez and Casamayor, 2016). Sediments in Skeleton Lake had higher abundances of Chloroflexi, Actinobacteria, Cyanobacteria, and archaeal phyla, while Acidobacteria were more abundant in Lake Hazen sediments. Differences between the lakes might be driven by better light availability at the sediment-water interface, and higher production of sulfide in Skeleton Lake sediments compared to Lake Hazen. Indeed, all coring sites from Lake Hazen had overlying water columns of more than 40 m, measurable dissolved [O 2 ] in the top 1 cm (John's Island samples had >4.7 mgL −1 O 2 down to 5 cm), and low [H 2 S] (<1.2 µM). Furthermore, although toxic, low levels of H 2 S can enhance cyanobacterial photosynthesis when light intensity is low (Klatt et al., 2015). Hence, Cyanobacteria in Skeleton Lake might be able to photosynthesize below the ice cover in the spring.
In sediments sampled in summer 2015, Chloroflexi was the most abundant bacterial phylum in both Skeleton Lake and Pond1 (Figure 2). Their high abundance has been previously observed in hypersaline methane-rich springs in the High Arctic (Lamarche-Gagnon et al., 2015). In the current study, the salinity was low ([Cl − ] < 4.4 mgL −1 ), but Skeleton Lake, and the ponds bordering Lake Hazen are methanogenic (Emmerton et al., 2016). The archaeal communities in Skeleton Lake sediments were dissimilar to those in Pond1. Woesearchaeota were more common in Skeleton Lake, and Euryarchaeota in Pond1 sediments.
Mercury methylation had previously been quantified in both Skeleton Lake and Pond1 (Lehnherr et al., 2012), but its microbial actors were unknown. Fourteen OTUs in our data mapped to mercury methylation in our custom functional mapping database, and their 16S sequences, all matched closely to Methanosphaerula palustris (Cadillo-Quiroz et al., 2009). The genome of the type strain of this species has been shown to possess the hgcAB genes that strongly predict mercury methylation capability (Gilmour et al., 2013). Other taxa most likely also take part in mercury methylation in these sediments. However, our amplicon-based study might have missed their presence because of primer bias and low 16S database coverage of organisms in these environments.
Intra-lake/pond compositional variability could also be high. For instance, the communities at the two sites sampled in Skeleton Lake in spring (Figure 1) and summer 2015 (Figure 2), were strikingly different. Sediments from the deeper site ( Figure S3b), sampled in spring 2015 under ice cover, had a mostly heterotrophic community dominated by Proteobacteria. Meanwhile, sediments from the shallower site ( Figure S3d), sampled in summer 2015, were dominated by phototrophs such as Chloroflexi and Cyanobacteria, and anaerobic fermenters such as Bacteroidetes and Gracilibacteria (Thomas et al., 2011;Wrighton et al., 2012). This indicates high spatial heterogeneity of sediment communities in Skeleton Lake sediments. Primer bias and seasonality of the microbial communities in Skeleton Lake might play a part in this, but the question would require further study. Our qualitative observations of the microbial communities are consistent with (i) measurements of high CH 4 emissions from ponds bordering Lake Hazen (Emmerton et al., 2016), (ii) increased [MeHg] in Skeleton Lake (unpublished data), (iii) high autochthonous carbon, and (iv) nitrogen limitation at the sites (St. Louis, unpublished data).

Physicochemical Data Partially Explains Phylogenetic Variability
The phylogenetic variability may be driven by the unique physicochemical properties of each site, which can vary substantially both in time and in space. A PCA of the physical and geochemical variables in spring 2014/2015 shows that samples group by individual core (Figure 3A). More specifically, the PCA   (Figure 3A). Samples with measurable [H 2 S] had lower redox potential and were from shallower sites (mostly from Skeleton Lake; Table S1). Samples closer to the sediment/water interface had higher pH and [O 2 ]. Our sampling likely captured some of the most FIGURE 2 | Geochemical variability, and microbial community composition of the summer 2015 samples using archaeal and bacterial primers. Abundances of taxa have been merged at the phylum level (Proteobacteria at class level). Phyla with less than 1% overall abundance in each data set are merged. relevant physicochemical variables constraining the microbial community structures. Indeed, the Euclidean distances between the samples calculated from physicochemical variables were correlated with their phylogenetic DPCoA distances (Mantel-R 2 = 0.57, Bonferroni-corrected P < 0.01). The physicochemical variability also correlated with the functionally predicted group abundances (Bray-Curtis distances; Mantel-R 2 = 0.40, Bonferroni-corrected P < 0.01). However, only 25% of the OTUs could be mapped to any function and were thus covered by this analysis.
The PCA on the summer 2015 physicochemical data revealed that these sediments also clustered separately, with Pond1 on one side of PC2 and Skeleton Lake on the other side ( Figure 3B). Most of the differences between these two sites were driven by higher  in sediments from Pond1 and higher [SO 2− 4 ] in sediments from Skeleton Lake, while pH, [O 2 ], and [Cl − ] covaried. However, the top 1 cm surface sediments from Pond1 were highly influential in the PCA because of their higher pH and [O 2 ] than in other samples ( Figure 3B, Table S2). This higher pH and [O 2 ] could reflect the influence of the incoming Lake Hazen waters into Pond1, which tend to be higher in pH and O 2 , especially in the summer under the direct influence of the glacial inflows. It is possible that this difference in the scaling of the sites along the PCA reflects different water sources between Pond1 and Skeleton Lake.
Unlike with the spring 2014/2015 data, the summer 2015 data showed no significant correlations between the physicochemical distances of the samples, and either phylogenetic data or functional predictions (bacterial and archaeal; all Bonferronicorrected P > 0.05). This indicates that the measured physicochemical variability does not explain differences in community structures among samples. Unknown variables, such as redox potential, might be influencing the community assembly at these sites. Furthermore, the two sites in this data set have similar physicochemistry throughout each sediment profile, which probably reduces discriminatory power for this analysis.

Higher Redox Potential and Lower Sulfide Concentration Drive Alpha-Diversity
To identify the drivers of alpha-diversity at Lake Hazen, we fitted random forest models to our data (Touw et al., 2013). Based on Simpson's dominance, diversity in the spring 2014/2015 sediment samples was best predicted by a model including all physicochemical variables (in order of importance: [H 2 S], overlying water depth, redox potential, site, lake, pH, sediment depth, [O 2 ], and year; pseudo-R 2 = 0.72). These results are consistent with our expectations, as H 2 S can be highly toxic to microbial communities (Hoppe et al., 1990;Brouwer and Murphy, 1995). Water depth was the second most important variable explaining alpha-diversity ( Figure 4A). The shallow Lake Hazen sediments at Snowgoose Bay had the highest diversity ( Figure 4B, Figure S1; Table S1), which might be driven by high heterogeneity of the sediments, including steep [H 2 S] and [O 2 ] gradients. The Snowgoose Bay site is also under the direct influence of two glacial river outlets, which might contribute to the heterogeneity through increased delivery of nutrients and inorganic matter. Our observations are consistent with previous findings of the positive relationship between sediment heterogeneity and alpha-diversity ( Lozupone and Knight, 2007). Redox potential, the third most important continuous variable, also had positive relationship with predicted diversity. The effect was similar in magnitude to [H 2 S] and was expected since sulfate reducers are active at low redox potentials. This is consistent with previous studies showing that microbial communities can react quickly to changes in redox potential, changing from aerobic chemoheterotrophy to anaerobic respiration and fermentation (DeAngelis et al., 2010). Finally, we identified pH as a driver of alpha-diversity: since non-extremophilic bacteria need to maintain an optimal intracellular pH of around 7.5 (Booth, 1985), the subsistence of a more diverse community at this pH might be facilitated.
Again, the summer 2015 data set differed from spring 2014/2015 data set, as the best model only included sediment depth and pH as the most important variables for both archaeal (pseudo-R 2 = 0.53) and bacterial data (pseudo-R 2 = 0.32). However, direct comparisons between the spring 2014/2015 data and summer 2015 data sets are difficult, since different sites were sampled, and different geochemical variables were measured. Archaeal and bacterial alpha-diversity in the summer 2015 data set were highest in the deepest sediments, with a discrete increase at the sediment surface (Figures 4C,D). The increase in diversity might be caused by higher diversity of organisms with obligate aerobic (at the surface sediments) or anaerobic metabolisms (at deeper sediments). Unfortunately, no reliable data could be obtained for [H 2 S] or redox potential in these samples because of broken microsensors. Here, pH also seemed to be a factor explaining diversity both for archaea and bacteria, but diversity predictions might be driven only by a few outliers at the extremes of sediment depth.

Communities Cluster Phylogenetically by pH and Display Similar Functional Predictions
To independently support these predictions based on random forests, we performed a tSNE cluster analysis on the spring 2014/2015 samples. These samples clustered mostly by individual sediment core for both full phylogenetic data ( Figure 5A) and for data including only the 25% of OTUs that could be functionally predicted ( Figure 5B). The tSNE analysis of functionally predicted data identified only two clusters, one per lake (Figure 5C). This shows that phylogenetically distinct sediment communities in Lake Hazen have similar functional predictions. Furthermore, Lake Hazen sediments clustered invariably separate from Skeleton Lake sediments in all of these analyses. Random forest classification of the clustering patterns identified pH as the most important predictor for clustering both in the full phylogenetic data set (one predictor; OOB Error = 0%; Figure S11a), and for the functionally mapped OTUs (seven predictors; OOB Error = 12%; Figure S11b). In the full phylogenetic data set, the sediment communities also appear to be more similar to each other over ranges of pH ( Figure 4A, lower panel). [H 2 S] was the most important predictor to explain differences between the clusters in the functionally predicted data (one predictor; OOB Error = 0%; Figure S11c). However, because we sampled only a single site in Skeleton Lake in spring 2015, it remains uncertain if [H 2 S] is the only factor affecting the observed difference in the functionally predicted groups in the sediments of the two lakes. Furthermore, heterogeneity of the communities within Skeleton Lake itself could not be addressed with the clustering analysis due to only a single core being analyzed. Regardless, all the phylogenetically distinct communities in Lake Hazen sediments clustered together after functional prediction. Our results suggest that pH strongly affects phylogenetic community composition in our samples. Indeed, pH has previously been shown to be a major determinant of community composition in similar lake sediments (e.g., Xiong et al., 2012). Sediment microbial communities might be altered in the future because climate change related effects can increase pH in arctic lakes (Kokelj et al., 2005;Mesquita et al., 2010). We observed that the microbial communities in Lake Hazen sediments cored at different sites at different times are phylogenetically distinct from each other and Skeleton Lake sediments. All the samples from Lake Hazen displayed similar functional predictions, while remaining distinct from Skeleton Lake samples. Decoupling between phylogeny and function of microbial communities has previously been observed, e.g., in the global ocean microbial communities (Louca et al., 2016b), and plant-associated environments (Louca et al., 2016a). However, our results rely solely on the analysis of 16S rRNA genes, and therefore lack direct evidence about the actual microbial functioning and activity in the lake sediments. Critical insights could be gained here by employing metagenomics, metatranscriptomics, and (ideally) metaproteomics (Louca et al., 2016a).

Beta-Diversity Is Also Driven by Redox Chemistry and pH
For the spring 2014/2015 bacterial communities, the centroids of the clusters found by NMDS ordinations for both lakes and individual sites were different from each other (Bonferronicorrected P < 0.01), but no year effect could be found (Bonferroni-corrected P > 0.05; Figure 6A). As both spring 2014 and spring 2015 samples were also sequenced using the same primer set, we analyzed samples from different years together. The communities in Skeleton Lake sediments were phylogenetically distinct from Lake Hazen sediments, and the communities in individual Lake Hazen cores were also phylogenetically dissimilar to each other. While these patterns are consistent with the tSNE analysis, they are not as clear because NMDS preserves pairwise distances instead of emphasizing them (like tSNE). [H 2 S], redox potential, and water depth correlated linearly with phylogenetic distances of the communities (Bonferroni-corrected P < 0.05; Figure 6A). Sediment depth was not linearly correlated with the phylogenetic distances, but the communities at the sediment surface might be more similar to each other than communities deeper in the sediment. This can be observed in the grouping of surface samples together in the middle of the ordination ( Figure 6A). The deepest sediments at John's Island appeared quite unique, which might be due to the presence of O 2 all the way down to 5 cm below sediment surface, whereas O 2 is not found at any other sites below 1 cm (Figure 1, Table S1).
Archaeal communities in sediments from Pond1 and Skeleton Lake (summer 2015) also differed from each other phylogenetically (Bonferroni-corrected P < 0.01; Figure 6B). However,  was the only physicochemical variable linearly correlated with phylogenetic differences of the communities in the samples (Bonferroni-corrected P < 0.05). Similar to archaeal communities, bacterial communities in sediments from Pond1 and Skeleton Lake (summer 2015) were phylogenetically significantly different from each other (Bonferroni-corrected P < 0.01; Figure 6C). , pH, sediment depth and [Cl − ] correlated linearly with phylogenetic differences between the samples (Bonferroni-corrected P < 0.05). The communities in surface sediments of both Pond1 and Skeleton Lake seemed most dissimilar to the other samples from the same core.
Altogether, beta-diversity seems to be affected mostly by [H 2 S], redox potential and pH. These are the variables that have surfaced in either the ordination or tSNE analysis for both spring 2014/2015 and summer 2015 data sets. In addition, we also observed trends with water depth in spring 2014/2015 data and  in the summer 2015 data set. The effects of [H 2 S] and redox potential are probably linked to toxicity of H 2 S and different availability of electron acceptors in the changing redox potential, which together alter the community composition. Water depth in the spring 2014/2015 data set can be seen as a proxy for several factors influencing community structure; the depth of the overlying water column influences both light availability and sediment dynamics, such as differences in sedimentation rate and nutrient inputs, resuspension, and sediment focusing. However, the trends in summer 2015 data with  are questionable, as (i) [NO − 3 ] covaries with sulfate and sediment depth (deeper sediment horizons have lower nitrate and higher sulfate; Figure 3B; Table S2), and (ii)   in Pond1 is much higher than in Skeleton Lake (Table S2).

Taxonomic Group Abundances Vary Along Physicochemical Gradients
We conducted random forest analyses to discover relationships between physicochemical gradients and abundances of taxonomic groups, and our functional predictions (see SI text; Tables 1, 2; Figures S15-S37). We found an association between increasing levels of [H 2 S] and (i) decreasing abundances of aerobic taxa and functionally predicted aerobic groups (putative aerobic ammonia oxidizers, aerobic chemoheterotrophs, aerobic nitrite oxidizers, and predatory/exoparasitic microbes), and (ii) increasing abundances of functionally predicted sulfate  respirers, methanogens, and cyanobacteria (cyanobacteria are all photosynthetic and thus mapped to a single group; Figure S15). Skeleton Lake sediments had much higher [H 2 S] than Lake Hazen sediments, but the community differences linked to [H 2 S] are not completely explained by differences between the lakes ( Figure S34). [H 2 S] seems to affect both phylogenetic and functionally predicted community composition, and climate change has previously been thought to result in increased accumulation of sulfur in high arctic lake sediments (Drevnick et al., 2010). Chemical weathering of sulfate containing minerals (e.g., gypsum-CaSO 4 ) following glacial melt and/or permafrost thaw could also increase delivery of SO 4 to waterbodies in the Lake Hazen watershed. Enhanced rates of sulfur cycling in sediments might change the community structure, which might affect other geochemical cycles mediated by the sediment communities. Taxonomic groups that increased in abundance with increasing redox potential were aerobic chemoheterotrophs, such as Acidobacteria (Ward et al., 2009), and obligate aerobic methylotrophic Betaproteobacteria (Chistoserdova and Lidstrom, 2013; Figure S18a). In addition, the functionally predicted group of methanol oxidizers increased in abundance with increasing redox, which suggests that these organisms are aerobic (Jenkins et al., 1987). However, putative sulfur reducers also showed a positive relationship with redox, which was a surprising result. Most of the taxa mapped with FAPROTAX to this functional group belong to the uncultured genus Desulfurellaceae H16, which has been previously detected in anaerobic bioreactors (Wei et al., 2017). Bacteria from the family Desulfurellaceae are typically strict anaerobic sulfur-reducers (Greene, 2014;Florentino et al., 2017), but here seem to be abundant at sites with high redox potential (>400 mV) and in the presence of oxygen (>4 mgL-1). To the best of our knowledge, this has not been observed in previous studies.
In the current study, we identified pH as an important driver of the sediment microbial community structure and diversity, similarly to previous studies (see SI text;Xiong et al., 2012). Random forest analysis showed that the relationships of taxonomic groups to variation in pH were mostly supportive of previous observations in lake sediments (see SI text; Figures S16, S27a; Xiong et al., 2012). We also detected an increased abundance of Cyanobacteria at higher pH ( Figures  S16, S27), which is in accordance to the generation of alkaline conditions via autotrophic pathways. Similar relationships between pH and Cyanobacteria in the High Arctic have been previously observed in lake microbial mats (Lionard et al., 2012). We also observed a higher abundance of functionally predicted sulfate respirers and methanogens at lower pH. This is in accordance with lower pH optimums of these processes (Ferry, 1993;Hao et al., 1996), than the average pH of 7-8 in our samples.
Finally, results from the random forest analysis showed that abundances of predicted fermenters and intracellular parasites (most of these are known as Amoebae-Resistant Microbes; Greub and Raoult, 2004) increase with water depth ( Figure S20b). The OTUs identified in our analysis included representatives of, e.g., phylum Chlamydiae (Lory, 2014), and orders Legionnellales (Garrity et al., 2015) and Rickettsiales (Renvoisé et al., 2011). The presence of obligate intracellular parasites indicates a higher abundance of grazing protists, and in the case of Rickettsiales, of arthropods (Renvoisé et al., 2011) at the deeper sites. These organisms might together with fermenting microbes contribute to increased cycling of organic matter and transfer of energy to higher trophic levels (Lei et al., 2014). The increased abundance of microbes involved in organic matter cycling suggests increased delivery of material to the deep basin (i.e., sediment focusing) in Lake Hazen. Furthermore, the longer duration of ice-free periods (Latifovic and Pouliot, 2007;Surdu et al., 2016) and increased runoff (Bliss et al., 2014) seem to have already increased the sediment, carbon, and nutrient inputs to Lake Hazen (Lehnherr et al., 2018).

CONCLUSIONS
Despite extreme conditions in the High Arctic, our results show that lake sediments from this area harbor highly diverse microbial communities that vary both in time and space, but that are mainly shaped by redox and pH. Although the microbial communities in cores sampled at the three sites in Lake Hazen were phylogenetically distinct, they were functionally predicted to exhibit similarities. However, such functional predictions need now to be validated with metagenomics or metatranscriptomics studies, especially when performed on undersampled and extreme environments such as Lake Hazen.
The way such extreme environments will behave in the context of climate change is unclear. On the one hand, the predicted functional similarity of the communities in the backdrop of spatiotemporal microbial heterogeneity could be interpreted as a sign of resilience. However, as rising temperatures have both direct and indirect influences on redox chemistry and pH, the main drivers of microbial communities identified herein, it is very plausible that the current community structure could be disrupted under the climate regime predicted for the Arctic. Future work on Arctic lake sediments should focus on elucidating the functioning of the communities, and long-term studies performed throughout the seasonal regime shifts. As these seasonal shifts drive the redox chemistry, light and nutrient availability in the lakes, they might also affect the structure of microbial communities within.

DATA AVAILABILITY
All analysis scripts generated for this study can be found in the GitHub repository (https://github.com/Begia/Hazen16S), sequencing data is deposited in the NCBI Sequence Read Archive (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA430127), and the geochemical data is deposited in the the NOAA National Centers for Environmental Information database (http:// accession.nodc.noaa.gov/0171496).

AUTHOR CONTRIBUTIONS
MR, AP and VS designed the experiments. MR, KS and VS performed the sampling. KS and VS produced the physicochemical data. MR extracted DNA, analyzed the data and prepared the manuscript, with extensive contributions from all coauthors. SA-B and AP supervised the project.

FUNDING
This work was funded by Natural Resources Canada/Polar