Upstream land use with microbial downstream consequences: Iron and humic substances link to Legionella spp.

Intensified land use can disturb water quality, potentially increasing the abundance of bacterial pathogens and threatening public access to clean water. This threat involves both direct contamination with faecal bacteria and indirect factors, such as disturbed water chemistry and microbiota, which can lead to contamination. While direct contamination has been well described, the impact of indirect factors is less explored, despite the potential for severe downstream consequences for water supply. To assess direct and indirect downstream effects of buildings, farms, pastures and fields on potential water sources, we studied five Swedish lakes and their inflows. We analysed a total of 160 samples along a gradient of anthropogenic activity spanning four time points, including faecal and water-quality indicators. Through species distribution modelling, Random Forest and network analysis using 16S rRNA amplicon sequencing data, our findings highlight that land use indirectly impacts lakes via inflows. Land use impacted approximately one third of inflow microbiota taxa, in turn impacting ~20-50 % of lake taxa. Indirect effects via inflows were also suggested by causal links between e.g. water colour and lake bacterial taxa, where water colour influenced the abundance of several freshwater bacteria, such as Polynucleobacter and Limnohabitans. However, it was not possible to identify direct effects on the lakes based on analysis of physiochemical or microbial parameters. To avoid potential downstream consequences for water supply, it is thus important to consider possible indirect effects from upstream land use and inflows, even when no direct effects can be observed on lakes. Legionella (a genus containing bacterial pathogens) illustrated potential consequences, since the genus was particularly abundant in inflows and increased in the presence of pastures, fields, and farms.
The approach presented here could be used to assess the suitability of lakes as alternative raw water sources or help to mitigate contaminations in important water catchments. Continued broad investigations of stressors on the microbial network can identify indirect effects, avoid enrichment of pathogens, and help secure water accessibility.


Introduction
Land use and climate change can induce imbalances in the microbial community that threaten reliable access to clean water, since many bacterial pathogens are opportunistic and naturally occurring. The availability of clean water is crucial for all societies and intrinsically linked to human health. Access to clean water has its own sustainable development goal (UN General Assembly, 2015) and is a central global societal risk factor (World Economic Forum, 2023). Population growth and climate change are pressuring ecosystems that have good water quality (Vörösmarty et al., 2000). Stressors from land use (e.g., farms, fields, buildings, and pasture) and climate change (e.g., humification caused by increased precipitation) can lead to increasing nutrient loads, especially in boreal lake systems used as water sources for many communities in temperate regions. Importantly, bacteria not only cause problems but are also essential to functioning ecosystems, where a balanced network of microbes provides ecosystem services such as degradation of organic matter. Stressors can lead to ecosystem imbalances and consequences for water quality due to changes in the microbial community. Humans and animals can then be exposed to the microbes when the water is used recreationally or after treatment for drinking water. Water quality reduction is often associated with increases in turbidity, colour, or faecal bacteria. In Sweden, risks to reliable clean water access are mainly associated with disease caused by faecal bacteria like Campylobacter, pathogenic E. coli and Shigella (Public Health Agency of Sweden, 2015), but also include environmental pathogens that can cause disease such as diarrhoea (e.g. Vibrio) or pulmonary infection when inhaled (e.g. several species of Legionella) (World Health Organization, 2011).

Abbreviations: ASV, amplicon sequence variant; CODMN, chemical oxygen demand; DAG, directed acyclic graphs; GLMM, generalized linear mixed models; RF, random forest.

Waterborne pathogens can be part of the natural microbial community and are often environmental opportunists (Anttila et al., 2015). Legionella is one such opportunistic bacterium, with a natural life cycle in water and soil as part of biofilms and as an intracellular parasite of protozoans, sporadically causing problems in water supplies (Hamilton et al., 2018; Leoni et al., 2018; Holsinger et al., 2022). Legionella is a public-health-relevant genus, with over 20 of 71 known species being potential human pathogens, and cases of disease have been coupled to local watershed hydrology (Ng et al., 2008). Although Legionella abundance has been shown to increase near agricultural sites and to vary depending on land use type in Canada (Peabody et al., 2017), there is still a knowledge gap regarding what selects for these species in environmental water (Eriksson et al., 2022) and how they colonize human-made water systems.
Studies of threats to water quality often focus on separate factors, such as the composition of microorganisms in raw water (Comté et al. 2017; Brindefalk et al. 2022), riverine inflows (Wu et al. 2022), land use effects (Schelker et al. 2016), and nutrient regimes (Soranno et al. 2015; Yang and Kim, 2019). For example, Brindefalk et al. (2022) investigated the water quality of six important raw water sources in Sweden across a gradient of land-use factors, such as the proportion of farmland and urban area in the respective water catchments, and found associations of bacterial metacommunities with water quality indicators and degree of urbanisation in the catchment area. Soranno et al. (2015) investigated the effects of the spatial extent used for measuring land use, hydrologic connectivity, and regional differences on the amount of nutrients in 346 lakes in the USA, where the strongest association was found to be due to lake hydrologic connectivity. However, studying these factors in isolation hinders identification of the complex systems governing water quality, since these parameters may display effects only when analysed jointly, not individually. This includes factors influencing the lake inflows as well as factors influencing the lakes themselves. Stressors on the microbial network are fundamental, so a broader approach could help in securing water accessibility (Fig. 1a).
In this study of five Swedish lakes, together with their inflows and surrounding land use, we addressed these questions (Fig. 1b-c): I) Does land use influence lake inflows? II) Do upstream inflows influence the lakes? III) Does the disturbance due to land use/anthropogenic action have a measurable effect on the lakes? IV) Does land use affect the occurrence of the genus Legionella (exemplifying how upstream land use could affect downstream water quality)?

Parameters of study waters
Five lakes (Supplementary Fig. 1, coded as By, Ho, Ok, Sk, and Sv) with potential for surface raw water use were sampled in southern Sweden, in an area with mixed, but generally low, levels of human disturbance, totalling 160 samples. The samples were taken in October 2018, November 2018, May 2019, and August 2019, to capture variability across the seasons of the year. On each sampling occasion, water samples were collected from inflows to the lake (referred to as "inflow" samples) and from shore lake water and a mid-lake sampling point (both referred to as "lake" samples). Inflows relevant to sampling based on land use were identified using available maps (e.g., Water Information System Sweden [WISS]) and aerial imagery, selecting streams assumed to contribute to lake contamination, i.e., passing through or near agricultural land or buildings. To identify anthropogenic influences, all lake inflows were screened for the proximity of farms, fields, buildings, and pasture (Table 1, visualized for one lake in Fig. 1b).
In addition to the condensed materials and methods below, in-depth details concerning indicator analysis, sample preparation methods for DNA extraction, sequencing, qPCR, and the GLMM of bacterial ASVs and Legionella-specific ASVs can be found in the Supplementary Material (Supplementary Fig. 1-13).

Sample collection
Lake samples were collected from the shore by lowering the sample bottles under the water surface and by boat at depths of either 5 or 10 m using a Ruttner water sampler (Hydro-X 1.7 L; Swedaq, Höör, Sweden), after which water was distributed into sampling bottles. Samples from the inflows to the lakes were collected from land, downstream from potential contamination points near the lakeside, by lowering the sample bottles under the water surface (Fig. 1a, Supplementary Fig. 1). Disposable gloves were used during all sampling to avoid sample contamination.

Chemical analysis
We investigated a wide range of water property indicators covering effects of disturbed inflowing water with respect to land-use effects. Chemical analysis of the collected samples was performed by Eurofins. All samples, except those collected at the potential point for withdrawal of raw water in November 2018 and August 2019, were analysed using the PSL2V package for drinking water chemical analysis (supplied by Eurofins Environment Testing Sweden AB). Samples collected at the potential point for withdrawal of raw water in November 2018 and August 2019 were subjected to an extended analysis using the PSL3Z package that, in addition to physiochemical parameters and elements, also includes pesticides.

DNA sequencing and classification, molecular analyses
For the full set of samples, Illumina sequencing was performed on the V3-V4 16S rRNA gene region using primers Bact_341F and Bact_785R. For a subset of samples, the full-length 16S rRNA gene was sequenced using Oxford Nanopore technology with primers Bact_27F and Bact_1492R. The resulting sequences were subjected to standard quality control and trimming and classified against the SILVA NR99 v. 138 dataset for V3-V4 sequences and a database based on GTDB Release 08-RS214 for the full-length sequences.
For phylogenetic placement of sequences assigned to the genus Legionella, a reference dataset consisting of 17 Aquicella sequences (as outgroup) and 393 Legionella sequences was assembled and phylogenetic relationships were inferred using a maximum likelihood approach (IQTREE v. 2.0.7, Nguyen et al., 2015). For the detection of L. pneumophila by qPCR, the specific mip gene target was used (Nazarian et al., 2008).

Fig. 1. Land use can affect the microbial community and water chemistry. Also, the microbial community and water chemistry can affect each other. This study therefore focuses on a holistic approach examining the potential effects of upstream land use on water quality. Panel b: Sampling design and land use categories for one of the five lakes. For each of the four sampling occasions, stream inflows (numbered 1-8) and lake water including shore water (both referred to as "Lake") were sampled. Panel c: Hypotheses and analyses for the study of the effect of upstream land use on the water quality. I: Land use does have an influence on the water properties; II: The upstream inflows do have an effect on the overall water properties and the composition of the microbiota in the lakes; III: The degree of disturbance of water quality due to land use/anthropogenic action does affect the inflows and has a measurable effect on the lakes; IV: Land use can affect the occurrence of genus Legionella, a potentially pathogenic organism that could cause problems in raw water, demonstrating how upstream land use could impact downstream water quality.

Generalized linear mixed effect modelling
For questions I (land-use effects on lake inflows), III (disturbance from land use on the lakes), and IV (land-use effects on the occurrence of the genus Legionella), the following generalized linear mixed models (GLMMs) were used to estimate land-use effects on the water quality of the inflows and lakes:

$$g\left(\mathrm{E}[m_{ij}]\right) = \alpha_{0i} + \beta_{1ik} + \beta_{2il} + \beta_{3jn} + \beta_{4jno} + \beta_{5jp} + \beta_{6jq} + \beta_{7jr} + \beta_{8js} + \beta_{9jt}$$

where $g()$ is the log link function defining the mean of the linear function of predictors, $m_{ij}$ is the abundance of the $j$th ASV in sample $i$ ($i = 1, \ldots, n_{tot}$), with $n_{tot}$ being the total number of samples included ($n_{tot} = 115$ for inflow analysis, and $n_{tot} = 63$ for physiochemical analysis), $\alpha_{0i}$ is the effect of the total sample sequence abundance of sample $i$, $\beta_{1ik}$ is the effect of time point $k$ when collecting sample $i$ on ASV $j$ ($k = 1, \ldots, 4$), $\beta_{2il}$ is the effect of sampling year $l$ on ASV $j$ ($l = 0, 1$, corresponding to 2018 or 2019), $\beta_{3jn}$ is the effect of the lake system (level 3 in the model hierarchy) for sample $i$ on ASV $j$ ($n = 1, \ldots, 5$), $\beta_{4jno}$ is the effect of sampling point $o$ in lake system $n$ (level 2 in the model hierarchy) for sample $i$ on ASV $j$ ($o = 1, \ldots, m$, with $m$ differing depending on the lake), $\beta_{5jp}$ is the effect of pasture near the sampling location $i$ on the abundance of ASV $j$ ($p = 0, 1$), $\beta_{6jq}$ is the effect of stream size ($q = 1, 2,$ or $3$, corresponding to small, medium, or large), $\beta_{7jr}$ is the effect of farm presence on ASV $j$ ($r = 0, 1$), $\beta_{8js}$ is the effect of building presence on ASV $j$ ($s = 0, 1$), and $\beta_{9jt}$ is the effect of agricultural field presence in the surroundings on ASV $j$ ($t = 0, 1$).
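As a minimal sketch of how a log link turns additive effects into a multiplicative change in expected ASV abundance, the Python snippet below evaluates a linear predictor of the kind described above. The function name, the offset, and all effect values are hypothetical illustrations, not estimates from the study, and the snippet does not reproduce the RStan model used by the authors.

```python
import math


def expected_abundance(offset, effects):
    """Expected ASV count under a log link: E[m] = exp(offset + sum of effects)."""
    eta = offset + sum(effects.values())  # linear predictor on the log scale
    return math.exp(eta)


# Hypothetical effect values for one ASV in one sample (illustration only).
effects = {
    "time_point": 0.10,
    "year": -0.05,
    "lake_system": 0.30,
    "sampling_point": 0.00,
    "pasture": 0.00,      # pasture absent
    "stream_size": 0.20,
    "farm": 0.00,         # farm absent
    "building": 0.00,     # building absent
    "field": 0.00,        # field absent
}
baseline = expected_abundance(offset=2.0, effects=effects)

# Switching one binary land-use predictor on multiplies the expectation by exp(beta).
with_field = expected_abundance(offset=2.0, effects={**effects, "field": 0.69})
fold_change = with_field / baseline
```

Under these assumed values, a field effect of 0.69 on the log scale corresponds to roughly a doubling of the expected abundance, which is how positive land-use effects in a log-link model can be read.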
The regression models were implemented and analysed using the RStan library (Carpenter et al., 2017) within the R statistical software (R Core Team, 2021). Eight parallel chains were used to collect posterior draws, with a total of 2000 iterations per chain and the first 1000 iterations removed (as burn-in), providing 8000 iterations in total. Results were visualized using the R package ggplot2 (Wickham, 2016). R code and data for reproducing the results are available at www.github.com/jonhar97/LandUseEffects.
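The sampling scheme and the way posterior draws are typically summarized can be illustrated with a short Python sketch. It checks the number of retained draws and computes an equal-tailed 95 % interval from a set of draws; the draws are synthetic, the helper names are hypothetical, and the simple index-based quantile is an assumption rather than the summary method used in the study.

```python
import random


def retained_draws(chains, iterations, burn_in):
    """Number of posterior draws kept after discarding burn-in from every chain."""
    return chains * (iterations - burn_in)


def credible_interval(draws, lower=0.025, upper=0.975):
    """Equal-tailed interval using simple index-based quantiles of the sorted draws."""
    ordered = sorted(draws)
    n = len(ordered)
    return ordered[int(lower * (n - 1))], ordered[int(upper * (n - 1))]


def excludes_zero(interval):
    """True when the whole interval lies on one side of zero."""
    low, high = interval
    return low > 0 or high < 0


n_draws = retained_draws(chains=8, iterations=2000, burn_in=1000)

# Synthetic posterior draws for one hypothetical land-use effect.
rng = random.Random(1)
draws = [rng.gauss(1.5, 0.4) for _ in range(n_draws)]
interval = credible_interval(draws)
is_credible_effect = excludes_zero(interval)
```

An effect whose interval excludes zero would be flagged as an association under this kind of rule, while an interval such as (-0.2, 0.5) would not.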

Random forest: source tracking
To find the degree of inflow source contribution to the lake samples (i.e., question II), the random forest (RF) classification method implemented in the randomForest R package was used (Liaw and Wiener, 2002). The source environments (i.e., training data) were defined as: inflows (n = 115 and 39 samples for ASV and physiochemical signatures, respectively) and lakes (n = 32 and 24, respectively). The bacterial signatures consisted of the abundance data for the same top 100 ASVs as in the GLMM modelling. The 21 variables forming the physiochemical signature were first log transformed, apart from chloride, sodium, and potassium, which already displayed good distributional properties. Both sets of signatures were then scaled to zero mean and unit variance. RF models were first trained on the training data via a grid search of parameters to minimize the out-of-bag prediction error: ntree (i.e., number of trees in the forest) = 250, 500, 1000, and 1500; mtry (i.e., number of candidate variables drawn at each split) = 2, 3, 4, and 5; and minimum node size (i.e., controlling the depth of trees) = 1, 2, 10, and 20. After model optimization, the source label was removed for one lake sample at a time (i.e., leave-one-out cross-validation) and the mixing proportions of the source environment were predicted for all lake samples.
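The preprocessing and the size of the hyperparameter grid described above can be sketched in a few lines of Python. This is only an illustration using the standard library: the iron values are made up, the function names are hypothetical, and the forest fitting itself (done by the authors with the randomForest R package) is not shown.

```python
import itertools
import math
import statistics


def standardize(values):
    """Scale values to zero mean and unit (sample) variance."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def preprocess(values, log_transform=True):
    """Optionally log transform, then standardize, one physiochemical variable."""
    if log_transform:
        values = [math.log(v) for v in values]
    return standardize(values)


# Hypothetical iron concentrations (arbitrary units): log transformed, then scaled.
iron = [0.12, 0.40, 0.95, 2.10, 0.33]
iron_scaled = preprocess(iron)

# Every combination of the three tuning parameters listed in the text.
grid = list(itertools.product(
    [250, 500, 1000, 1500],  # ntree
    [2, 3, 4, 5],            # mtry
    [1, 2, 10, 20],          # minimum node size
))
```

With four values per parameter, the grid contains 64 candidate settings, each of which would be scored by its out-of-bag error.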

Directed acyclic graphs (DAG)
To investigate if land use affects the lake water quality (i.e. question III), directed acyclic graphs (DAGs) were used to predict causal links between bacteria amplicon sequence variants (ASVs) affected by land-use factors and water property variables, to capture possible indirect effects. Dynamic Bayesian Networks were used to model dependencies between land use-associated bacterial taxa and physiochemical variables. The inferred topology (i.e. relationships between nodes: here, ASVs and physiochemical variables) can be interpreted as chain effects of many dependent variables which might not be detected if analyzed separately. To infer causal relationships between nodes and not confound the results with design effects, we first corrected the bacteria abundance and physiochemical variables for design factors (i.e., removing effects of lake system, sampling year, and season), by using the GLMM outlined above. In this way, we could improve on earlier applications of DAG to microbial data.

Table 1
Sampling points included in the statistical analyses, using representatives from each lake and chemical data (Supplementary Fig. 4) for the four sampling occasions. These points also represent mixed levels of land use (Fields, Buildings, Farms and Pastures), including "undisturbed" inflows (all 0).

First, the data were normalized such that the counts of all samples were equal in the data (to compensate for differences in the total number of reads between samples). The network was created by collapsing the V3-V4 16S rRNA ASV sequence data to the highest taxonomic resolution possible (genus). Selected environmental variables were included to infer dependencies (edges) between bacteria and environmental factors (nodes), and thus potential drivers of community structure. To avoid including highly correlated variables as predictors in the model, a preselection step was performed based on the PCA score plot of the environmental variables (i.e., only one of a group of tightly clustered variables was selected). Then, all features (i.e., bacteria and environmental variables) were normalized such that the mean value was centred at zero and the variation was transformed to one to enable comparison across datasets.

Fig. 2. Effects of land-use on bacteria. Panel a: significant GLMM predictions of the estimated effects of land-use predictors on ASV abundance. Effects whose estimated credible intervals did not include zero were defined as significant. The bars highlight 2.5 % and 97.5 % of the posterior density, while the posterior mean is indicated with a dot. Colours highlight taxonomic assignment at the genus level. Panel b: effect of land use on physiochemical variables in the inflows for 39 samples for which extended chemical data were available. The bars show 5 % and 95 % of the posterior density, while the posterior mean is shown with a dot.

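The read-depth normalization step can be sketched as follows in Python. The text does not state which procedure was used to equalize counts, so proportional scaling to the smallest library size is an assumption here (rarefaction would be another option), and the sample names and counts are hypothetical.

```python
def scale_to_common_depth(counts, depth):
    """Rescale one sample's counts proportionally so that they sum to `depth`."""
    total = sum(counts)
    return [c * depth / total for c in counts]


# Hypothetical genus-level counts for two samples with different read totals.
samples = {
    "inflow_1": [120, 30, 50],
    "lake_1": [600, 300, 100],
}

# Use the smallest library size as the common depth.
depth = min(sum(counts) for counts in samples.values())
normalized = {
    name: scale_to_common_depth(counts, depth)
    for name, counts in samples.items()
}
```

After this step every sample has the same total, so differences between samples reflect composition rather than sequencing effort.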
The network was inferred using BANJO version 2.2.0 (http://www.cs.duke.edu/~amink/software/banjo/, Smith et al., 2002) using the following analysis conditions: discretization = 2 intervals; max parents = 3; min lag = 0; max lag = 0; and the Simulated Annealing search algorithm. These parameter choices were based on preliminary runs minimizing the negative log-likelihood score (i.e., resulting in best goodness of fit to the data). For other settings, default parameters were used.

The bacterial composition of lake inflows was influenced by land use
The bacterial composition of the lake inflows was influenced by the presence of buildings and agricultural fields, where many common ASVs (of the top 100) increased in relative abundance due to anthropogenic action (Fig. 2a). To identify anthropogenic influences on bacterial communities and physiochemical signatures, all lake inflows were screened for anthropogenic factors (by presence/absence) near the sampling point: farms (small scale crop production and animal husbandry), fields (under active cultivation), buildings (single-family homes and farm outbuildings), and pasture (horses, goats, cows) (Fig. 1b, Table 1). To infer the associations between ASVs and anthropogenic factors, a GLMM was fitted to the abundance data of the 115 inflow samples. Of the top 100 most abundant ASVs, 27 were associated with one or more of the land-use categories (Fig. 2a, Table 1), with buildings (n_taxa = 9) and agricultural fields (n_taxa = 17) having the highest number of associations, while farms (n_taxa = 3) and pasture (n_taxa = 0) were less influential. For these 27 ASVs, most associations (20/29) were positive, indicating higher frequency (in relative composition) under the influence of anthropogenic factors compared to undisturbed environments. The most common bacterial order among the associated set of ASVs was Burkholderiales, represented by the genera Limnohabitans, Variovorax, Undibacterium, Polynucleobacter, Rhodoferax, and Herminiimonas (Supplementary Table 2). In addition, we investigated the impact of land use on bacterial alpha diversity: the presence of land use factors resulted in a predicted increase in the species diversity in most cases (Supplementary Fig. 3). The model predicts that farm presence would increase the Fisher alpha diversity by approximately 35 % in the inflows, all else being equal. Both bacterial composition and diversity thus displayed similar patterns, with the presence of farms, fields, and buildings producing a positive response. None of the classic faecal indicators (coliforms, Escherichia coli, enterococci and Clostridium perfringens) were positively influenced by land use factors.

The variable and diverse bacteria in the inflows influenced the composition in the lakes
To examine how much influence the bacterial composition from the inflow had on the lake water composition, we first performed alpha- and beta-diversity analyses for the taxa variation. The inflows displayed higher diversity than did the lakes (Figs. 3b, 3d). The variability in the inflows was higher for both the bacterial composition and most physiochemical signatures, such as iron, colour value, and turbidity, throughout the study period. Furthermore, the alpha diversity (Supplementary Fig. 5) was higher in the inflows than in the lakes (Welch Two Sample t-test, t = 4.61, p-value = 1.76e-05). Most ASVs found in the lakes were also found in the inflows, while the inflows had a large observed proportion of unique ASVs (Supplementary Fig. 6). The lakes had higher relative abundances of Actinobacteria and Planctomycetota (Figs. 2a and 2c), while the inflows had higher relative abundances of Proteobacteria and Bacteroidota. The inflow communities contained a higher number of taxa, which were more variable (PERMANOVA; f = 8.84, p-value = 0.001, PERMDISP; f = 60.2, p-value = 1.42e-12) and diverse than in the lake communities.

Fig. 4. Predicted source environment for physiochemistry and bacteria. Random forest (RF) analysis of the predicted source of each feature (physiochemical signatures and V3-V4 16S rRNA ASVs). Panel a: inferred source proportion of the ASV profile data for each lake. Panel b: inferred source proportion of the physiochemical profile data for each lake. Panel c: variable importance score (i.e. mean decrease in accuracy if the variable is left out) for each of the top 15 bacterial ASVs highlighted at the genus level for discriminating the inflow and lake sources. Panel d: variable importance score for discriminating lake and inflow sources using the top 15 physiochemical variables.

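For readers unfamiliar with the Welch two-sample t-test used above to compare alpha diversity between inflows and lakes, the Python sketch below computes the test statistic and the Welch-Satterthwaite degrees of freedom. The diversity values are hypothetical and do not reproduce the reported t = 4.61; the p-value computation is omitted because it needs the t distribution, which is not in the standard library.

```python
import math
import statistics


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error, group a
    vb = statistics.variance(b) / len(b)  # squared standard error, group b
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical alpha-diversity values (illustration only).
inflow_alpha = [52.0, 61.0, 48.0, 70.0, 66.0, 58.0]
lake_alpha = [41.0, 44.0, 39.0, 46.0]
t_stat, dof = welch_t(inflow_alpha, lake_alpha)
```

Unlike the pooled-variance t-test, this form does not assume equal variances in the two groups, which suits the comparison here because the inflows were more variable than the lakes.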
To better understand how much the inflow influenced the lake water with respect to the bacterial and physiochemical compositions, we used the random forest (RF) classification method to model whether the bacteria (115 samples) and water property (39 samples) signatures in the lakes originated from the inflows. In general, most of the data were assigned to the lake environment (Fig. 4a-b), with some variation: the mean proportion of inflow in the lakes was 0.26 (0.17) for ASVs and 0.23 (0.17) for the physiochemistry, with standard deviations within parentheses. In some cases, lake samples consisted of more than 50 % inflow signature: one sample each for ASVs in lakes Ho and Sv, and one sample each for physiochemistry in By and Ho. Some bacterial taxa and physiochemical variables were especially influential on the water composition of the lakes (Fig. 4c-d). For example, the relative abundance of ASV31 from the hgcI clade was almost four times higher in lake communities (mean frequency: µ_Lake,asv31 = 0.031 (0.023), µ_Inflow,asv31 = 0.008 (0.066)). One ASV from the hgcI clade was preferentially found in the inflow environment, as was the CL500-29 marine clade. In the model, pH and alkalinity had the highest discriminating power (Fig. 4d). Both pH and alkalinity levels were on average higher in the lakes than in the inflows: µ_Lake,pH = 7.08 (0.13), µ_Inflow,pH = 6.35 (0.78), and µ_Lake,Alk = 8.33 (2.25), µ_Inflow,Alk = 7.53 (11.00). To summarize, some ASVs and physiochemical variables that were associated with the inflows considerably influenced the water properties in the lakes.

Downstream indirect influence of upstream characteristics on composition of microbiota
To investigate the land-use effects in the watersheds, we fitted a GLMM to the lake bacterial communities (n = 35). For the analysis, the top 100 ASVs were selected according to their overall frequency in the lake assemblages. No significant effects (the predicted distributions overlap zero) on the lake ASVs were obtained for the proportion of fields or forest in the watershed, the lake waterline length, or the water body volume (a proxy for the sensitivity to perturbations) (Supplementary Table 5, Supplementary Fig. 7).
For a holistic understanding of possible indirect downstream effects, DAG analysis was used to predict causal links between land use-affected ASVs and water property variables. Since land use can affect the microbial community and water chemistry, and the microbial community and water chemistry can affect each other, indirect effects of land use could still be possible (Fig. 1a), even though significant effects were not identified by the GLMM on the physiochemical data in inflow samples. The land use-associated bacterial taxa (Fig. 2a) and physiochemical variables (Supplementary Fig. 4, 9) were merged to construct a DAG network (Fig. 5), resulting in physiochemical variables situated at the top of the network, linked to each other. Of these, aluminium, colour value, and turbidity were central in the topology: the upper position in the network topology suggested that these variables influenced the ASVs below. Aluminium and colour value had several links with negative influence scores, while the overall network contained many positive links (Fig. 5, Supplementary Table 4). A limited number of ASVs occupied central parts of the network, mainly Proteobacteria comprising a large number of gram-negative bacterial genera. Of these, Polynucleobacter, Herminiimonas, Rhodoferax, Variovorax, and Limnohabitans were central in the network. In the phylum Bacteroidota, Sediminibacterium had several nodes, while most Actinobacteria, such as Planktoluna and the hgcI clade, had fewer nodes and were found further down the topology. To summarize, the DAG analysis provides an understanding of the effects of various factors on the microbiota. Although a single factor may not capture the complete effects of land use disturbance on water quality, indirect effects of e.g. changes in colour value may be revealed when whole systems are analysed: owing to the complex interaction of variables, such changes can cause ripple effects down the network.

Occurrence of Legionella
Of the bacterial genera identified as transmitting waterborne disease on the World Health Organization (2011) list, only the genus Legionella displayed high presence and diversity in the studied lake systems (584 ASVs: Fig. 6, 7, Supplementary Fig. 10), enabling a focused case study of this genus, which has possible public health implications. Legionella had more ASVs than any of the 10 most abundant genera in the dataset (Supplementary Fig. 11), e.g. compared to 36 ASVs of Mycobacterium. The alpha diversity and relative abundance of Legionella were significantly higher in the watershed inflows (n = 115) than in the lakes (n = 35) (Wilcoxon rank sum test W = 1440, p-value = 0.0355) (Fig. 6d). To better understand the ecology of this diverse genus, we performed a taxonomic assessment and quantification as well as a deeper analysis to find links between the occurrence of Legionella and land use and/or environmental factors.
A general qPCR for Legionella spp. was positive in all four of the tested samples, selected to represent a mix of low and high abundance of the taxa. The highest concentration was found in the Ho inflow with ~4668 bacteria/L (Fig. 6a). ASVs classified as L. pneumophila were detected with full-length 16S analysis (Figs. 6b-c, Supplementary Table 3). However, results for the presence of the mip gene in the four samples were negative (data not shown), suggesting that the detected species does not belong to the virulent L. pneumophila taxon.
Three Legionella ASVs were present in more than eight samples in total and were included in GLMM analyses of anthropogenic factors (in inflow samples, n = 115) and in samples with available physiochemical data (n = 63), resulting in a strong inferred association with the presence of farms, pasture, and fields (Supplementary Fig. 12).In addition, the ASVs were positively associated with colour value (β ASV5659 = 2.90, 95 % CI [0.39, 5.89]) and aluminium content in the samples (β ASV4077 = 6.85, [1.68, 13.63]).

Prediction of Legionella in response to changed water chemistry
In future scenarios of climate change and anthropogenic activities, it is expected that human activities and their impacts on water quality will increase. Therefore, we performed simulations using the inferred model above: the presence of three Legionella ASVs was simulated for a 50 % increase in colour values and aluminium levels, and also for the levels as measured (i.e. the baseline). The average detection frequency of ASV4077 more than doubled across samples with a 50 % increase in aluminium, at 0.193 (0.248) compared to the baseline at 0.092 (0.199) (Supplementary Fig. 13). A 50 % increase in colour value resulted in an average detection frequency of ASV5659 of 0.150 (0.193) compared to the baseline of 0.094 (0.158). The last ASV, ASV2280, increased in detection frequency as well, but less markedly: 0.093 (0.122) and 0.080 (0.111) for a 50 % increase in aluminium and colour value, respectively, compared to the baseline of 0.066 (0.101). It is thus likely that variants of Legionella will increase in presence with intensified land use.
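The relative increases implied by the detection frequencies reported above can be checked with a few lines of Python. The frequencies are taken from the text; the dictionary layout and variable names are only illustrative.

```python
# Mean detection frequencies as measured (baseline), from the text.
baseline = {"ASV4077": 0.092, "ASV5659": 0.094, "ASV2280": 0.066}

# Mean detection frequencies under a simulated 50 % increase, from the text.
scenario = {
    ("ASV4077", "aluminium"): 0.193,
    ("ASV5659", "colour"): 0.150,
    ("ASV2280", "aluminium"): 0.093,
    ("ASV2280", "colour"): 0.080,
}

# Fold change of each scenario relative to the baseline of the same ASV.
fold = {key: freq / baseline[key[0]] for key, freq in scenario.items()}

for (asv, variable), ratio in fold.items():
    print(f"{asv}, +50 % {variable}: {ratio:.2f}x baseline")
```

The ratios are about 2.1 for ASV4077, 1.6 for ASV5659, and 1.4 and 1.2 for ASV2280, consistent with the description of ASV4077 more than doubling and ASV2280 responding less strongly.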

Discussion
Upstream land use can have downstream consequences for the water chemistry and microbial community in inflows and lakes. Traditional approaches often compartmentalise factors influencing water quality, such as microorganisms, inflows, land use effects, and water chemistry, hindering the identification of intricate causal links. To address this limitation, our study comprehensively investigated five lakes, their inflows, and surrounding land use, emphasising that: I) land use influenced both bacteria and chemistry of lake inflows; II) upstream inflow impacted lake compositions; III) the lake microbiome can be indirectly influenced by land use via e.g. changes in colour value; and IV) upstream land use can have downstream consequences relevant for water supply, illustrated by Legionella (a genus containing pathogens), which increased in the presence of pastures, fields, and farms.

Fig. 5. Causal downstream effects from land use characteristics. Directed acyclic graph (DAG) analysis (using BANJO) of environmental data and V3-V4 16S rRNA ASVs (naming: GenusASVnumber). Colours denote environmental parameters or bacterial phyla. Red arrows denote negative links.

Therefore, land use can either directly or indirectly threaten public access to clean water, since many environmental and opportunistic bacterial pathogens could increase in occurrence. Threats to public access to clean water could include wide-scale alteration of the microbiota and the transmission of potential pathogens to surface raw water sources that provide water for consumption and agriculture, as well as to recreational lakes.

Bacteria as sentinels of land use changes in lake inflows
We found that a significant portion (approximately one-third) of the inflow microbiota responded to land use, demonstrating the role of bacteria as sensitive indicators of environmental change. The observed shifts in bacterial communities, despite weak evidence for corresponding alterations in water chemistry, can be attributed to several key factors. First, bacteria exhibit remarkable sensitivity to changes in their surroundings, allowing them to serve as early indicators of ecological shifts (Faust et al., 2015). Additionally, the dynamic nature of microbial communities, influenced by factors such as functional redundancy and metabolic plasticity (Comte et al., 2013), may contribute to fluctuations in bacterial composition even when traditional chemical parameters remain relatively stable. Even though there was weak evidence for large alterations in water chemistry, it is important to consider the discrepancy in the number of measurements between bacteria and chemistry (115 and 39, respectively). Since there were indications of land use effects on the water chemistry (Fig. 2b), we speculate that a more distinct impact could have emerged with a revised model design, including increased chemical measurements in the inflows. Still, the temporal dynamics of bacterial responses and their rapid adaptation to alterations in land use provide a more nuanced insight into environmental stressors (Hermans et al., 2020), offering a comprehensive perspective on ecosystem health that extends beyond conventional chemical assessments.
Conventional chemical assessments include measuring humic substances or the colour value of water, to understand the extent of organic matter and its potential influence on aquatic life. Humic substances leach from soils into waters (Susic, 2016), where they can be consumed by bacteria in an oxygen-demanding process (Farjalla et al., 2009). Runoff from fertilizers and manure can also increase the carbon content, contributing to brownification (Kritzberg et al., 2020). Iron and aluminium easily bind to humics (Zhou et al., 2015) and are transported with these substances (Herzog et al., 2020). Earlier studies have found that land use such as forest plantations can increase both the iron and humic substance contents in freshwater (Škerlep et al., 2022). In this study, the inflows often contained browner water compared to the lakes, but a connection to land use could not be statistically determined. However, changes in land use may introduce new organic substrates or alter nutrient availability, influencing the types of bacteria that thrive without necessarily altering overall water chemistry in terms of coloration of the water. Consequently, land use-induced alterations, such as changes in humic substances, can disrupt the microbial community, illustrating how disturbances initiated upstream can travel down the stream.

Land use can indirectly affect the downstream bacterial community in lakes, via inflows
We found both that land use impacted approximately one third of the analyzed inflow microbial taxa, and that inflow-associated taxa were detected in the lakes up to a total frequency of ~10-50 %. Land use could thus lead to indirect effects on a balanced microbiome due to the characteristics of the affected inflows (Fig. 4), which have an impact on the bacteria (Fig. 5). The bacterial compositions were more diverse in the inflows than in the lakes (Fig. 3), which could be explained by a more variable water chemistry in combination with the inflow of terrestrial taxa, as well as by land use effects. Earlier studies have shown that land use can affect bacterial communities (Kraemer et al., 2020). To understand how the upstream community can affect the downstream community, we performed a Random Forest (RF) analysis. In this analysis, the communities in the inflows showed varying impacts on the lake communities, and on average about 20 % of the most abundant taxa were predicted to originate from inflows (Fig. 4a), an impact that we hypothesize could increase with increased precipitation. The same was found for the water chemistry, where e.g. colour value as well as iron and aluminium contents were generally higher in the inflows, impacting the lakes. These downstream propagations demonstrate the indirect ways land use can affect aquatic microbial ecosystems.
Indirect effects via inflows were also suggested by causal links between e.g. water colour and lake bacterial taxa, where colour influenced the abundance of several freshwater bacteria, such as Polynucleobacter and Limnohabitans. In the network analysis, aluminium and colour value, both characteristic of the inflows, had causal links with lake bacteria. In addition to involvement in phosphorus limitation, iron and aluminium can promote the formation of biofilms (Hu et al., 2019; Kappler et al., 2021). Changes in humic substances, iron, and aluminium can thus select for bacteria adapted to these conditions. Several bacteria in the DAG network (e.g., Sediminibacterium, Herminiimonas, Rhodoferax, and Polynucleobacter) have been described as mainly providing ecosystem services, such as denitrification, phosphorus release, and organic matter degradation (Watanabe et al., 2012; Logue et al., 2016; Mandel et al., 2019). The hgcI clade and the CL500-29 marine clade were also identified as important for the inflow (Fig. 4c), and both these taxa have been shown to be involved in the carbon cycle and can use different forms of carbon-based compounds (Zhang et al., 2020). Although bacteria commonly break down organic matter, changes in the origin of this organic matter can lead to indirect effects on the community, since humic substances have varying nutrient contents and can be hard to degrade (Moran and Hodson, 1990). Thus, inflows contribute to shaping the microbial community in lakes, since their water properties can help form new niches, leading to the enrichment of other taxa, including potential pathogens.

Land use can affect Legionella via iron, aluminium, and colour value
In the context of water supply significance, our investigation reveals a noteworthy relationship between upstream land use and Legionella. Legionella taxa were more abundant in watershed inflows than in lakes, in accordance with previous findings (Peabody et al., 2017), hinting at potential threats to water quality. For the first time, we report positive associations between Legionella occurrence and anthropogenic disturbances like farms, pasture, and fields. Our predictions revealed Legionella ASVs as positively linked to aluminium and colour value, suggesting opportunistic growth where other bacteria are limited (Fig. 5). Legionella's propensity for biofilm living and intracellular lifestyles may contribute to its resilience in the face of e.g. nutrient limitations (Leon-Sicairos et al., 2015). Our findings align with previous suggestions of an association between Legionella and humic substances (Eriksson et al., 2023), with Legionella exhibiting a significant correlation with iron. The intricate interplay of iron, humic substances, and land use poses potential challenges to microbial communities, especially in humic-rich boreal lakes (Kritzberg et al., 2020), which according to our predictions can be favourable for Legionella occurrences. Our study highlights the need for future research, since the disentanglement of complex interplays requires holistic approaches. This includes experiments introducing humic substances to environmental water, to understand the dynamics impacting this bacterial niche and its implications for water quality management.

Conclusion
To summarize: I) 27 % of the most common bacterial taxa in the lake inflows were influenced by buildings, farms, and fields, increasing the bacterial diversity.
II) The bacteria and water chemistry upstream can considerably affect the composition downstream in the lakes, where in some cases more than half of the features originated from the inflows.
III) Land use can, through causal links, lead to indirect downstream effects on lake bacterial taxa, e.g. through changes in humic substances influencing common freshwater bacteria.
IV) Legionella was particularly abundant in the lake inflows and increased in occurrence with the presence of pastures, fields, farms, aluminium, iron, and humic substances.
To ensure access to reliable clean water and suitable water treatment, it is therefore important to consider upstream land use and inflows even if no direct effects can be observed on water quality in terms of chemical measurements, since land use can influence the microbial communities in inflows where pathogens can naturally occur. There were no positive associations between land use and faecal indicator bacteria, suggesting that these indicators are insufficient to detect land use effects. Considering climate change in Nordic lakes, with increasing humification, precipitation, flooding and freshwater inflow, bacteria utilizing these conditions, including potential pathogens, might become more common. In our view, these findings are of interest both to fundamental microbial ecology and to water management, helping to identify robust water sources and to avoid enrichment of Legionella. Additional, broader investigations of stressors on the microbial network and of the ecology of this diverse genus would facilitate preventative measures, helping to secure water accessibility.

Fig. 1. Experimental design. Panel a: Land use can affect the microbial community and water chemistry. Also, the microbial community and water chemistry can affect each other. This study therefore focuses on a holistic approach examining the potential effects of upstream land use on water quality. Panel b: Sampling design and land use categories for one of the five lakes. For each of the four sampling occasions, stream inflows (numbered 1-8) and lake water including shore water (both referred to as "Lake") were sampled. Panel c: Hypotheses and analyses for the study of the effect of upstream land use on the water quality. I: Land use does have an influence on the water properties; II: The upstream inflows do have an effect on the overall water properties and the composition of the microbiota in the lakes; III: The degree of disturbance of water quality due to land use/anthropogenic action does affect the inflows and has a measurable effect on the lakes; IV: Land use can affect the occurrence of the genus Legionella, a potentially pathogenic organism that could cause problems in raw water, demonstrating how upstream land use could impact downstream water quality.

Fig. 3. Bacterial community composition. Panel a: Iris plot showing the taxonomic composition of the samples on the phylum level for all taxa. Black dot: inflow samples. Panel b: Centre-log-transformed PCA ordination of samples and all taxa. No outline: lake samples. Black outline: inflow samples. Panel c: Iris plot showing just the top 100 taxa. Panel d: Centre-log-transformed PCA ordination showing just the top 100 taxa. No outline: lake samples. Black outline: inflow samples.
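
The ordinations in panels b and d apply a log transform to the taxon counts before PCA. The following is a minimal sketch, assuming that "centre-log-transformed" refers to the centred log-ratio (CLR) transform commonly used for compositional sequencing data. The function name `clr` and the pseudocount used to handle zero counts are illustrative assumptions, not details reported in the study.

```python
import math

def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of one sample's taxon counts.

    The pseudocount is a hypothetical choice to avoid log(0);
    the study does not state how zero counts were handled.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    # Each value is the log abundance relative to the sample's geometric mean.
    return [value - mean_log for value in logs]

# Hypothetical counts for three taxa in a single sample.
sample = [10, 0, 90]
transformed = clr(sample)
```

CLR values sum to zero within each sample, so a PCA on them compares taxa relative to each sample's own geometric mean rather than to its sequencing depth.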

Fig. 6. Legionella abundances and species. Panel a: Results of a real-time qPCR assay based on primers targeting a Legionella-specific part of the 16S rRNA gene. The resulting bacterial concentration was calculated from Cq values using the linear regression equation of a calibration curve specific to the assay: y (log bacterial concentration) = -0.2911 × x (Cq value) + 10.361. Identical colours denote technical replicates. Panel b: Abundances of the 10 most common full-length 16S rRNA sequencing taxa identified as belonging to the genus Legionella, as proportions of total abundance. Panel c: Relative proportions of the taxa shown in panel b. For b and c, taxa were classified by the Epi2Me pipeline using a custom database based on GTDB.
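
The calibration curve in panel a can be applied to a Cq value as sketched below. This is a minimal illustration, assuming a base-10 logarithm and an intercept read as the decimal value 10.361. The function names are hypothetical, and the concentration unit is not given in the caption.

```python
SLOPE = -0.2911     # slope of the calibration curve, from the caption
INTERCEPT = 10.361  # intercept, read as a decimal value (assumption)

def log10_concentration(cq: float) -> float:
    """Log bacterial concentration predicted by the calibration line."""
    return SLOPE * cq + INTERCEPT

def concentration(cq: float) -> float:
    """Back-transform to a concentration, assuming a base-10 logarithm."""
    return 10 ** log10_concentration(cq)

# Hypothetical Cq value, for illustration only.
example_cq = 30.0
example_log = log10_concentration(example_cq)  # about 1.628
example_conc = concentration(example_cq)
```

Because the slope is negative, a higher Cq value corresponds to a lower estimated concentration, as expected for a qPCR standard curve.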

Fig. 7. Cladogram showing the diversity of Legionella taxa present in the dataset, with red dots names representing studied ASVs, green dots representing Aquicella/the outgroup, cyan dots representing Legionella pneumophila reference sequences, and blue sequence names representing other Legionella reference sequences. Arrows indicate the three most numerous taxa subjected to model analysis.