A comparative analysis employing a gene- and genome-centric metagenomic approach reveals changes in composition, function, and activity in waterworks with different treatment processes and source water in Finland

The emergence and development of next-generation sequencing technologies (NGS) has made the analysis of the water microbiome in drinking water distribution systems (DWDSs) more accessible and opened new perspectives in microbial ecology studies. The current study focused on the characterization of the water microbiome employing a gene- and genome-centric metagenomic approach to five waterworks in Finland with different raw water sources, treatment methods, and disinfectant. The microbial communities exhibit a distribution pattern of a few dominant taxa and a large representation of low-abundance bacterial species. Changes in the community structure may correspond to the presence or absence and type of disinfectant residual which indicates that these conditions exert selective pressure on the microbial community. The Archaea domain represented a small fraction (up to 2.5%) and seemed to be effectively controlled by the disinfection of water. Their role particularly in non-disinfected DWDS may be more important than previously considered. In general, non-disinfected DWDSs harbor higher microbial richness and maintaining disinfectant residual is significantly important for ensuring low microbial numbers and diversity. Metagenomic binning recovered 139 (138 bacterial and 1 archaeal) metagenome-assembled genomes (MAGs) that had a >50% completeness and <10% contamination consisting of 20 class representatives in 12 phyla. The presence and occurrence of nitrite-oxidizing bacteria (NOB)-like microorganisms have significant implications for nitrogen biotransformation in drinking water systems. The metabolic and functional complexity of the microbiome is evident in DWDSs ecosystems. A comparative analysis found a set of differentially abundant taxonomic groups and functional traits in the active community. The broader set of transcribed genes may indicate an active and diverse community regardless of the treatment methods applied to water. The results indicate a highly dynamic and diverse microbial community and confirm that every DWDS is unique, and the community reflects the selection pressures exerted at the community structure, but also at the levels of functional properties and metabolic potential.


Introduction
Despite considerable improvements in treatment processes and disinfection practices, outbreaks associated with drinking water have been consistently reported worldwide (WHO, 2021). As a result, effective monitoring of microbial contamination in drinking water distribution systems (DWDSs) is critical to reduce health risks, particularly for immunocompromised members of populations (Prüss-Üstün et al., 2008;Ashbolt, 2015;Yates, 2019). Culture-based methods are primarily used to assess the microbial quality of drinking water, but these assays are selective in nature, providing a limited view of relevant issues (Gomez-Alvarez et al., 2015), such as: (1) intrinsic microbial diversity within DWDSs; (2) metabolic potential that might enhance the survival of pathogens; (3) presence of antibiotic resistance genes and antimicrobial resistance mechanisms; and (4) microbes responsible for the deterioration of distribution system infrastructure and water quality. The implementation of next generation sequencing technology (NGS) to study the water microbiome has provided a better understanding of the taxonomic affiliation, functional potential, and overall microbial diversity in DWDSs (Tan et al., 2015;Zhang and Liu, 2019;Garner et al., 2021;Tiwari et al., 2022). More recently, advanced metagenomic surveys have documented that DWDSs support a complex microbial network (Douterelo et al., 2018;Brumfield et al., 2020;Gomez-Alvarez et al., 2021). Therefore, a more complete characterization of the microbial community structure of DWDSs using molecular tools is critical to address public health research questions (e.g., conditions promoting the emergence of pathogens), which may be more difficult to answer using traditional methods.
Most of the Finnish population (up to 90%) is served by centralized DWDSs with the rest of the population using an alternate source of drinking water such as private drinking water wells (Ikonen et al., 2017). In 2018, there were 153 large EU-regulated waterworks in Finland that met the reporting criteria of the Drinking Water Directive, which supplied domestic water to about 4.5 million users with 43% of domestic water produced from groundwater, 38% from surface water and the remaining 19% from artificially recharged groundwater (Zacheus, 2013). The disinfection methods in Finnish waterworks include non-disinfection and combinations of UV-light, ClO 2 , Cl, NH 2 Cl and NaOCl in other systems. Since water source, treatment processes and disinfectant play a key role in shaping the bacterial community in the distribution system, we hypothesize that the bulk water at each service area will harbor distinct and diverse bacterial communities as well as unique water functional potential. This study reports on the characterization of the bulk phase water microbial community, based on 16S rRNA gene, metagenome and metatranscriptome libraries, from five waterworks in Finland with different raw water sources, treatment methods, and disinfectant. This study provides information regarding microbial community diversity influenced by different raw water sources and different water treatment technologies. It also provides a window into the functional properties and metabolic potential of the water microbiome.

DWDS sites, sample collection and nucleic acid extraction
A detailed description of the sample locations, details, and processing as well as specific details and characteristics of the five drinking water distribution systems (DWDS) can be found in our previous research papers focusing on the water microbiome (Inkinen et al., 2019(Inkinen et al., , 2021 and the physico-chemical characteristics of DWDSs (Ikonen et al., 2017). The flowchart methodology of this research is available in Supplementary file: Fig. S1. Briefly, bulk water samples were collected from five different DWDSs (A to E) with the aim to cover the main treatment processes that are used in drinking water production in Finland (Table  S1). Each waterwork varies regarding raw water source, treatment method, and disinfection treatment as follows: artificial groundwater production without any disinfection (DWDSs A and B, same geographical location), surface water with chlorine dioxide (ClO 2 ) and chlorine (Cl 2 ) disinfection (DWDS C), surface water with chloramine (NH 2 Cl) disinfection (DWDS D), and groundwater treated with sodium hypochlorite (NaOCl) (DWDS E). DWDS C to E included UV disinfection before chlorination.
Large volume bulk water samples (n = 10) were collected in two consecutive weeks during the summer season (August-September) at each location. The measured physico-chemical water quality and nutrients are available in Supplementary file: Table S2. Briefly, 100 L from the cold-water system was collected and filtered on-site using a dead-end ultrafiltration method (DEUF) using a Rexeed-25A hollow-fiber polysulfone filter (Asahi Kasei Medical Co., Ltd., Tokyo, Japan) attached to a tap (Inkinen et al., 2021). The average flow of water during sample collection was 3 L/min. The filters were stored and transported on ice for elution and DNA/RNA processing in the laboratory. Detailed information on sample collection, nucleic acid extraction, and complementary DNA (cDNA) synthesis from the purified RNA can be found in Supplementary file: Materials and Methods.

16S rRNA gene amplification, sequencing, and reads processing
The highly variable V3-4 region of the bacteria 16S rRNA gene was amplified using the primer set 341F and 785R (Klindworth et al., 2013). The primers A340F (Gantner et al., 2011) and 915R (Stahl and Amann, 1991) were used for the amplification of the archaea 16S rRNA gene. Paired-end 300 bp reads were generated using the MiSeq ® platform (Illumina Inc., San Diego, USA) and screened following the procedure described in Gomez-Alvarez et al. (2016). Detailed information on PCR amplification, sequencing, and processing of reads can be found in Supplementary file: Materials and Methods. After quality control filtering and removal of artificial sequences, 138,073 and 54,534 reads were retained from 16S rRNA bacteria and archaea libraries, respectively. Archaea samples from sites C and D were excluded from further analysis due to a small number of recoverable reads.

16S rRNA microbial community assemblages
Prior to community analysis, 16S rRNA gene libraries were rarefied to the smallest data set (4450 bacteria and 670 archaea reads). Bacteria and archaea analysis identified 6776 and 1176 operational taxonomic units (OTUs), respectively. Normalized libraries were used to calculate richness (S), richness estimators (ChaoI and S ACE ), Shannon diversity (H) and evenness (H E ) with the software mothur v1.45.2 (Schloss et al., 2009).
Taxonomic classification was obtained using the reference database Genome Taxonomy Database (GTDB) release 95 (Parks et al., 2020). Phylogenetic trees were constructed from the alignments of 16S rRNA gene sequences based on the maximum likelihood method using the software MEGA X v10.1.7 (Kumar et al., 2018).
To recover the active functional and metabolic information from the RNA extracted from the water samples, metatranscriptome of cDNA from the purified RNA were sequenced and processed following the same procedure as for the metagenome libraries. Then, ribosomal RNA was removed in silico by mapping metatranscriptomic reads to multiple rRNA databases with default settings using the local alignment tool SortMeRNA (Kopylova et al., 2012). The libraries from Sample B2, and DWDS C and E were excluded from subsequent analysis due to poor recovery of filtered reads. Only the data sets from sites A (ND) and D (CHM) were used to determine which populations were metabolically active at the time of sampling. The libraries contained an average (±SD) of 4290,346 ± 316,333 reads per sample.

Assembly, annotation, and OTU diversity of metagenomic reads
Libraries were de novo assembled using MEGAHIT v1.2.9 (Li et al., 2016) with default parameters but discarding contigs below 1500 nucleotides. Contigs were annotated with MetaProkka v1.14.6_1 (Telatin, 2020) a modified version of Prokka (Seemann, 2014). The tool SingleM v0.13.2 (Woodcroft, 2020) was used to estimate the abundances of OTUs directly from metagenome data from each sample. Shannon diversity was calculated based on the average of the rarefied OTU table across each of the 14 single-copy marker genes.

Taxonomic and metabolic inference of metagenomic and metatranscriptomic reads
Taxonomy was assigned using Kraken2 v2.1.2 (Wood et al., 2019) with a confidence value of 0.05 for taxonomic assignment using the pre-built custom Genome Taxonomy Database (GTDB) release 95 (http://ftp.tue.mpg.de/ebio/projects/struo2/GTDB_release95/kraken2). Taxonomy counts for each sample were summarized by collapsing taxonomic assignments to the phylum, class, order, family, and genus level with Bracken v.2.6.1 (Lu et al., 2017). A Sankey diagram was created using the online tool Pavian to illustrate the flow of reads from the root of the taxonomy to more specific ranks (Breitwieser et al., 2020). Metabolic reconstruction and the relative abundance of genes involved in key biogeochemical pathways were determined by DiTing v0.9 (Xue et al., 2021). Hierarchical classification (BRITE, KO, modules, pathways) of metabolic functions was obtained using the online tool FuncTree2 (Darzi et al., 2019). Selected waterborne pathogens were identified at the genus level from samples using the Pathogen-fluctuations script (Ghosh, 2021). The input file was the output of a UniRef90 (Suzek et al., 2015) based functional gene classification implemented through the HUMAnN v3.0.0 pipeline (Beghini et al., 2021). Detailed information on metabolic profiles and characterization can be found in Supplementary file: Materials and Methods.

Multivariable ordination and statistical analysis
Non-metric multidimensional scaling (nMDS) was used to describe the relationships among microbial communities. The nMDS was based on the Square Root Jensen-Shannon Divergence coefficient (dissimilarity) matrix. The Jensen-Shannon divergence is a method of measuring the similarity between two probability distributions based on relative abundance. A one-way permutational multivariate analysis of variance (PERMANOVA) test was applied on the distance matrix with 9999 permutations to determine if there were significant differences (α = 0.05) between the microbial communities (Anderson, 2001). Similarity Percentage (SIMPER) analysis was conducted to determine the percentage contribution of species to the differences observed in non-disinfected to treated waters (Clarke, 1993). A Mann-Whitney U test (α = 0.05) was used to evaluate the differences in diversity indices (non-disinfected vs. treated waters), whereas the relationship between metabolic processes (e.g., genes) and DWDSs were examined using the non-parametric Kruskal-Wallis test for equal medians (α = 0.05). Ordination plots, PERMANOVA, SIMPER, Mann-Whitney U test, and Kruskal-Wallis analysis was performed with the software PAST v4.06 (Hammer et al., 2001). Statistical comparisons between total and active community profiles were calculated based on the Fisher's exact test with corrected q-values (Storey's FDR multiple test correction approach) using the software package STAMP v2.1.3 (Parks et al., 2010).

Data availability
The sequence data for this study have been deposited in the European Nucleotide Archive (ENA) at EMBL-EBI under accession number PRJEB40814 with the following BioSample numbers: SAMEA7465213 (Accession: ERS5222917) to SAMEA7465227 (ERS5222931). A single zip folder (THL_MAGs_selected_bins.zip) including all the selected MAGs is provided as a supplementary file.

Microbial community assemblages were associated with disinfectant treatment
Large volume bulk water samples were collected in two consecutive weeks during the summer season from each of the five DWDS locations. Raw water source and disinfection were the main differences between the systems (Ikonen et al., 2017). Metagenomic-based microbial diversity (Chao I index) was significantly higher in DWDSs (Mann-Whitney U test: z = 2.5, p = 0.0142) with no disinfectant (median: 2766, range from 2434 to 3334, n = 4) and decreased in chlorine and chloramine disinfected DWDSs (median: 641, range from 245 to 1601, n = 6) (Fig. 1A). The absence and decrease in disinfectant residual levels in conjunction with increasing distance from the waterworks enables regrowth and explains the increase in microbial growth at these distribution networks (Ikonen et al., 2017). Inkinen et al. (2019) observed a lower number of eukaryotic species in disinfected DWDSs as compared to nondisinfected water. Furthermore, Dai et al. (2020) established that the relative abundance of Archaea is dependent on the concentration of disinfectant residual. Maintaining disinfectant residual is significantly important for the mitigation of microbial contamination in DWDSs (USEPA, 2007). Although a few European countries require all water supplies to be disinfected and a disinfectant residual to be maintained, some countries do not require disinfection or the use of a disinfectant residual (Hydes, 1999). Finland offers guidance on disinfectant residuals (USEPA, 2007).
The taxonomic composition of the metagenomes revealed that most of the Bacteria domain diversity was associated with the phylum Proteobacteria followed by, Nitrospirota, Omnitrophota, Patescibacteria, Planctomycetota, Bacteroidota, Desulfobacterota, and Actinobacteriota. The Archaea domain represent a small fraction (up to 2.5%) of the prokaryote community (Fig. 1B). Results from previous shotgun metagenomic DNA sequencing studies have indicated lower proportions of archaeal representatives in drinking water systems (Gomez-Alvarez et al., 2012a andDouterelo et al., 2018). Although the abundance of archaeal representatives is relatively low, our study confirmed the presence of a highly taxonomic diversity composed of at least twelve major classes (Fig. 1B). Their role, particularly in non-disinfected DWDS may be more important than previously considered (Inkinen et al., 2021). The negative relationship between archaea abundance and disinfectant residual (i.e., treatment) is particularly evident for disinfected DWDSs, where the archaea community seemed to be effectively controlled by disinfection of the water. Overall, the observed taxonomic composition in which the dominant phyla represented here (albeit not at the same ratios) and the small relative abundance of Archaea is consistent with a previous meta-analysis of DWDSs (Bautista-de los Santos et al., 2016).
Concurrently we used 16S rRNA-based analysis of the Bacteria and Archaea domain to confirm that DWDSs with no disinfectant harbor higher microbial diversity (Figs. S7A and S8A). The resulting analysis of OTUs corroborated that the DWDS communities displayed variations in their taxonomic composition (Figs. S7B and S8B) and that bacterial communities formed clear clusters (nMDS: stress = 0.07; PERMANOVA: F = 5.8, p = 0.0007) based on disinfectant treatment (Fig. S7C). In general, complex microbial communities are extremely diverse and typically exhibit a distribution pattern of a few dominant taxa and a large representation of low-abundance bacterial species (Nemergut et al., 2013). Furthermore, the observed changes in the structure of the microbial community may correspond to the presence/absence and type of disinfectant residual which indicates that these conditions exert selective pressure on the microbial community.

Metabolic versatility of drinking water microbial communities
We analyzed the metabolic potential of the DWDS metagenomes by focusing on KEGGbased pathways and modules (functional units of genes linked to specific metabolic capacities). The analysis of the metabolic potential revealed a total of 2990 KEGG orthologs (KOs) (ND = 2515; CHL = 2542; CHM = 2334) which were further categorized into 45 BRITE (level 2) functional categories (ND = 44; CHL = 44; CHM =45) comprising 344 pathways (ND = 307; CHL = 311; CHM = 326), and 778 modules (ND = 688; CHL = 701; CHM = 708). Drinking water systems host a great microbial biodiversity and provide niche stability to a vast collection of microorganisms, and our study detected and identified numerous functional processes comparable in numbers to complex ecosystems (Gomez-Alvarez et al., 2012a). These drinking water environments exhibit conditions that are favorable for the establishment of distinct communities harboring numerous biochemical processes (Fig. 2). Furthermore, the metagenomic analysis highlighted a moderate to high coverage (i.e., complete) of constructed KEGG pathways associated with core functions such as metabolism (energy, nucleotide, amino acid, xenobiotics), genetic processes (transcription, translation, replication, and repair), environmental processes (membrane transport, signal transduction), and cellular processes (growth, membrane functions) (Fig.  S9). Among the energy metabolic processes (Fig. 2B), carbon fixation (chemolithotrophs and photosynthetic), oxidative phosphorylation, nitrogen, sulfur, and methane metabolisms showed a greater estimated number of completed pathways (avg coverage [±SD] = 79.4 ± 16.1) suggesting the potential of the water microbiome to actively utilize these energetic processes (Fig. S9B).
Functional analysis indicated differences in the metabolic characteristics of each DWDS at the KEGG pathways, modules, and KO levels. Cluster analysis grouped metabolic profiles (i.e., microbial communities) into three distinct clusters (PERMANOVA: F = 11.9, p = 0.0007) (Fig. S10). Metabolic processes as well as sequences associated with diseases, environmental, and cellular information processing were enriched in disinfected drinking water systems (CHL and CHM) compared to non-disinfected samples ( Fig. 2A). In particular, the energic metabolic pathways constituting cellular respiration (oxidative phosphorylation) and carbon fixation (photosynthetic and non-photosynthetic [e.g., chemolithotrophs]), as well as metabolic pathways for nitrogen, sulfur, and methane were overrepresented in disinfected systems (Fig. 2B). The ecosystem characteristics of each DWDS may have led to differences in the ability to use carbon and nitrogen sources by their respective microbiome. These results, however, suggested that a higher taxonomic diversity did not accompany higher metabolic gene diversity (Figs. 1A and 2). For example, metabolic diversity was not significantly different between no disinfectant and disinfected DWDSs (Mann-Whitney U test: z = 1.2, p = 0.24095). The inconsistency between the metabolic and taxonomic diversities might suggest a higher functional redundancy of the microbial community in non-disinfected (ND) systems. Distinct species in similar niches might perform similar metabolic roles in biogeochemical cycles and overlapping niches may increase the functional redundancy of the ecosystem (Wang et al., 2017). In general, this study provided insight into the effects of disinfectant (as selective pressure) and water source on the metabolic potential of the DWDS microbial community. Similarly, Dai et al. (2020) suggests that selection pressures exerted within disinfected systems are not only evident at the community structure, but also evident at the community metabolic potential level.

Prevalence of antimicrobial resistance and occurrence of pathogenicity traits in DWDSs
Despite the drinking water in Finland being regularly monitored and showing excellent water quality, several antimicrobial resistance (AMR) and pathogenicity traits were identified in the DWDS metagenomes ( Fig. 2A). The average proportion of genes associated with arsenic resistance (ars operons) was higher in the microbial communities in the CHL and CHM systems compared to ND water systems. A parallel study of these DWDSs revealed that most of the ARGs were associated with resistance to several antibiotic classes such as bacitracin, mupirocin, tetracycline, polymyxin, beta-lactam, aminoglycoside, glycopeptide, fosmidomycin, and fluoroquinolone (Tiwari et al., 2022). The DWDSs studied herein are considered pathogen-free but still might contain opportunistic waterborne pathogens. Preliminary results using the Pathogen-fluctuations script (Ghosh, 2021) identified functional genes of selected waterborne pathogens that are representatives of the genera Acinetobacter, Aeromonas, Burkholderia, Campylobacter, Citrobacter, Enterobacter, Enterococcus, Escherichia, Haemophilus, Helicobacter, Klebsiella, Legionella, Mycobacterium, Proteus, Providencia, Pseudomonas, Salmonella, Serratia, Shigella, Staphylococcus, Stenotrophomonas, Streptococcus, and Vibrio. These approaches only identified the presence of genes associated with resistance mechanisms and pathogenicity traits, and not whether these are actively transcribed genes, or whether their associated hosts are active members of the water microbiome at these sites.
The presence of these or other genes associated with AMR mechanisms and opportunistic waterborne pathogens does not imply evidence of water contamination or represent a risk to the public health. Nonetheless, these results may illustrate the ubiquity of these genes and public health-relevant or closely relate microorganisms in manufactured water systems (Sanganyado et al., 2019;Gao et al., 2021). The establishment and growth of waterborne pathogens (including resistance mechanisms to disinfectants and antibiotics) in drinking water systems may have significant impacts on human health as well as serious economic consequences.

Microbiome-level differences in biogeochemical processes between non-disinfected and disinfected systems
The complexity of biogeochemical processes (at the level of metabolic gene organization) of the water microbiome is evident in DWDSs (Figs. 3 and S11). The normalized relative abundance of genes involved in nitrogen and sulfur metabolism was higher in disinfected systems than in non-disinfected systems and is consistent with previous comparisons of drinking water samples from disinfected and non-disinfected systems (Bautista-de los Santos et al., 2016;Dai et al., 2020). This indicated that compared with the microbes in the non-disinfected systems, those in the disinfected waters tended to rely on nitrogen (denitrification, DNRA, ANRA, nitrification, comammox) and sulfur (assimilatory sulfate reduction, dissimilatory sulfate reduction to sulfite, thiosulfate oxidation, and sulfide oxidation) compounds more heavily as an energy source (Fig. 3). In DWDSs, the nitrogen and in some extent the sulfur pathway play a significant role in the ecosystem, and the populations engaged in these pathways are part of a complex and highly diverse microbial community. The transformation of nitrogen into its many redox states is key to ecosystem productivity and is driven by microbially mediated reactions. Evidence of the importance of the nitrogen biogeochemical cycle is derived from several studies of DWDSs Gomez-Alvarez et al., 2012a. Sulfur is another essential element in drinking water ecosystems that is cycled by microbes between oxidized and reduced forms (Fig. 3B). The wide range of annotated functions associated with several sulfur pathways may be indicative of the availability of several electron donors at drinking water distribution pipes undergoing corrosion (Gomez-Alvarez et al., 2012b). Concrete corrosion of distribution systems is a significant cause of deterioration and premature failure.
Analysis of metagenome libraries identified key genes associated with the sulfur pathway (Fig. 3B). These functions were found to be abundant in the metagenomes, although we observed differences in the enrichment of specific gene families within the sulfur cycling. For example, the relative abundance of genes related to the processes of assimilatory sulfate reduction, dissimilatory sulfate reduction to sulfite, thiosulfate oxidation, and sulfide oxidation were significantly higher in the CHL systems (Kruskal-Wallis: H = 7.9, p = 0.0193) with sporadic difference observed in other processes (Fig. 3B). In contrast, the genes encoding dsrAB and phsABC, which are associated with dissimilatory sulfite reduction to sulfide and thiosulfate disproportionation pathways, respectively, were present in a higher proportion in the microbial communities from the ND systems than from the disinfected systems. CHM systems showed a higher representation of genes involved in sulfite oxidation, as well the genes hydADGB (sulfhydrogenases) and ETHE1-like sulfur dioxygenase, which are associated with sulfur-reducing and sulfur-oxidation activities, respectively (Fig. 3B).
Lastly, carbon is one of the most essential elements to living organisms and the carbon cycle illustrates the connection and exchange between heterotrophs and autotrophs in DWDS ecosystems. The presence and differences in the proportion of genes involved in carbon metabolism in the results suggest different carbon use strategies (Fig. S11). Overall, changes in water quality and the characteristics of the environment drive the variation of microbial communities which regulate core biogeochemical processes such as carbon, sulfur, and nitrogen metabolism in DWDSs. Finally, we need to understand how these biogeochemical cycles interact with one another in DWDS environments (Rousk et al., 2014).
MAGs corresponding to the classes Paceibacteria, Alphaproteobacteria, Gammaproteobacteria, and Nitrospiria were differentially enriched with disinfectant treatment, consistent with their dominance at each DWDS (Fig. 4A). Among these, the MAG with the highest percentage of mapped reads belong to the class Paceibacteria (phylum Patescibacteria) and made up of approximately 26% of the ND microbial community. The analysis of 16S amplicon sequencing data supported this statement as Paceibacteria, belonging to different families, was among the most abundant taxa found in ND systems. This highly abundant group of common inhabitants of freshwater (Proctor et al., 2018) and groundwater environments (Brown et al., 2015) consists of small size and low-nucleic acid content bacteria of which most lack numerous biosynthetic pathways (Fig. 4B). Tian et al. (2020) proposed that the reduced functional and metabolic features combined with genomic simplicity are adaptations of Patescibacteria to the extreme conditions (e.g., low or lack of nutrients and oxygen) in groundwater environments.
On the other hand, MAGs corresponding to the classes Gammaproteobacteria (Polaromonas spp.) and Alphaproteobacteria (Sphingorhabdus_B spp., Rhizobiales spp. UBA4765, and Hyphomicrobium spp.) were differentially enriched in CHL systems in comparison to ND and CHM samples. These ecologically diverse microorganisms are common inhabitants in freshwater systems but are also widespread in drinking water systems (Bautista-de los Santos et al., 2016). The dominant MAG in CHL systems was the Polaromonas spp.
comprising an average of 42% of the microbial community. Polaromonas is generally not observed in natural freshwater systems, but the nutritional versatility microorganism is prevalent in oligotrophic environments such as bottled mineral water (Carraturo et al., 2021). In a previous study, Polaromonas was identified as the predominant genus in granular activated carbon (GAC) filters from full-scale water treatment plants in the Netherlands (Magic-Knezev et al., 2009). The contribution of this taxon to the microbial community in DWDSs requires further examination. The MAGs Sphingorhabdus_B, Rhizobiales spp. UBA4765, Hyphomicrobium spp. represent an average of 16%, 9%, and 7% of the population in CHL systems, respectively. Members of these Alphaproteobacteria genera represent typical freshwater bacteria and are highly physiologically diverse (Jogler et al., 2013;Parks et al., 2017). Despite chemical disinfection, genomic annotation revealed that the group of MAGs harbors numerous biosynthetic pathways related to carbon, nitrogen, and sulfur compounds (Fig. 4B). It is evident from our analysis that the CHL community contained taxa that are metabolically diverse with lower functional redundancy compared to the ND and CHM communities, where distinct species might perform distinct metabolic roles (Fig. 2B).
The Nitrotoga-like MAG exhibited the highest relative abundance in the CHM system comprising 23% of the metagenomic data set followed by a Nitrospira-like MAG (phylum Nitrospirota) comprising 7%. Both MAGs were identified as nitrite-oxidizing bacteria (NOB) which play a critical role in the biogeochemical nitrogen cycle by metabolizing nitrite to nitrate (Fig. 4B). NOB are physiologically versatile, widely distributed, and may have diverse ecological functions within and beyond the nitrogen cycle, including carbon, hydrogen, and sulfur biogeochemical metabolism (Daims et al., 2016). The known phylogenetic diversity of NOB (classes Alphaproteobacteria and Gammaproteobacteria and the phyla Nitrospirota and Nitrospinota) has been now expanded by the description of several new NOB lineages including the genus Nitrotoga (Alawi et al., 2007). Boddicker and Mosier (2018) estimated the relative abundance of Nitrotoga-like sequences to be as high as 10% of the total microbial community across globally distributed freshwater habitats. Nitrotoga may play a critical role in the biogeochemical nitrogen cycle (i. e., nitrite oxidation) in engineered environments (Alawi et al., 2009;Lücker et al., 2015). Moreover, the genomic characterization of Nitrotoga has revealed potential alternative energy metabolisms and a broader spectrum of physiological adaptations which may explain their competitive adaptation in engineered environments (Kitzinger et al., 2018). Further research is needed to understand which environmental conditions allow the coexistence of Nitrotoga with Nitrospira (or other NOB) and which factors lead to their dominance in CHM DWDSs. Overall, the presence and occurrence of NOB-like microorganisms have significant implications for nitrogen biotransformation in drinking water systems (Gomez-Alvarez et al., 2015, particularly in storage tanks (Gomez-Alvarez et al., 2021).

Active populations in non-disinfected and disinfected systems
We compared the relative abundance of metagenomic (DNA) and metatranscriptomic (cDNA) reads to determine the extent to which the total taxonomic and functional potential abundance of the microbial community correlated with its active population. Herein, for the purpose of this discussion, the relative abundance obtained from the metagenome and metatranscriptome libraries will be referred to as the total and active fraction of the community, respectively. Only the data sets of sites A (ND) and D (CHM) sets were used to determine which populations were metabolically active at the time of sampling. While the sequencing depth in this study compensated in part for the limited number of samples that were analyzed (n = 8), additional 16S rRNA and metagenomic surveys are needed to better understand the total microbial genetic potential of these systems. However, through randomization procedures (i.e., Fisher's exact test with Storey's FDR multiple test correction approach) we found statistically distinct taxonomic and functional groups in each of the DWDS samples. Similar taxonomic groups (i.e., membership) were identified in the total and active community at the phylum and class level for the dominant taxa but of these dominant taxa showed different structure patterns (i.e., distribution) in their active community (Fisher's exact test, q < 0.0001). For example, in ND systems the class Paceibacteria dominated the total community (36%), but a shift was observed to the classes Alphaproteobacteria (from 15% to 21%), Gammaproteobacteria (13% to 28%), and Nitrospiria (1% to 9%) in the active community (Fig. S14A). Paceibacteria was reduced to only 5% of the active ND community. It is important to emphasize that members of the class Paceibacteria consists of small size cells with low-nucleic acid content that lack numerous biosynthetic pathways (Tian et al., 2020). To avoid predation and endure at these environments these microorganisms with highly reduced genomes may adjust their metabolism and rely on simple intermediate metabolites from a host-associated lifestyle for energy. Similarly, a moderate shift in the structure of the CHM community was observed with an increase of the classes Gammaproteobacteria (from 45% to 56%) and Nitrospiria (14% to 25%) while the rest of the classes decreased in the active community (Fisher's exact test, q < 0.0001) (Fig. S16A).
To determine whether there were any patterns in microbial activity distinguishing ND and CHM systems at various taxonomic levels, we generated a flow diagram based on their respective patterns of relative abundance. The analysis of ND systems indicated that the top ten group-level taxa (at various taxonomic levels) decreased in the active community (Fig.  S14B). A similar pattern occurred for the MAGs recovered from ND systems, where 91% (31 out of 34) experienced an average of 0.4-fold change decrease (Fig. 5A). Paceibacteria MAGs, which were the most abundant in the total community were among those with lower significance abundance in the active fraction of the community (Fisher's exact test, q = 0.0236), while Gammaproteobacteria-assigned MAGs expressed higher relative abundance in the active community (Fisher's exact test, q = 0.0007). For example, the MAG members Bin.07 A (Polynucleobacter spp.) and Bin.02 A (Nitrospira spp.) increased from a relative abundance of 3.4% to 18.5% and 1.5% to 3.2%, respectively. Polynucleobacter is a bacterioplankton and member of the family Burkholderiaceae widely detected in freshwater environments (Watanabe et al., 2009). It is unclear why these taxonomic group showed higher abundances in ND systems (Waak et al., 2019). It can be speculated that recharge water (i.e., artificial groundwater) brings along nutrients that specifically favors these microorganisms (Abiriga et al., 2021).
Contrary to the ND systems, the analysis of the CHM system indicated an increase in the active community for most of the dominant group-level taxa at various taxonomic levels (Fig. S16B). This was confirmed by the increase of the top dominant MAGs, where 33% (9 out of 27) experienced an average of 2.7-fold change increase in the active community (Fig. 5B). Compared to the CHM system, only 3 MAGs (9% of the total set) from ND systems showed an increased in relative abundance in the active community with an average of 3.9-fold increase (up to 290% change), while only 67% (18 out of 27) in the CHM system experienced an average of 0.3-fold change decrease (Fig. 5B). These MAGs conserved similar levels of distribution in the total and active communities (Fisher's exact test, q = 1.0). The MAGs Nitrotoga (Bin.03 D), Burkholderiaceae PHCI01 (Bin.22 D), Sphingomonas (Bin.06 D), and Nitrospira (Bin.11 D) maintained their dominance in the active CHM community. Nitrotoga and Nitrospira are the main NOB populations in aquatic environments (Spieck et al., 2021). In chloraminated-treated drinking water distribution systems, Nitrotoga coexists together with the Nitrospira populations (Waak et al., 2019) which also co-occurs with heterotrophic bacteria classified as Rhizobiales and Sphingomonas (Potgieter et al., 2020). The current study suggested the relevance of co-occurrence relationships which may be important for understanding community assembly and ecosystem functions in DWDS.
In addition to community-level taxonomic analysis, we also examined and compared the relative abundance of metabolic processes in the active fraction of the community. In ND systems metabolic genes were universally present but only a significant increase in genes related to nitrification and methane oxidation were detected at higher relative abundances in the active ND community (Fisher's exact test, q < 0.0001) (Fig. S13). For example, an increase of ammonia monooxygenase (amoCAB) and hydroxylamine dehydrogenase (hao) genes in the nitrification pathway was detected (Fig. S14C). This is consistent with changes in the taxonomic composition in the active ND population where the NOB group Nitrospiria increased in relative abundance from 1.2% to 9.1% (Fisher's exact test, q < 0.0001) (Figs. S14A and B). Contrary to the ND systems, the analysis of the CHM system indicated a noticeably difference in operational metabolic pathways between the total and active community (Fig. S15). In CHM systems genes related to nitrification and DNRA were detected at higher relative abundances in the active CHM community (Fisher's exact test, q < 0.0001) (Fig. S16C). This corresponds to an increase of the genera Nitrosomonas (Fig.   S16B), a chemoautotrophic bacterium responsible for the biological oxidation of ammonia/ ammonium to nitrite. To determine whether there were any patterns in microbial activity distinguishing ND and CHM systems at the population level, we compared the coverage of the metabolic and biogeochemical functional traits of the recovered MAGs with their respective metagenomic and metatranscriptomic data sets. MAG coverages in ND and CHM systems accounted for only 19% and 43% of the metabolic pathways in total and active communities (Fig. S17). In general, recovered MAGs from the CHM system may represent a considerable proportion of the active bacterial population, where MAGs from the ND systems represent only a reduced portion of the active community. Given these results, the broader set of genes transcribed in both drinking water ecosystems (i.e., DWDS) may indicate an active and diverse community regardless of the treatment methods applied to the water (Table S1).

•
The microbial communities in the DWDS sites exhibit a distribution pattern of a few dominant taxa and a large representation of low-abundance bacterial species.

•
Metagenomes and 16S rRNA-based analysis of the Bacteria and Archaea domain confirmed that DWDSs with non-disinfected water harbor higher microbial richness and composition. This suggest that maintaining disinfectant residual is significantly important for ensuring low microbial numbers and diversity.

•
The observed changes in the structure of the microbial community correspond to the presence or absence and type of disinfectant residual suggesting that these conditions exert selective pressure on the microbial community.

•
The archaea domain in DWDSs represents a small fraction of the prokaryote community and seemed to be effectively controlled by disinfection of water. One archaea MAG was recovered and was identified as Nanoarchaeota, a phylum composed of small obligate symbionts that lack most genes involved in major biosynthetic pathways. Their role particularly in non-disinfected DWDS may be more important than previously considered.
• Selection pressures exerted within disinfected systems are not only evident at the community structure level, but also evident at the functional and metabolic potential level.

•
The presence of a diverse group of opportunistic pathogens combined with the occurrence of microbial resistance mechanisms may constitute a significant challenge for drinking water treatment efficiency and affect drinking water safety.
• Comparative analysis of the community suggested the relevance of cooccurrence relationships in the biological stability of drinking water systems. The presence and occurrence of NOB-like microorganisms have significant implications for nitrogen biotransformation in drinking water systems.

•
The broader set of genes annotated and transcribed in non-disinfected and disinfected systems may indicate an active and diverse community regardless of the treatment methods applied to the water.    Relative abundances of the pathways involved in the (A) nitrogen and (B) sulfur cycle. The pie chart indicates the relative abundance of each pathway in each disinfectant group and the size of the pie chart is proportional to the relative abundance of the gene involved in the pathway. Disinfectant: ND, none; CHL, chlorine; CHM, chloramine. Nitrogen pathways: ANRA, assimilatory nitrate reduction to ammonium; DNRA, Dissimilatory nitrate reduction to ammonium; Anammox, anaerobic ammonium oxidation. Sulfur pathways: ASR, assimilatory sulfate reduction; DSR, dissimilatory sulfate reduction.  Relative abundance of MAGs recovered from ND and CHM contributing to the total and active fraction of the community. MAGs are ordered from top to bottom by the most abundant MAGs in the total community. Bubble size indicates the relative abundance of the MAG in the total community in orange (•) and the active population in red (•). The relative abundance (%) in the total community was estimated as a proportion of a bin relative to the number of reads mapped to assembled contigs from metagenome and adjusted for the size of the bin, while the active fraction of the community was calculated as a proportion of metatranscriptome reads mapping to a MAG. Taxonomic classification and coverage for each MAG is listed in Table S4.