Anaerobes and methanogens dominate the microbial communities in water harvesting ponds used by Kenyan rural smallholder farmers

• Amplicon sequencing determined the mi- crobial community in water harvesting ponds. Anaerobes (e.g. Spirochaeta and Opitutus dominated the bacterial community. Woesearchaeota and methanogens dominated the archaeal community. A potentially pathogenic Mycobacterium spp. was also detected. Highlights need to reduce anaerobic conditions and screen for potential patho- gens. or anoxic, and if used for irrigation, may potentially impact crop yield and viability. In addition, the opportunistic pathogen non-tuberculous mycobacteria (NTM), Mycobacterium fortuitum was found, comprising > 1% of the bacterial community, suggesting a potential human health risk. Here we suggest low-cost changes to pond management, to improve or ameliorate pond anoxia and remove pathogens to bene ﬁ t the livelihoods and welfare of these farms. This study also shows the applicability of HTS to broadly screen the microbial communities, assess water quality, and identify potentially pathogenic groups.


Introduction
The majority of rural smallholders in sub-Saharan Africa, currently have no access to surface water or groundwater (Rockstrom, 2000). In addition, climate change is adversely affecting the availability of water in countries, such as Kenya, through increased temperatures, erratic rainfall, and drought (Boelee et al., 2013). Therefore, ex-situ rainwater-harvesting systems, especially the use of on-farm water storage structures, represents the sole alternative for accessing sustainable quantities of water for domestic and agricultural purposes (Pachpute et al., 2009). These systems can be divided into two categories, a roof harvesting system (RHS) or pond harvesting system (PHS) (Zabidi et al., 2020). A typical RHS consists of a sloped catchment system (a roof) which drains via a gutter and attached pipe into a storage container (cistern or tank) (Hamilton et al., 2019). A PHS consists of small runoff storage structures ranging from small manually dug ponds to large community earth and sand dams, that directly collect rainwater, used mainly for irrigation of kitchen gardens, and sometimes for domestic and livestock water supply (Biazin et al., 2012). Although RHSs are more widely utilised around the world (Cook et al., 2013;Despins et al., 2009;Farreny et al., 2011;Hafizi Md Lani et al., 2018), PHSs offer several advantages including: (i) the capacity to store larger volumes of water (Wu et al., 2018); (ii) simplicity in terms of construction and maintenance, meaning an overall lower capital cost to the farmer, leading to greater benefit-cost ratio (Berhane, 2018). Despite these advantages, these ponds are openly exposed, with potential risk to human health where they may act as reservoirs for both human, animal and plant pathogens (Brodie et al., 2007;dos Santos and de Farias, 2017). If the water is to be used for potable purposes and/or crop irrigation (especially if the produce is consumed fresh) (Jongman and Korsten, 2016), then water quality needs to be assessed and ensure it meets the minimum drinking water guidelines (World Health Organisation, 2017) prior to use.
Historically, the microbial quality of water has been evaluated on culture-based enumeration of pathogens and the monitoring of faecal indicator bacteria (FIB), such as E. coli and total coliforms (Ahmed et al., 2010(Ahmed et al., , 2011Hamilton et al., 2019). Culture-dependent methods for pathogen detection are impractical as (i) each pathogenic group requires its own unique isolation technique (Acharya et al., 2019); (ii) <1% of microorganisms are culturable (Hugenholtz et al., 1998); (iii) you can only detect the presence of a few microbial groups at a time with limited taxonomic resolution (Chan et al., 2019). The reliability of using FIB to assess water quality has also been questioned due to poor correlations with the occurrence of pathogens, especially non-faecal pathogens (Ahmed et al., 2010;Wu et al., 2011). This may be due to ecological or environmental interactions that compromise their predictive power as proxies for pathogens (Tan et al., 2015).
Molecular techniques, such as qPCR, have been used as an alternative to these culture-dependent methods, for quantification of microbial load and direct detection of pathogens in harvested rainwater (Ahmed et al., 2014). High Throughput Sequencing (HTS) provides one approach for elucidating the fine-scale taxonomic composition of microbial communities, including the identification of potential pathogenic groups (Tan et al., 2015;Vierheilig et al., 2015). Indeed, HTS has been successfully used to detect potential pathogenic groups in RHS systems (Ahmed et al., 2017;Chidamba and Korsten, 2015;Strauss et al., 2019;Zhang et al., 2020); with some of the most prevalent potential pathogenic genera detected including: Acinetobacter, Burkholderia, Chromobacterium, Clostridium, Legionella, Mycobacterium, Nocardia, Serratia, and Yersinia. Although Archaea and Fungi play an important role in human/plant pathogenicity (Bell et al., 2006;Hageskal et al., 2009), and in the biogeochemical cycling of nutrients (Offre et al., 2013), these microorganisms are often overlooked in HTS studies. Consequently, there is limited information regarding the microbial ecology, across all microbial domains in PHS.
The main aim of this study was to apply HTS to characterize the microbial communities present in 16 ponds in Kenya, that are used for kitchen garden crop production. Specifically, this study aims to better understand the relative abundance of microorganisms across taxonomic domains, in relation to the physico-chemical drivers of the communities. This study therefore provides a thorough assessment of harvested rainwater quality including both the presence of potential pathogens and indicator species, and water chemistry (anion and cation concentrations), which may have potential implications for long-term human health and sustainability.

Sample collection
We conducted field surveys at 40 smallholder farms in the County of Laikipia, in Kenya to determine if they had working water harvesting ponds. Of these 40 farms, working ponds were found at 16 sites (mapped in Fig. 1). GPS coordinates are available for each pond but are not listed here for farm anonymity. Triplicate surface water (50 ml) was collected from each pond on the same day. All water samples were filtered through sterile 0.2 μm pore-size syringe filters (Minisart NML, Sartorius). Replicates from each site were filtered using different filters. The volume of water filtered ranged from 6.54 to 15.3 ml. The filtrates were stored at −20°C prior to chemical analysis and filters stored in their plastic casing at −80°C prior to DNA extraction.

Chemical analysis
The filtrate was diluted in MilliQ® water and filtered through a Whatman® GF/F filter by vacuum. Anion and cation concentration was analysed using a Dionex ICS-3000 (Thermo Scientific) against a set of standards ranging from 0 to 500 μM (cations) or 0 to 200 μM (anions). For cation analysis, the eluent was 20 nM methyl sulphonic acid run at a flow rate of 1 ml min −1 for 30 min on a Dionex Ionpac 4 mm column (Column temp 30°C). The eluted cations were detected with a CSRS 300, 4 mm Suppressor onto a conductivity cell detector. For the anion analysis, the eluent was MilliQ® water and 100 mM KOH. Potassium hydroxide was run on a gradient with a flow of 0.25 ml min −1 . This was run through an Ionpac AS 18, 2 mm column onto a conductivity cell detector. The water chemistry was assessed by comparison to local and international guidelines on irrigation and drinking water standards (Ayers and Westcot, 1994;National Environment Management Agency, 2006;Kenya Bureau of Standards, 2015;World Health Organisation, 2017).

DNA extraction
Filters were removed from their plastic casing using a sterile scalpel and sterile tweezers, cut into small pieces using sterile scissors and added to PowerBead Tubes (Qiagen) containing 750 μl of PowerBead Solution and 0.7 mm garnet beads. Filters were subsequently homogenised using a Precellys Evolution Homogenizer (Bertin Technologies) fitted with a 2 ml tube holder and white blocking plate (two 30 s cycles at 10,000 rpm). DNA was extracted from homogenised filters using with a DNeasy PowerSoil Kit (Qiagen) according to the manufacturer's instructions and stored at −80°C until use.

qPCR analysis of phylogenetic marker genes
qPCR of archaeal and bacterial 16S rRNA genes and the fungal internal transcribed spacer (ITS) region was performed using the following primer pairs: Archaea, 344f-ACGGGGYGCAGCAGGCGCGA and 915r-GTGCTC CCCCGCCAATTCCT (Raskin et al., 1994;Stahl and Amann, 1991); Bacteria, 341f-CCTACGGGNGGCWGCAG and 785r-GACTACHVGGGTATCTA ATCC (Klindworth et al., 2013); Fungi, P5-giTS7-GTGARTCATCGAATCT TTG and P7-ITS4-TCCTCCGCTTATTGATATGC (Ihrmark et al., 2012). These primers have previously been used to quantify bacterial and archaeal 16S rRNA genes Huby et al., 2021) and fungal ITS region (Zhang et al., 2018) in environmental samples. qPCR standards of known quantity were created using DNA previously extracted from sediment samples (obtained from the University of Essex). Standards were generated by PCR amplification of the target genes in 50 μl reactions using 25 μl B.H. Gregson et al. Science of the Total Environment 819 (2022) 153040 REDTaq® ReadyMix™ (Sigma-Aldrich), 2 μl of each (10 μM) primer, 20 μl of PCR water (Bioline Reagents Ltd) and 1 μl of DNA template. PCR thermal cycling conditions are summarised in the Table 1: DNA standards were purified using a GenElute PCR Clean-Up Kit (Sigma-Aldrich), prior to quantification using PicoGreen® dsDNA quantification assays (Thermo Fisher Scientific) on a Nanodrop ND-1000 spectrophotometer (Thermo Fisher Scientific). The target gene abundance for DNA standards were calculated assuming a molecular mass of 660 Da for double-stranded DNA using the following formula: Target abundance = 6.023 × 10 23 (copies mol −1 ) × standard concentration (g μl −1 ) / molecular weight (g/mol −1 ) (McKew and Smith, 2015;Tatti et al., 2016). Standard curves for each gene were created using ten-fold dilution series ranging from 10 2 to 10 7 gene copies per μl. For each of the genes, the DNA standards, triplicate samples and three no template controls were amplified in triplicate technical replicates on a CFX 96 Real Time System (Bio-Rad) using SensiFast SYBR No-ROX Kit (Bioline) in 20 μl reactions (10 μl of 2 × mastermix, 0.4 of forward and reverse primers (10 μM), 8.2 μl PCR grade water (Bioline) and 1 μl of template DNA) using a 2-step cycle programme (initial denaturation/polymerase activation for 3 min at 95°C, followed by 40 cycles of denaturation at 95°C for 15 s and combined annealing and extension at 60°C for 30 s). A dissociation curve was run at the end of each assay to verify that only the expected amplification product was generated in addition to confirming by agarose gel electrophoresis. Gene abundances were quantified against their respective standard curves with CFX Manager software (Bio-Rad) using automatic analysis settings for the Cq values and baseline settings. The limit of detection (LOD) for all genes was set at 3.3 cycles lower than the Cq value of the no-template controls (McKew and Smith, 2015;Tatti et al., 2016). Analysis of the standard curves from all qPCR assays showed high efficiency (efficiency >91.5%, R 2 > 0.99) (Table S1).

Amplicon library preparation and sequencing
Amplicon libraries were prepared and sequenced using the methods previously described (Thomas et al., 2020(Thomas et al., , 2021. Amplicons were generated by a 25-cycle (Bacteria) and 31-cycle (Archaea and Fungi) PCR. The additional cycles were performed to account for lower abundances of Archaea and Fungi. The 16S rRNA genes and ITS region were amplified using the same primers previously described for qPCR, modified to contain Illumina-specific overhang adapters. Amplification occurred in 25 μl reactions using 12.5 μl REDTaq® ReadyMix™ (Sigma-Aldrich), 10.5 μl of PCR grade water (Bioline), 0.5 μl of each (10 μM) primer and 1 μl of DNA template. Thermal cycling consisted of an initial DNA denaturation step of 3 min at 95°C followed by 25 or 31 cycles each of 30 s at 95°C, 30 s at 55°C and 30 s at 72°C with a final extension step of 5 min at 72°C, on an Applied Biosystems Veriti 96-well thermal cycler. The resulting PCR products were purified using Agencourt AMPure XP PCR Purification beads (Beckman Coulter), following the manufacturer's instructions. 5 μl of purified PCR product was used in a short secondary PCR, to attach Nextera XT indices, in the presence of 5 μl of Nextera i5 and i7 index, 25 μl REDTaq® ReadyMix™ (Sigma-Aldrich) and 10 μl of PCR water (Bioline Reagents Ltd). Thermal cycling conditions consisted of an initial denaturation step of 3 min at 95°C followed by 8 cycles each of 30 s at 95°C, 30 s at 55°C and 30 s at 72°C followed by a final extension step of 5 min at 72°C. PCR products were purified using Agencourt AMPure XP PCR Purification beads and quantified using PicoGreen® dsDNA quantification assays (Thermo Fisher Scientific), on a POLAR star Omega (BMG Labtech) plate reader. Nextera XT amplicons were then pooled in equimolar concentration. The length of amplicons was verified on SYBR® Safe DNA stain 2% (w/v) agarose E-gels (Invitrogen). Final quantification of the pooled amplicon library was determined with the NEBNext® Library Quant Kit for Illumina® (New England BioLabs) prior to sequencing on the Illumina MiSeq® platform, using a Miseq® 600 cycle v3 reagent kit.  Sequences have been submitted to the NCBI SRA archive under the accession number PRJNA699859.

Bioinformatic analysis
Sequence reads were processed as detailed in Dumbrell et al. (2016). First quality filtering was carried out with Sickle (Joshi and Fass, 2011) with trimming when the average Q score dropped under Q20 across a sliding window of 35 base pairs. Error correction was carried out in SPAdes (Bankevich et al., 2012) which corrects misidentified bases during the sequencing process by implementing the BayesHammer algorithm (Nikolenko et al., 2013). Paired-end reads were merged into single contigs in PANDAseq (Masella et al., 2012) using the PEAR algorithm (Zhang et al., 2014). Further quality filtering was carried out in MOTHUR (Schloss et al., 2009) to remove homopolymer inserts longer than 8. VSEARCH was used to dereplicate sequences, remove singleton sequences, sort by abundance and cluster sequences around centroids at a 97% similarity threshold (Rognes et al., 2016). Chimeric sequences were removed with UCHIME (Edgar et al., 2011) using both de novo and reference-based chimera detection against the 16S rRNA RDP Release 11.5 sequences. Taxonomy was assigned to the OTU centroids using the RDP Classifier (Wang et al., 2007) for bacteria/archaea and the UNITE database for fungi (Kõljalg et al., 2013) (OTU tables are available in Tables S2, S3, S4). To identify potential pathogenic groups, we used the methods previously described (Ferguson et al., 2021). We compared the OTUs generated to a list of known human pathogens (Kembel et al., 2012). We obtained representative 16S rRNA sequences for each pathogen from the NCBI reference sequence database (RefSeq) and search the representative sequence for each OTU against this database with the Basic Local Alignment Search Tool (BLAST) (Altschul et al., 1990). We defined a pathogenic group as any OTU that shared >99% sequence identity with a strain in the human pathogen reference database.

Statistical analysis
All statistical analyses were carried out in R Studio version 4.0.2. All figures were generated using the 'ggplot2' (Wickham, 2016) and 'cowplot' (Wilke, 2020) packages. Data were first tested for normality using a Shapiro-Wilks test (Shapiro and Wilk, 1965). Normally distributed data were tested for significance using ANOVAs, with p-values adjusted for multiple comparisons using the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995), followed by a Tukey's HSD post hoc test (Tukey, 1953) within the 'agricolae' package (De Mendiburu, 2020). Non-normally distributed data were tested for significance using a Kruskal-Wallis test, with pvalues adjusted for multiple comparisons using Bonferroni correction (Bonferroni, 1936), followed by a Dunn's post hoc test (Dunn, 1964) within the 'FSA' package (Ogle et al., 2020).
The 'Phyloseq' package (McMurdie and Holmes, 2013) was used to rarefy OTUs, transform counts into relative abundances, analyse the taxonomic composition of communities and calculate beta diversity. Only abundant taxa (>1% phyla and >10% genera relative abundance) were used to generate bubble plots. Linear discriminant analysis effect size (LEfSe; LDA score > 2; P-value < 0.05) (Segata et al., 2011) was used to identify OTUs that were differentially abundant between sampling sites.
To evaluate the alpha diversity, Hill numbers (Hill, 1973) were calculated using the 'PhenoFlow' package (Props et al., 2016). Hill numbers are a unified family of diversity indices that compensate for the disproportionate impact of rare taxa by weighting them based on abundance, making them appropriate for analysis of large datasets produced by amplicon sequencing (Vidal-Durà et al., 2018). The degree of weighting is controlled by the order of diversity, q, where increasing q places more weight on the high abundance species and discounts rare species (Chao et al., 2014). Beta diversity was calculated using Bray-Curtis dissimilarity (Bray and Curtis, 1957). The resulting dissimilarity scores were visualised using non-metric multidimensional scaling (NMDS) to observe overall patterns in microbial community structure among the different ponds. Differences in community structure were further tested by analysis of similarities (ANOSIM) (Clarke, 1993) and permutational multivariate analysis of variance (PERMANOVA) (Anderson, 2001), using the 'anosim' and 'adonis' functions in the 'vegan' package (Oksanen et al., 2020), respectively.
Variance Inflation Factor (VIF) values for the chemical variables were calculated in the 'usdm' package (Naimi, 2017). Continuous variables with VIF values greater than 10, and those that were above the multicollinearity threshold of 0.7, were removed. Canonical correspondence analysis (CCA) between community structure and chemical variables was performed in the 'vegan' package (Oksanen et al., 2020). The significance of the CCA model, axes and terms were assessed by a Monte Carlo permutation test (Legendre et al., 2011), using 999 permutations.

Nutrient analysis
Anion and cation concentrations are summarised in Tables 2 and 3, respectively. Generally, Pond 2 had higher concentrations of anions in comparison to the other ponds, whereas Pond 16 had higher concentrations of cations. Sources of nitrogen (e.g. ammonium and nitrate) were either undetectable or low (<1 μmol/l) in Ponds 1 and 9 indicating they may be severely nutrient limited, which will impact microbial growth. No anion or cation concentration exceeded the maximum permissible concentration Table 2 Anion concentrations (μmol/l; means ± SE; n = 3) of water collected from ponds used by Kenyan rural smallholder farmers. Values of concentration that share the same letter (a, b, c) are not significantly different between ponds (P < 0.05; Kruskal-Wallis test). * indicates concentration below the limit of detection. Anions include:  (Ayers and Westcot, 1994). All ponds exceeded the MPC for chloride (0.28 μmol/l), based on local guidelines (National Environment Management Authority, 2006) for irrigation water quality. However, this value is much lower than the internationally accepted standard (9872.23 μmol/l), which no ponds exceed. A Shapiro-Wilk test determined that the chemical data was not normally distributed. With the exception of ammonium, formate and lithium (lithium concentrations below the limit of detection) there were significant differences in ion concentration between ponds (Kruskal-Wallis test, P < 0.05). However, no significant differences between ponds were detected for acetate and phosphate after a post-hoc Dunn's test with Bonferroni adjustment for multiple comparisons. Pond 2 had the highest concentration of chloride (600.35 ± 35.99 μmol/l), nitrate (422.59 ± 24.94 μmol/l), sodium (1678.17 ± 85.04 μmol/l) and sulphate (86.49 ± 3.48 μmol/l). Nitrate concentration had the most variability (coefficient of variation = 385%) driven by the higher concentrations found in Pond 2, which was at least 111-fold greater than any other pond. Pond 8 had the highest concentration of carbonate (429.57 ± 3.95 μmol/l). Pond 16 had the highest concentration of calcium (834.40 ± 30.00 μmol/l), fluoride (60.33 ± 6.84 μmol/l), magnesium (218.82 ± 4.79 μmol/l) and potassium (331.28 ± 6.20 μmol/l). Manganese concentration was very low (<1.13 μmol/l) and was only detected in ponds 1, 10, 11. Lithium was not detected in any of the ponds tested.
At the OTU level fourteen bacterial OTUs had a match of >99% nucleotide identity with a strain in a human pathogen database and were identified as potential pathogens (Table 4). Although overall, potential bacterial pathogens were not abundant, one notable OTU (OTU97) shared 99.5% identity with the rapidly growing non-tuberculous mycobacteria (NTM) Mycobacterium fortuitum and had a significantly higher relative abundance in Pond 1 (LEfSe, LDA = 3.73, P = 3.28 × 10 −3 ) equating to >1% of the total community.
Non-metric multidimensional scaling (NMDS) of Bray-Curtis dissimilarities showed the microbial communities were clearly separated by site with the separation of bacterial communities being the most distinguishable (Fig. 6). ANOSIM confirmed significant separation of the bacterial (R = 0.9981, P < 0.001), archaeal (R = 0.936, P < 0.001) and fungal (R = 0.7732, P < 0.001) communities by site. PERMANOVA also demonstrated bacterial (R 2 = 0.058, P = 0.001), archaeal (R 2 = 0.049, P = 0.006) and fungal communities (R 2 = 0.058, P = 0.009) were significantly differentiated by site. . These represent operational taxonomic unit (OTU) richness (S), exponential of Shannon entropy (Exp(H′)) and inverse Simpson index (1/D), respectively. As q increases, rare OTUs are given less weight and therefore contribute less toward "effective number of OTUs". Hill numbers are shown for the bacterial, archaeal and fungal communities. Different letters (a, b, c, d, e) indicate a significant difference in Hill number between the ponds (P < 0.05, ANOVA followed by a Tukey's HSD post hoc test or a Kruskal-Wallis test followed by Dunn's post hoc test).
Canonical correspondence analysis (CCA) was also performed to discern the possible relationship between community structure in each pond and nutrient concentrations (Fig. 7). Eight environmental variables were included in the CCA biplot, as their variance inflation factors were below 10, and they did not exceed the multicollinearity threshold of 0.7. Acetate, ammonium, carbonate, fluoride, formate, magnesium, phosphate and sulphate concentration were included. The models were significant between bacterial (Monte Carlo test, P = 0.001, 999 permutations), archaeal (P = 0.008) and fungal (P = 0.001) community structures and the chemical variables. For the bacterial community these chemical variables explain 28% of variance and the strongest predictor variables were fluoride and magnesium concentration (P = 0.001). Other significant variables include acetate (P = 0.014), carbonate (P = 0.022) and sulphate (P = 0.040). All significant variables, with the exception of acetate, appear to strongly influence the bacterial community in Pond 16. Acetate influenced the bacterial community in Pond 11. Only 24% of archaeal community variations could be explained by the eight environmental variables. The amount of variability explained by the chemical variables is only significant for the first canonical axis (P = 0.02) and three chemical variables including fluoride (P = 0.008), magnesium (P = 0.001) and sulphate (P = 0.017). The chemical variables explained 25% of fungal community structure variations. As with the bacterial community, fluoride concentration was the strongest predictor variable (P = 0.001), and strongly influenced Pond 16, which has the highest fluoride concentration (60.33 ± 6.84 μmol/l; Table 2). All other variables were significant with the exception of ammonium and formate. 4. Discussion

Microbial community trends
The composition of the bacterial and archaeal communities in the ponds suggest they are either hypoxic or anoxic environments, supporting the growth of anaerobic microorganisms. For example, in the bacterial community OTUs affiliated with the genera Spirochaeta and Opitutus dominated. Members of the genus Spirochaeta are obligate or facultative anaerobes frequently isolated and detected in diverse anoxic environments including anchialine sinkholes (Davis and Garey, 2018), soda lakes (Vavourakis et al., 2018), microbial mats (Spring et al., 2015), marine intertidal muds (Harwood and Canale-Parola, 1983) and marine sediments (Miyazaki et al., 2014). In these anoxic environments Spirochaeta spp. are the trophic intermediate between hydrolytic bacteria and secondary anaerobes, as the main compounds produced by spirochetes are acetate, H 2 and CO 2 , which are normally consumed by methanogens (Berlanga et al., 2008;Blazejak et al., 2005). The genus Opitutus currently only consists of one cultured representative, Opitutus terrae, an obligatory anaerobic member of the phylum Verrucomicrobia (van Passel et al., 2011). Whilst this strain was originally isolated in anoxic rice paddy soil, Opitutus spp. have also been detected in aquatic habitats (Arnds et al., 2010;Bai et al., 2019). The metabolism of Opitutus spp. is suited for growth on plant-derived polysaccharides and for interaction with methanogens with its metabolic by-products (Chin and Janssen, 2002;Janssen, 1998).
With regard to archaeal communities, representatives of the phylum Woesearchaeota and methanogens were dominant. Woesearchaeota mainly grow anaerobically as they lack a complete electron transport chain, a complete TCA cycle and continuous glycolysis, which may explain their ecological distribution pattern of being found in primarily inland anoxic environments (Castelle et al., 2015;Liu et al., 2018). Methanogens are strict anaerobes and most methanogenic niches are in oxygen-free aqueous systems that provide a supply of organic matter (Hoehler et al., 2010). These include aquatic sediments, wetlands, agricultural or natural soils subject to inundation, sewage digesters and anoxic portions of animal digestive tracts (Chaban et al., 2006;Liu and Whitman, 2008). Lineage abundance distribution and cooccurrence network analyses across diverse biotopes confirmed metabolic complementation and revealed a potential syntrophic relationship between Woesearchaeota and methanogens, indicating Woesearchaeota may also impact methanogenesis in inland ecosystems . This relationship, along with support of growth provided by the dominant anaerobic bacteria, may explain the prevalence and dominance of methanogens in our study. Fluoride and magnesium were found to be strong predictor variables for the microbial communities. The occurrence of fluoride in these ponds are likely due to natural processes, such as weathering and dissolution of minerals (García and Borgnino, 2015). Fluoride has been shown to have a strong inhibitory effect on the growth of anaerobic microorganisms (Ji et al., 2016;Ochoa-Herrera et al., 2009). In addition, magnesium can interact with fluoride, forming metal complexes, which also impacts bacterial growth and metabolic activity by inhibiting phosphate transfer enzymes (Baxter et al., 2008). Given that anaerobes are prevalent in this study, distinct communities between ponds could form due to the differences in fluoride and magnesium fluoride complex concentrations.

Prevention of anaerobic conditions in ponds
As the PHSs are openly exposed, large amounts of organic matter can get into them. Decomposition of organic matter exerts an oxygen demand, and accumulation of excess organic matter may lead to dissolved oxygen depletion, promoting anaerobic growth conditions (Dai et al., 2018). Oxygen is an important irrigation water quality parameter that can be the limiting factor in some agricultural systems (Bhattarai et al., 2004(Bhattarai et al., , 2005. Low oxygen in irrigation water can lead to root oxygen deficiency, which in turn causes agronomic problems such as crop stress, slow plant growth and low yields (Bhattarai et al., 2008;Maestre-Valero and Martínez-Alvarez, 2010). Therefore, to overcome this impact on water quality, adjustments can be made to the pond's design preventing oxygen being depleted and hypoxic/anoxic conditions developing. The amount of organic matter going into the pond must be minimised in order to reduce oxygen consumption. Ensure that prior to the pond's construction a location is chosen that reduces this risk. For example, position the pond away from trees to stop leaves falling in, and fence animals to prevent them from defecating in it. A low wall could also be constructed to protect the pond from surface runoff. Once the pond is built, a filtration system could be used to keep debris from entering the pond. The pond could be covered with some form of netting, mesh, silt trap or strainer, positioned either directly on top of or beside it. If material does enter the water, a pond skimmer or fishing net could also be used to remove it.
Another method would be to regularly oxygenate the water. Water aeration is commonly used to increase oxygen concentration in lakes and rivers and increases water quality (Alp and Melching, 2011;Mahmud et al., 2020). There are many different types of aerators that could be used in the pond, such as fine pore aerators (Shammas, 2007), or surface aerators (Rosso et al., 2008). However, venturi aeration may be a simpler and more cost-effective solution for farmers. When a minimal amount of pressure exists between the inlet and outlet sides of a venturi tube, a vacuum occurs. Air that is entrained into the water is instantly forced downstream in the form of small air bubbles and so increases the oxygen levels in the water (Baylar et al., 2010;Baylar and Ozkan, 2006). The reduced cost of purchase and maintenance of venturi aerators compared to others may make them viable alternatives, in low resource areas, such as Africa (Mahmud et al., 2020;Therrien et al., 2019). The use of venturi aeration has successfully been applied to irrigation water for agriculture production and increased yields of the crops (Bagatur, 2014;Dahrazma et al., 2019). Alternatively, if the sole purpose of the water is for irrigation of crops, hydrogen peroxide can be added to the water. Hydrogen peroxide should decompose into water and molecular oxygen within a few hours in natural waters (Spoof et al., 2020). The addition of hydrogen peroxide to irrigation water has been shown to be an effective method to increase oxygen concentration and improve plant parameters in Africa, and other parts of the world (Abd Elhady et al., 2021;Bhattarai et al., 2004). However, if too high concentrations of hydrogen peroxide is used this can lead to oxidative stress on the plants ( Ben-Noah and Friedman, 2016;Gil M et al., 2009), which will reduce overall yield, and may pose a health risk to the farmers. Additionally, the use of water plants that are native to the area could be introduced into the pond to supply oxygen via photosynthesis such as water hyacinths.

Removal of human pathogens in PHS
Although the relative abundance of potential bacterial human health pathogens was low in the ponds, our data indicates a potential human health risk. Specifically, the non-tuberculous mycobacteria (NTM), Mycobacterium fortuitum, was found to be present in one of the ponds, comprising >1% of the bacterial community. In terms of absolute abundance, our results suggest over 36,000 pathogenic Mycobacterium 16S rRNA copies per ml of water. Non-tuberculosis mycobacteria (NTM) are ubiquitous environmental organisms increasingly recognised as important human pathogens (Kasperbauer and De Groote, 2015;Sfeir et al., 2018). M. fortuitum is considered rapidly growing and is distinguished from other NTM by their ability to form colonies in less than one week and their in vitro resistance to antimycobacterials (Daley and Griffith, 2002;Park et al., 2008). It can cause skin disease, osteomyelitis, joint infections, infections of the eye, and pulmonary infections (Griffith et al., 2007;Okamori et al., 2018). Harvested rainwater can be used for many household purposes, such as, drinking, swimming and bathing (Domènech and Saurí, 2011;Hofman-Caris et al., 2019). It is these activities where humans are likely to be exposed to waterborne NTM (Thomson et al., 2013). Furthermore, aerosols generated during these activities can be inhaled potentially resulting in disease (Falkinham, 2003;Lever et al., 2000). Given low doses of Mycobacterium are required to establish an infection (Johnson et al., 2007;Saini et al., 2012) this suggests usage of the harvested water may be a human health risk. Furthermore, NTM are extremely resistant to many chemical disinfectants used to treat water, such as chlorine, chlorine dioxide, monochloramine and ozone (Taylor et al., 2000). Given this resistance to chemical disinfection, propensity to form biofilms (Faria et al., 2015) and ability to survive in low carbon environments (Falkinham, 2003), complete removal of all pathogenic NTM from harvested water will be difficult. Physical methods of water disinfection, such as Pall filtration and UV-C disinfection have proved effective in significantly reducing the viable NTM population in drinking water (Norton et al., 2020). There are a number of commercially available water purification systems which uses these methods and could provide a simple way for individuals to minimise their exposure to waterborne NTM. However, due to the low incomes of the rural smallholder farmers these options may not be feasible without financial support of non-governmental or charitable organisations.

Limitations of the study
A limitation of this study to evaluate potential human health risks is the taxonomic resolution achieved through amplicon sequencing, which can only give identification to the genus level. Furthermore, with amplicon sequencing it remains challenging to differentiate between viable/metabolically active, and non-viable microorganisms (Emerson et al., 2017;Wang et al., 2021). Indeed in this study, although opportunistic pathogenic groups were detected, information is lacking on their viability. However, amplicon sequencing represents the most efficient and cost-effective solution to provide a taxonomic overview and screen for potentially pathogenic groups, compared to PCR-based identification on a pathogen-by-pathogen basis, which would be time-consuming and requires prior knowledge of which groups to target. Shotgun metagenomic sequencing could be an alternative approach, allowing for strain-level identification of pathogens as well as associations between phylogeny and function (Quince et al., 2017). However, shotgun metagenomics has much higher input DNA requirements compared to amplicon sequencing (Sui et al., 2020), which may be difficult to achieve with the filtered harvested rainwater samples used in this study. A further limitation of metagenomics for pathogen detection is that abundant taxa get most of the coverage, meaning rare pathogens could be missed or not sequenced with enough depth for identification (Ferguson et al., 2021).
To fully validate our results, it would be necessary to use a combination of amplicon sequencing, to provide a broad identification of taxa, shotgun metagenomics, to identify strains and virulence genes, and PCR-based methods for specific confirmation of pathogen presence. Once the presence of a pathogen has been confirmed using molecular methods, viability can be determined using for example culture-dependent methods (e.g. plate counts), or alternative techniques such as flow cytometry, or viability dyes coupled with qPCR. This would provide an effective method to fully assess potential human health risks.

Conclusions
In conclusion, this study characterised the microbial ecology of PHSs across taxonomic domains. We identified the presence of potential pathogens, notably, Mycobacterium fortuitum in one of the ponds, which is a human health concern as mycobacteria are rarely targeted in screening of microbial water quality with standard techniques (e.g. heterotrophic plate counts) and would be missed. This study highlights the need for a broader screening of microbial community composition through HTS, to identify all potential pathogens, followed by a more sensitive molecular method (e.g qPCR) to confirm their presence and abundance. The abundance of anaerobes and methanogens, which are driven by fluoride and magnesium concentrations, indicates poor water quality of supplies being used to irrigate crops, which in turn could reduce crop yield and viability. Here we suggest changes to PHS management (e.g. pond design, aeration) to ameliorate hypoxic or anoxic conditions and allow the transport of aerated water to roots, promoting plant water uptake and maximising crop yield. We also suggest physical methods of water disinfection, such as Pall filtration and UV-C disinfection, to mitigate pathogen contamination in the ponds. Overall, this will be beneficial to the livelihoods and welfare of the rural smallholders in Kenya.

Funding
We thank Global Challenges Research Fund 'Investigating the effectiveness of a low technology planter for subsistence farming' (Project A004, GCRF74) and NERC (NE/S005560/1 and NE/S006958/1) for funding this work.
CRediT authorship contribution statement CW and DH conceived and designed the experiment. CW and DH acquired funding. LH carried out the sampling campaign. BG performed the molecular analysis. BG and AB performed the data and statistical analysis. BG wrote the manuscript, supplementary information, and tables. BG and AB produces the figures. All authors reviewed and contributed to the submitted version.

Declaration of competing interest
All authors declare no conflict of interest.