Prokaryotic diversity and potentially pathogenic bacteria in vended foods and environmental samples

Ready-to-eat fast food vending outlets provide cheap and readily available food. Foodborne diseases have been previously reported in Embu, Kenya, but data on the prokaryotic metagenome in vended foods are scanty. This study aimed to determine the prokaryotic diversity in fruits, vegetable salad, African sausage, chips (potato fries), fried fish, roasted beef (meat), smokies, samosa, soil, and water collected from food vendors and the surrounding environment in Embu Town and Kangaru Market. The study used Illumina high-throughput sequencing of the 16S rRNA gene in the analysis of total community DNA extracted from samples using the phenol-chloroform method. The 16S rRNA gene variable region (V4-V7) of the extracted DNA was amplified and library construction performed. Sequence analysis was done using QIIME2. Hierarchical clustering of samples, diversity indices, rarefaction curves, and Venn diagrams were generated using R version 3.6.3. Bacterial operational taxonomic units (OTUs) were distributed in Proteobacteria (52.81%), Firmicutes (31.16%), and Lentisphaerae (0.001%). The OTUs among archaea were Candidatus Nitrososphaera (63.56%) and Nitrososphaera spp. (8.77%). Brucella spp. and Bacillus cereus, which are associated with foodborne diseases, were detected. Potential pathogens, Rickettsia spp. in risk group 2 and Brucella spp. in risk group 3, were detected. Uncultured Candidatus Koribacter and Candidatus Solibacter were also detected in the food samples. There was a significant difference in the microbial community structure among the sample types (P<0.1). The results demonstrated the presence of some prokaryotes that are associated with food spoilage or foodborne diseases in vended foods and environmental samples. This study also detected uncultured prokaryotes. The presence of potential pathogens calls for stringent hygiene measures in food vending operations.


Introduction
Foodborne diseases associated with microbial pathogens, their toxins, or metabolites are a serious global public health problem. Approximately 600 million cases of foodborne diseases and 420,000 deaths occur each year (WHO 2020). Therefore, the assessment of food matrices for contamination by microbial organisms is important. Conventional methods for detecting foodborne pathogens rely on growing these microorganisms on synthetic media. These methods are time-consuming and can give biased results because most uncultured microorganisms are not detected. Thus, analysis by conventional methods does not give a true representation of the microbial community, including potential pathogens, in the environment (Law et al. 2015). Molecular methods such as polymerase chain reaction (PCR), denaturing gradient gel electrophoresis (DGGE), quantitative PCR (qPCR), and loop-mediated isothermal amplification (LAMP) are more reliable in detecting both cultured and uncultured microorganisms in food (Mayo et al. 2014).
Direct metagenome sequencing complements rRNA gene-based characterization by providing information on the physiological potential of the microorganisms. Foodborne pathogens, their sources, and associated risks can be investigated using metagenomics, which provides an exhaustive sequencing depth that represents the broader microbial community in the environment (Kergourlay et al. 2015). This technology has proved useful in the detection of microorganisms in food and has previously been used to detect Firmicutes in tomatoes (Ottesen et al. 2019) and extremely halophilic archaea in food-grade salts (Henriet et al. 2014). Metagenomics identified watercress as the main source of STEC O157 in two concurrent foodborne disease outbreaks (Jenkins et al. 2015). Studies based on next-generation sequencing detected pathogens in kimchi (a type of fermented food) and in Chinese cabbage (Jung et al. 2011; Kim et al. 2018). Thus, sequencing is important in detecting and characterizing foodborne pathogens (Djenane et al. 2014).
Foodborne disease associated with microbial contamination of food and water in Embu, Kenya, has been previously reported. In 2016, 234 individuals in Embu County who suffered from food poisoning were diagnosed with cholera (George 2016), but sequencing technologies were not applied in the detection of the causative agent. Whereas metagenomic analysis has been used to investigate prokaryotic diversity in environmental samples such as soil in Kenya (Karanja et al. 2020), its application in food safety has not been fully explored. Thus, there could be unculturable potential pathogens that contribute to foodborne diseases in Embu County. Next-generation sequencing (NGS) platforms permit the identification and characterization of the microbial community at a depth of up to millions of sequences per sample, thus allowing the detection and identification of microorganisms that are present in low numbers (Kumaraswamy et al. 2014; Harb and Hong 2017). Metagenomic analysis, which involves direct isolation of nucleic acids from samples, could generate information that is useful in predicting the dynamics of microbial food contamination. Therefore, this study used Illumina high-throughput sequencing to investigate prokaryotic diversity in vended foods, water, and soil from Embu Town and the nearby Kangaru Market. The study determined prokaryotic diversity and detected potentially pathogenic bacteria as well as uncultured prokaryotes in vended foods, water, and soil.

Materials and methods
Total DNA extraction protocol from fast food and environmental samples
Total DNA was extracted from soil, water, and the selected food samples in duplicate as described by Sambrook et al. (1989) with some modifications. The chemicals used in the extraction of total DNA from the samples were purchased from Inqaba Biotech East Africa Ltd. (IBEA). From each soil sample, 0.5 g of soil was weighed, placed into a 2 ml Eppendorf tube, and suspended in 1 ml of sterile water. From the prepared food inoculum, 1 ml was dispensed into 2 ml Eppendorf tubes. The samples were centrifuged at 13,200 rpm for 10 min, and the supernatant was discarded. All samples formed a visible pellet upon centrifugation. Each pellet was re-suspended in 500 μl of solution A containing 100 mM Tris-HCl (pH 8.0) and 100 mM EDTA (pH 8.0), mixed by vortexing, and then centrifuged for 1 min, after which the supernatant was discarded. The pellet was re-suspended in 200 μl of solution A (100 mM Tris-HCl (pH 8.0), 100 mM EDTA (pH 8.0), and 5 μl of lysozyme from a 20 mg/ml solution) and incubated at 37°C for 30 min in a water bath. Then, 600 μl of lysis buffer (400 mM Tris-HCl (pH 8.0), 60 mM EDTA (pH 8.0), 150 mM NaCl, 1% sodium dodecyl sulfate) was added, and the tubes were allowed to stand for 10 min. Next, 10 μl of proteinase K (20 mg/ml) was added, and the mixtures were incubated at 65°C for 55 min in a water bath. An equal volume of chloroform-isoamyl alcohol was added to the mixtures, which were then centrifuged at 13,200 rpm for 5 min at 4°C. The supernatant was transferred into new Eppendorf tubes. Sodium acetate (150 μl, pH 5.2) was added, followed by isopropyl alcohol in a volume equal to that of the mixture (supernatant + sodium acetate); the contents were mixed by gentle inversion and incubated overnight at −20°C. The tubes were centrifuged at 13,200 rpm for 10 min, after which the supernatant was discarded. The resultant DNA pellets were washed in 300 μl of 70% ethanol and centrifuged at 13,200 rpm for 1 min, and the supernatant was discarded. The DNA pellets were air-dried, lyophilized, and stored at −20°C.

Amplicon library preparation
PCR amplification of the 16S rRNA gene V4-V7 variable regions was carried out from the extracted DNA using barcoded primers 515F (GTGCCAGCMGCCGCGGTAA) and 806R (GGACTACHVGGGTWTCTAAT) by the service provider MR DNA (www.mrdnalab.com, Shallowater, TX, USA) as described by Caporaso et al. (2011). Amplification was done using the HotStarTaq Plus Master Mix Kit (Qiagen, USA) under the following conditions: initial heating at 94°C for 3 min, followed by 30 cycles of 94°C for 30 s, 53°C for 40 s, and 72°C for 1 min, after which a final elongation step at 72°C for 5 min was performed. The quality of the PCR products was assessed on a 2% agarose gel to determine the success of amplification and the relative intensity of the bands. Purified PCR products were used to prepare the DNA library by following the Illumina TruSeq DNA library preparation protocol (Xia et al. 2014). Sequencing was performed at Molecular Research DNA (www.mrdnalab.com, Shallowater, TX, USA) on a MiSeq (2 × 300 bp, Version 3) following the manufacturer's guidelines.
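The two primers above contain IUPAC ambiguity codes (M, H, V, W), so each one stands for a small pool of concrete oligonucleotides. As an illustration only, and not part of the authors' workflow, the following Python sketch expands each primer into the unambiguous sequences it encodes. The function name expand_primer and the variable names are hypothetical; the ambiguity table follows the standard IUPAC nucleotide code.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Primer sequences as given in the text (spaces from typesetting removed).
PRIMER_515F = "GTGCCAGCMGCCGCGGTAA"
PRIMER_806R = "GGACTACHVGGGTWTCTAAT"

forward_variants = expand_primer(PRIMER_515F)
reverse_variants = expand_primer(PRIMER_806R)

print(len(forward_variants), "variants of 515F")
print(len(reverse_variants), "variants of 806R")
```

The number of variants is the product of the number of bases allowed at each ambiguous position, which is why a primer with several ambiguity codes represents a noticeably larger pool than one with a single code.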

Sequence analysis, taxonomic classification, and data submission
Sequences obtained from the Illumina sequencing platform were depleted of barcodes and primers using a proprietary pipeline (www.mrdnalab.com, MR DNA, Shallowater, TX) developed at the service provider's laboratory. Quality control was done by trimming and filtering the sequences based on their quality scores, followed by clustering into operational taxonomic units (OTUs) based on a fixed dissimilarity threshold. Microbiome bioinformatics was performed with QIIME 2 2019.10 (Bolyen et al. 2018). Raw sequence data were demultiplexed and quality filtered using the q2-demux plugin, followed by denoising with DADA2 (Callahan et al. 2016) via q2-dada2. All amplicon sequence variants (ASVs) were aligned with mafft via q2-phylogeny (Katoh et al. 2002) and used to construct a phylogeny with fasttree2 (Price et al. 2010). Alpha-diversity metrics (observed OTUs and Faith's Phylogenetic Diversity (Faith 1992)), beta-diversity metrics (weighted UniFrac (Lozupone et al. 2007), unweighted UniFrac (Lozupone and Knight 2005), Jaccard distance, and Bray-Curtis dissimilarity), and principal coordinate analysis (PCoA) were estimated using q2-diversity after samples were rarefied (subsampled without replacement) to 900 sequences per sample. Taxonomy was assigned to ASVs using q2-feature-classifier, a QIIME 2 plugin for taxonomy classification of marker-gene sequences (Bokulich et al. 2018a, b). The q2-feature-classifier plugin supports the use of any of the numerous machine-learning classifiers available in scikit-learn for marker-gene taxonomy classification and currently provides two alignment-based taxonomy consensus classifiers based on BLAST and VSEARCH. The sequences were classified using the Greengenes 99% OTUs 16S reference sequences for bacterial and archaeal communities. The change in direction and magnitude in the first principal coordinate axis (PC1) for each sample was computed using q2-longitudinal (McDonald et al. 2012).
The sequences were submitted to the Sequence Read Archive of the National Center for Biotechnology Information (NCBI), USA, and were assigned accession number PRJNA669559. The data can be accessed at https://www.ncbi.nlm.nih.gov/sra/PRJNA669559.
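To make the rarefaction and distance steps described in this section concrete, the following Python sketch subsamples a feature table to a fixed depth without replacement and then computes Bray-Curtis dissimilarity and Jaccard distance between two samples. It is a minimal illustration under stated assumptions and does not reproduce the q2-diversity implementation; the function names, the ASV labels, and the read counts are all hypothetical.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample a {feature: count} table to `depth` reads without replacement."""
    if sum(counts.values()) < depth:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    # One list entry per read, labelled by the feature it belongs to.
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    picked = random.Random(seed).sample(reads, depth)
    rarefied = {}
    for feature in picked:
        rarefied[feature] = rarefied.get(feature, 0) + 1
    return rarefied


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: 1 - 2 * (sum of shared minima) / (total reads)."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    shared = sum(min(a.get(f, 0), b.get(f, 0)) for f in set(a) | set(b))
    return 1 - 2 * shared / total


def jaccard(a, b):
    """Jaccard distance on presence/absence: 1 - |A and B| / |A or B|."""
    present_a = {f for f, n in a.items() if n > 0}
    present_b = {f for f, n in b.items() if n > 0}
    union = present_a | present_b
    if not union:
        raise ValueError("both samples are empty")
    return 1 - len(present_a & present_b) / len(union)


# Hypothetical read counts for two samples (not data from this study).
soil = {"ASV1": 1200, "ASV2": 800, "ASV3": 500}
fish = {"ASV1": 300, "ASV4": 900, "ASV2": 100}

soil_rarefied = rarefy(soil, depth=900)
fish_rarefied = rarefy(fish, depth=900)
print(bray_curtis(soil_rarefied, fish_rarefied))
print(jaccard(soil_rarefied, fish_rarefied))
```

Rarefying both samples to the same depth before computing distances keeps differences in sequencing effort from being mistaken for differences in community composition, at the cost of discarding reads from the deeper sample.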

Data analyses
Diversity indices (Richness, Shannon, Absolute diversity), rarefaction curves, and a Venn diagram comparing the OTUs shared between the vended food and the environmental samples were generated from the resulting OTUs using the R packages Vegan, Phyloseq, and Ampvis2 (versions 2.5.6, 1.30.0, and 2.6.0, respectively) in R version 3.6.3 (2020-02-29). One hundred iterations of rarefaction were computed for each sample to 20,000 sequences using the QIIME2 pipeline version qiime2-2019.10. Chao1, a non-parametric estimator of OTU richness, was used to compare species richness between the data sets and sample types. Vended food and the environmental samples were compared using the Analysis of Similarity (ANOSIM) test, based on Bray-Curtis distance measurements with 999 permutations. Significance was tested at the 95% confidence level (p = 0.05). Hierarchical clustering of the samples based on Bray-Curtis dissimilarity was carried out in R using the Vegan package. Correlation analysis, based on Pearson's correlation coefficient, between the community structure of the vended food and that of the environmental samples was conducted, and significance was tested using the Mantel test in R (R Core Team 2011). Three-dimensional PCoA plots were calculated using unweighted UniFrac and Bray-Curtis distances, which were used to determine the distance between the biological communities. To support OTU-based analysis, taxonomic groups were derived from the number of reads assigned to each taxon at all ranks from domain to species using the OTUs exported from the QIIME2 pipeline version 2019.10. Bar plots were generated using the R package Phyloseq version 1.30.0. Alpha diversity was calculated using the R packages Phyloseq (version 1.30.0) and ggplot2 (version 3.3.2). Beta diversity was determined using the R package Vegan, version 2.5.7. Heat maps were generated using the R package Phyloseq version 1.30.0. Venn diagrams were generated using R package version 1.6.20.
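The richness and diversity indices named above can be written out from their textbook definitions. The Python sketch below is an illustration only and is not the Vegan or Phyloseq code used in the study; it assumes the bias-corrected form of Chao1, S_obs + F1(F1 − 1) / (2(F2 + 1)), where F1 and F2 are the numbers of singleton and doubleton OTUs, the natural-log Shannon index, and the inverse Simpson index. The function names and the example counts are hypothetical.

```python
import math


def observed_richness(counts):
    """Number of OTUs with at least one read."""
    return sum(1 for n in counts if n > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    singletons = sum(1 for n in counts if n == 1)
    doubletons = sum(1 for n in counts if n == 2)
    return observed_richness(counts) + singletons * (singletons - 1) / (2 * (doubletons + 1))


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over OTUs with reads."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def inverse_simpson(counts):
    """Inverse Simpson index 1 / sum(p_i ** 2)."""
    total = sum(counts)
    return 1 / sum((n / total) ** 2 for n in counts if n > 0)


# Hypothetical OTU read counts for one sample (not data from this study).
sample = [10, 5, 1, 1, 2, 2, 0]
print(observed_richness(sample), chao1(sample))
print(shannon(sample), inverse_simpson(sample))
```

Chao1 adds to the observed richness an estimate of unseen OTUs based on how many OTUs were seen only once or twice, whereas the Shannon and inverse Simpson indices also weight OTUs by their relative abundance, so a sample can rank high on one measure and lower on another.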

Research approval
A license to conduct the research was obtained from the National Commission for Science, Technology & Innovation (NACOSTI) in Kenya. Authorization to collect samples was granted by the Public Health Ministry of Embu County.

Results
Taxonomic assignment of prokaryotic sequences from vended foods and environmental samples
The sequence reads of length >250 bp from Illumina sequencing libraries ranged between 210,536 and 539,981 sequences from both vended food and environmental samples. At the phylum level, the sequence reads contained between 0 and 27,930 OTUs as revealed by the change in the metric of the rarefaction curve. The soil samples had a higher number of observed OTUs compared to food samples while samosa had the least number of observed OTUs (Fig. 1a, b). The samples from Kangaru Market recorded a higher number of OTUs compared to samples from Embu Town.

Bacterial and archaeal taxonomic composition of the vended foods and environmental samples
The OTUs in food and environmental samples were distributed among twenty-three bacterial phyla. The most abundant phyla were Proteobacteria (52.81%), Firmicutes (31.16%), and Bacteroidetes (8.00%) as shown in Fig. 2a.
The most predominant genus in soil was Yersinia.
Among the food samples, the fish had the highest diversity of microorganisms followed by smokies (Fig. 2b).
Clostridiaceae was the most predominant family in fish samples, followed by Pseudomonadaceae. Leuconostocaceae was the most predominant family in fruits. Aquaspirillaceae and Enterobacteriaceae were the most predominant families in roasted beef (Fig. 2c, d) (Fig. 4). Chips and smokies samples from Embu Town had similar bacterial diversity (Fig. 5a). The genus Stenotrophomonas was predominantly detected in smokies from Embu Town, while Citrobacter was predominantly detected in African sausages and fruits from Embu Town (Fig. 5b). Soil samples from both Kangaru Market and Embu Town had similar archaeal diversity. The most abundant archaeal group was Candidatus (74.86%), and it was predominantly recovered from Embu Town soil (Fig. 5c).

Comparison of prokaryotic diversity in vended foods and environmental food samples
Prokaryotic diversity was examined within individual samples and between different samples. Chao1 indices showed the species richness of the data sets, while Shannon and inverse Simpson indices reflected the diversity of OTUs in samples. The soil had the highest number of observed OTUs, while fish had the highest prokaryotic diversity among the food samples (Fig. 6). Different samples shared common OTUs, as illustrated by Venn diagrams.
There was an overlap of OTUs between vegetable salad from both Kangaru Market and Embu Town. An overlap of OTUs was also observed in chips from both sites. African sausages from the two sampling sites also exhibited overlapping OTUs (Fig. 7).
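The overlaps summarized by the Venn diagrams amount to set operations on the OTUs present in each sample. The following Python sketch shows that idea for a pair of samples; it is a simplified illustration, not the R code used to draw the figures, and the function names, OTU identifiers, and counts are hypothetical.

```python
def present(counts):
    """Set of OTUs with at least one read in a {otu: count} table."""
    return {otu for otu, n in counts.items() if n > 0}


def venn_regions(sample_a, sample_b):
    """Split OTUs into those unique to each sample and those shared by both."""
    a, b = present(sample_a), present(sample_b)
    return {"only_a": a - b, "shared": a & b, "only_b": b - a}


# Hypothetical OTU counts for the same food type at two sites
# (not data from this study).
kangaru_salad = {"OTU_1": 40, "OTU_2": 12, "OTU_3": 0, "OTU_5": 7}
embu_salad = {"OTU_1": 25, "OTU_3": 9, "OTU_5": 3, "OTU_8": 1}

regions = venn_regions(kangaru_salad, embu_salad)
for name, otus in regions.items():
    print(name, sorted(otus))
```

An OTU with a zero count is treated as absent, so it contributes to neither the shared region nor the unique region for that sample.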

Potentially pathogenic bacteria from vended foods and environmental samples
Illumina sequencing detected potentially pathogenic bacterial 16S rRNA sequences belonging to Proteobacteria and Firmicutes (Table 1). The genera Legionella, Aeromonas, Staphylococcus, and Erysipelothrix occurred in almost all the samples. A relatively high abundance of the potentially pathogenic bacteria was recorded in the fruit sample from the Kangaru Market at 4.8%. Embu Town samples had a lower relative abundance of sequences belonging to potentially pathogenic bacteria with the soil having 1.43% abundance. Overall, 28 species sequences belonging to 16 genera considered to be potentially pathogenic risk group 2 organisms were detected. Legionella spp. were most abundant in water from Kangaru Market. Sequences for Brucella spp. and Coxiella spp. which are risk group 3 organisms were also detected. The Brucella spp. sequences were detected in soil from Embu Town, whereas Coxiella spp. were detected in fruit, water, and soil from Kangaru Market (Table 2).

Discussion and conclusion
Vended foods and environmental samples investigated in the present study were diverse in their microbial community composition. The microbial communities varied depending on the sample type and the site where the samples were collected. Bacteria were the most abundant taxa (over 99%) in all the vended foods and environmental samples. The most abundant bacterial phyla were Proteobacteria (52.81%), Firmicutes (31.16%), and Bacteroidetes (8.00%). Previous studies have reported the abundance of these phyla in soil and food (Gangwar et al. 2009). Proteobacteria pathogens have been detected in drinking water and were associated with biofilm formation in water pipes and leakages (Richards et al. 2018). Firmicutes have been reported to adapt to solid food, which increases their activity; thus, they can thrive in many solid foods (Hugenholtz et al. 2017). This was evident in the present study, where they thrived in potato fries. Sequences affiliated with the archaeal phyla Euryarchaeota (3.36%) and Thaumarchaeota (96.64%) were detected in this study. The archaeal orders Methanosarcinales, Nitrosopumilales, Nitrososphaerales, Methanomicrobiales, and Methanobacteriales were recovered from vended food and soil samples. The order Nitrososphaerales is known to inhabit terrestrial ecosystems, and these prokaryotes may have been introduced into the soil by surface water runoff and found their way into food through mishandling (Kerou and Schleper 2015a, b; Cavicchioli et al. 2003). Methanomicrobiales have been shown to inhabit a broad range of anoxic environments (Browne et al. 2017). Methanobacteriales prefer anaerobic environments such as marine and freshwater environments, animal gastrointestinal tracts, and geothermal habitats. The presence of this order in the soil suggests that it may have been introduced by surface water runoff carrying animal waste from the surrounding area (Liu 2010).
Prokaryotic diversity varied between sampling sites and among the food samples. Overall, Kangaru Market recorded a higher number of observed OTUs than Embu Town, although some food samples from Embu Town, such as fried fish, had a high number of observed OTUs. More variation in prokaryotic diversity was observed in fruits from Kangaru Market than in those from Embu Town. The variation in the distribution of these species between the sites and the different samples could be due to environmental factors or to careless handling of food by the vendors, especially in Kangaru Market. Fried fish samples had the highest diversity of prokaryotes among the food samples. In addition, fried fish from Embu Town had higher diversity than fried fish from Kangaru Market. Bacterial contamination of fried fish in the present study could be due to cross-contamination during handling of raw and fried fish, as some vendors use the same hand to handle both (Sifuna and Onyango 2018). Contamination of fried fish could also arise from incomplete cooking, which would allow microorganisms that the fish acquired from their food or from the water they live in to survive. Contaminants may also originate from anthropogenic activities (Donde et al. 2015).
Contamination of street-vended smokies, which are factory-made, could be attributed to cross-contamination by "kachumbari" (a complementary vegetable salad) or to unhygienic handling (Kariuki et al. 2017). The presence of potentially pathogenic and nonpathogenic prokaryotes in the African sausage has been attributed to cross-contamination in previous studies (Karoki et al. 2018).
The most abundant bacteria recovered across all samples were Alcaligenes faecalis, Lactobacillus perolens, Pseudomonas spp., Citrobacter freundii, Clostridium spp., and Acetobacter spp. Many of the detected bacteria are either normal flora, spoilage microorganisms, or potential pathogens and thus might predispose the public to food poisoning. Multidrug resistance has been reported in Citrobacter freundii; thus, its presence in food should inform policy on proper food handling to avoid future cases of antibiotic resistance arising from consumption of food contaminated with these strains (Liu et al. 2018). Sequences of Bacillus cereus were recovered from the soil in this study. It has previously been detected in food and has been associated with food poisoning following the formation of endospores and toxin production (Griffiths and Schraft 2017). Hafnia spp. sequences were detected in all samples. Commensal Hafnia spp. have been shown to reduce food intake and fat mass in people with obesity (Legrand et al. 2020). The presence of this bacterium in food could therefore be of benefit to humans. Pseudomonas spp. were detected in all samples; some Pseudomonas spp. strains have been associated with food spoilage, while others are pathogenic and have been associated with foodborne disease outbreaks (Fakhkhari et al. 2020). The present study detected Brucella spp., which belong to risk group 3, in the soil. Some strains of Brucella spp. have been previously associated with food poisoning; thus, the public needs to be informed to observe proper hygiene measures so as to avoid the introduction of the bacterium into food (Garcell et al. 2016). Clostridium spp. sequences were recovered from fried fish samples from Kangaru Market. Some Clostridium strains are toxigenic and have been associated with foodborne disease outbreaks (Rodriguez et al. 2016). Coxiella spp., which belong to risk group 3, were detected in the present study. Some species in this genus have been associated with dermatitis outbreaks in Italy (Raele et al. 2018). Staphylococcus pasteuri, which belongs to risk group 2, was detected in almost all food samples. Its presence in food has been associated with bacteremia in patients diagnosed with acute leukemia, catheter-associated urinary tract infection, and endocarditis (Ramnarain et al. 2019). The presence of S. pasteuri in almost all food samples is therefore of concern. Legionella pneumophila, a clinically important pathogen that was detected in most of the food samples in this study, has been associated with Legionnaires' disease, a severe pneumonia. It has been previously detected in freshwater, where it freely replicates in protozoa (Mendis et al. 2015). In the present study, it was detected in soil and food samples, to which it could have been introduced from the environment. Most of the prokaryotes in the present study were detected in the soil. This is expected since the soil is known to harbour millions of microorganisms, as previously revealed by next-generation sequencing (Vestergaard et al. 2017). In general, fruits from the Kangaru Market with 260 OTUs and vegetable salads from Embu Town with 200 OTUs had the highest number of observed OTUs among the food samples. Fresh produce such as fruits and vegetables is often exposed to contamination by microbial communities as a result of contact with irrigation water, manure, or soil. Microorganisms such as Lactobacillus pentosaceus, Weissella cibaria, and Lactobacillus plantarum have been previously detected in vegetables, and some of these bacteria have been associated with food spoilage (Peng et al. 2018).
Fig. 6 a Alpha diversity of prokaryotes in vended food and environmental samples from Embu Town and Kangaru Market. b Alpha diversity, measured by the observed taxa and Shannon diversity Index for food and environmental samples from Kangaru Market and Embu Town
Prokaryotes that have not yet been cultured were detected in this study and included Candidatus Solibacter, Candidatus Microthrix, Candidatus Koribacter, Candidatus Protochlamydia, and Candidatus Kuenenia. Most of these prokaryotes were detected in soil samples; however, Candidatus Koribacter, Candidatus Microthrix,   (George 2016). The present study did not detect V. cholerae sequences in soil, water, or food samples. Some of the microorganisms that were detected are associated with food spoilage or foodborne diseases. S. pasteuri and R. canadensis, which belong to risk group 2, and Brucella spp., which belong to risk group 3, were detected in the study. The isolation and characterization of uncultured prokaryotes are important to verify whether the strains in the present study could be of public health concern. The development of culture techniques for these prokaryotes will facilitate their isolation and the understanding of their biology. There was considerable similarity in the diversity of microorganisms between the environmental and food samples; thus, cross-contamination cannot be ruled out. Data from the current study indicate the need for proper handling of food to avoid cases of food poisoning and associated disease outbreaks.