Introduction

Zoonotic pathogens pose an ongoing threat to human society, as acutely highlighted by SARS-CoV-2, the causative agent of COVID-191. Generally, zoonotic viruses originate in mammalian or avian hosts, with no known zoonotic viruses originating in fish2. However, fish can be sources of a number of bacterial and parasitic diseases3,4. The primary pathogens from fish that pose a risk to humans are bacterial, with pathogens such as Salmonella sp., Campylobacter sp., Escherichia coli, Listeria monocytogenes and Yersinia sp. being responsible for most foodborne outbreaks from fish worldwide5. It is also likely that undescribed zoonotic pathogens are circulating in wild fish, with developing countries in the tropics predicted to be the highest risk areas for future zoonoses to emerge6. Therefore, this paper presents a pilot study for an early warning system, in which a shotgun sequencing approach was used to determine whether any pathogens could be identified in wild-caught widehead catfish Clarotes laticeps from the Galana river in South East Kenya. Clarotes laticeps is an important protein source for local communities and is routinely harvested directly from the river. The species is a tropical, freshwater catfish in the family Claroteidae7, which is present in all Nilo-Sudanese basins8 in addition to East Africa and the Nile9.

A number of bacterial pathogens are known to be associated with the consumption of catfish species (both farmed and wild caught), such as Salmonella, Campylobacter, E. coli, L. monocytogenes, Staphylococcus aureus and Vibrio3. Some bacteria, such as Salmonella10,11,12 and Campylobacter13 are thought to be introduced to aquaculture ponds through the faeces of birds and wildlife, and faecal matter from ruminants is known to contaminate ponds with pathogenic E. coli3. Additionally, Salmonella has been detected in wild catfish14 and Grimontia hollisae (formerly known as Vibrio hollisae) was implicated as the cause of septicaemia in a man who had eaten wild caught catfish in the USA15. In Sub-Saharan Africa, wild caught fish have been shown to be sources of bacterial enteropathogens, with one example in Central African Republic specifically linking the presence of Shiga toxin producing E. coli in wild fish and river water to the upstream slaughter of zebu cattle Bos indicus16.

Enterotoxigenic E. coli and Shigella spp. are responsible for the majority of bacterial diarrhoeal episodes in children under 5 years in Sub-Saharan Africa17. As with E. coli, Shigella spp. are known to be introduced to water sources through anthropogenic routes such as untreated sewage and agricultural runoff18. Outbreaks of shigellosis caused by Shigella spp. from contaminated water are known throughout Sub-Saharan Africa19, in addition to Asia18,20. Additionally, antimicrobial resistance genes have been detected in Shigella spp. in waterways, likely due to pollution of water sources by industry, veterinary medications and human medical treatment18.

The detection of bacterial pathogens, both from fish and other sources, has increasingly been based on polymerase chain reaction (PCR) approaches over the past two decades21. One significant drawback of PCR is that it can only be used to test for known pathogens, with multiple presence/absence tests needed to check for all potential pathogens. Second and third generation sequencing platforms such as Illumina’s MiSeq and Oxford Nanopore’s MinION, however, allow for shotgun sequencing approaches which can be used to sequence all DNA present in a sample, potentially revealing most microbial life present in one test, saving time and allowing for the rapid discovery of novel disease-causing pathogens. Third generation nanopore sequencing has been used in a number of shotgun metagenomic studies to identify pathogens, such as diagnosing the cause of infections in orthopaedic implants22, identifying oral bacteriophages23 and characterising the microbiomes of preterm human babies24. The approach has also been used to identify waterborne pathogens, with Reddington et al.25 showing how wastewater influents increased the abundance of Arcobacter when compared to cleaner parts of the Havelse river in Denmark.

In this study, a shotgun transcriptomics approach was used in an attempt to identify pathogens found in wild widehead catfish in the Galana river in South-East Kenya. In addition to shotgun sequencing, the cytochrome oxidase I (COI) gene of the fish was sequenced using Sanger sequencing, in order to confirm the species of catfish found in the river. Although the widehead catfish in the Galana river is currently classified as C. laticeps, there remains taxonomic uncertainty as to whether this population of catfish are in fact C. laticeps26.

Methods

Sample collection and sequencing

Four catfish samples were collected between the 7th and 13th of August, 2019 from three points along the Galana river (Fig. 1). The deceased catfish, which had been caught using a baited hook and line, were gifted by local scouts and samples were taken from remains prior to preparation as food. The catfish sampled were identified as widehead catfish based on morphological characteristics. Fish were dissected in a sterile metal tray, with utensils sterilised with ethanol and flamed between samples. Single muscle and heart samples were taken from each fish, but in the process of dissecting the fish the intestines were significantly nicked, meaning the heart samples were likely contaminated with intestinal material. Tissue samples were placed in RNAlater (Invitrogen, USA) in 2 mL O-ring tubes and kept at ambient temperature in the field for up to eleven days, and later stored at −80 °C.

Figure 1

Locations where samples were collected. (A) Map showing where the fish were collected in South East Kenya. (B) Photograph of one of the widehead catfish provided by the scouts. (C) Map highlighting the location of the region shown in (A) on the African continent. Map generated using QGIS 3.6 (https://qgis.org/en/site/forusers/download.html).

The four heart samples were homogenised using a TissueLyser (Qiagen), and RNA was then extracted using a QIAamp Viral RNA Mini Kit (Qiagen) via a QiaCube extraction machine (Qiagen). The RNA was converted to cDNA using Superscript II Reverse Transcriptase (Invitrogen, USA) following the manufacturer’s guidelines, using 8 μl of RNA and 1 μl of random hexamer primers per reaction.

The nucleic acid concentration of each of the four samples was determined using a BioDrop μLite v1.0.4 (Thermo Fisher Scientific), and 7.5 μl of each sample was barcoded using 2.5 μl of Fragmentation Mix RB01-04 from Oxford Nanopore’s Rapid Barcoding Sequencing kit (SQK-RBK004, Oxford Nanopore Technologies). Barcoded samples were then pooled in equimolar amounts and sequencing adapters were attached to the DNA library using 4 μl of Rapid Adapter reagent (RAP, Oxford Nanopore Technologies). The barcoded library was loaded onto a MinION flow cell Mk1 R9.4.1 (Oxford Nanopore Technologies) and run via MinKNOW software v.3.6.5 (Oxford Nanopore Technologies) without real-time base-calling for 68 h on a MinION sequencer (Oxford Nanopore Technologies).

Nanopore data analysis

Guppy v.3.2.10 (Oxford Nanopore Technologies) was used to base-call the output from the MinION sequencing run. Porechop (https://github.com/rrwick/Porechop) was used to remove the adaptor sequences from the MinION sequence data. Porechop was also originally used to sort samples by barcodes, but as 215,028 reads were left unassigned (89.44% of reads) and due to the low read depth it was decided to pool the reads for all four catfish samples for further analysis.
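The proportion of unassigned reads quoted above can be recomputed from the read counts reported in this paper. The short sketch below is illustrative only: the function name is hypothetical, and it assumes the denominator is the 240,429 reads that remained after adaptor removal (as reported in the Results).

```python
def unassigned_percentage(unassigned_reads: int, total_reads: int) -> float:
    """Return the percentage of reads that could not be assigned to a barcode."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * unassigned_reads / total_reads


# Read counts as reported in the text (assumed denominator: reads after adaptor removal).
UNASSIGNED = 215_028
TOTAL_AFTER_TRIMMING = 240_429

pct = unassigned_percentage(UNASSIGNED, TOTAL_AFTER_TRIMMING)
print(f"Unassigned reads: {pct:.2f}%")
```

With so few reads carrying a recognisable barcode, per-sample analysis would rest on only a small fraction of the data, which is consistent with the decision to pool the reads.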

In order to first assess the microbial organisms present, the fastq reads were processed using the What’s In My Pot (WIMP) workflow on Oxford Nanopore’s EPI2ME platform (https://epi2me.nanoporetech.com/, Oxford Nanopore Technologies). Following the WIMP analysis, a reference fasta database was generated using an existing genome for yellow catfish Tachysurus fulvidraco (GenBank Assembly Accession: GCA_003724035.1) to act as a proxy for the host genome. Additionally, a genome for the parasite Schistosoma haematobium (GenBank Assembly Accession: GCA_000699445.2), the 12,642 existing viral sequences from ray-finned fish (Actinopterygii) hosts available on the NCBI virus database (https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/) and the genomes for the 23 bacterial and fungal species identified through WIMP (Table 1) were added to the reference fasta file. The fastq reads were mapped against the reference fasta database using the NanoPipe web server (http://www.bioinformatics.uni-muenster.de/tools/nanopipe2/index.hbi?)27. The consensus sequences for mapped reads were then subjected to a BLASTn search (https://blast.ncbi.nlm.nih.gov/Blast.cgi), and consensus sequences which did not correspond to any existing sequences were removed from the final result.
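A combined reference file of the kind described above can be assembled by concatenating the individual FASTA files. The following is a minimal sketch using only the Python standard library, not the procedure actually used in this study; the function names, the category labels and the choice to prefix each header with its category are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Dict, Iterator, List, Tuple


def read_fasta(path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_references(sources: Dict[str, List[Path]], out_path: Path) -> Dict[str, int]:
    """Write all records to one FASTA file, tagging each header with its category.

    Returns the number of records written per category (hypothetical labels
    such as "host", "parasite", "virus" or "microbe").
    """
    counts: Dict[str, int] = {}
    with open(out_path, "w") as out:
        for category, paths in sources.items():
            counts[category] = 0
            for path in paths:
                for header, sequence in read_fasta(path):
                    out.write(f">{category}|{header}\n{sequence}\n")
                    counts[category] += 1
    return counts
```

Tagging each header with its category makes it straightforward to tally mapped reads by group (host, bacterial, fungal, viral) after mapping.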

Table 1 Microbial genomes included in the reference database.

Sequencing and analysing the COI of the host

A separate DNA extraction was performed on the catfish muscle samples collected, with samples being incubated at 56 °C for 2 h in a mix of 12 µl of proteinase K (20 mg/ml) and 400 µl of 10% Chelex solution, followed by 15 min at 99 °C. Samples were then centrifuged for 1 min at maximum speed (20,817 × g) and 150 µl of DNA supernatant was placed in a new 1.5 mL Eppendorf tube. A 653 bp amplicon of the COI gene was amplified using the primer pair FishF2 (5’- TGT AAA ACG ACG GCC AGT CGA CTA ATC ATA AAG ATA TCG GCA C -3’)28 and FishR2 (5’- CAG GAA ACA GCT ATG ACA CTT CAG GGT GAC CGA AGA ATC AGA A -3’)28. A 25 µl master mix was prepared in a UV-sterilized hood consisting of 3.125 µl of Buffer (Kapa Biosystems), 1.25 µl of 10 mM dNTPs (Invitrogen), 1.25 µl of each primer (Integrated DNA Technologies), 0.125 µl Taq polymerase (Kapa Biosystems), 2.5 µl of Bovine Serum Albumin (BSA) (20 mg/ml) (Thermo Scientific), 14.5 µl of ddH2O and 1 µl of DNA. PCR conditions were as follows: initial denaturation at 95 °C for 5 min, followed by 35 cycles of 95 °C for 30 s, 50 °C for 1 min and 72 °C for 1 min, with a final extension for 10 min at 72 °C. Samples which showed a clear band on a SafeView-stained agarose gel after electrophoresis were then subjected to enzymatic clean-up prior to sequencing using a mixture of 0.05 µl ExoI, 0.1 µl TSAP and 4.85 µl H2O, added to 20 µl of PCR product and then incubated at 37 °C for 15 min followed by 80 °C for 15 min.
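The reagent volumes and cycling conditions listed above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, the values are copied from the text, and the run-time estimate ignores thermocycler ramping between temperatures.

```python
import math

# Per-reaction volumes in microlitres, as listed in the text.
master_mix_ul = {
    "buffer": 3.125,
    "dNTPs (10 mM)": 1.25,
    "primer FishF2": 1.25,
    "primer FishR2": 1.25,
    "Taq polymerase": 0.125,
    "BSA (20 mg/ml)": 2.5,
    "ddH2O": 14.5,
    "template DNA": 1.0,
}
total_reaction_ul = sum(master_mix_ul.values())

# Enzymatic clean-up mixture added to 20 µl of PCR product.
cleanup_ul = {"ExoI": 0.05, "TSAP": 0.1, "H2O": 4.85}
total_cleanup_ul = sum(cleanup_ul.values())

# Cycling programme: (step name, minutes); the three middle steps repeat 35 times.
cycles = 35
per_cycle_min = 0.5 + 1.0 + 1.0          # 95 °C 30 s, 50 °C 1 min, 72 °C 1 min
hold_time_min = 5.0 + cycles * per_cycle_min + 10.0

print(f"Reaction volume: {total_reaction_ul} µl")
print(f"Clean-up mixture: {total_cleanup_ul:.2f} µl")
print(f"Programmed hold time (no ramping): {hold_time_min} min")
```

The listed components add up to the stated 25 µl reaction volume, and the clean-up mixture comes to 5 µl per 20 µl of PCR product.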

Samples were subsequently sent for commercial Sanger sequencing (Macrogen). Forward and reverse strands were aligned using Geneious version 10.2.329, and the consensus sequences were initially analysed using BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi). Following this, the corresponding aligned sequences from the top 100 BLAST hits were downloaded in fasta format and added to an alignment containing the four sequences for the Galana widehead catfish. The alignment file was then uploaded to the IQ-TREE web server (http://iqtree.cibiv.univie.ac.at/)30 for model selection, and the web server of the RAxML program (https://raxml-ng.vital-it.ch)31 was used to generate a maximum likelihood (ML) tree, with 100 bootstrap replicates performed.

Results

MinION sequence analysis

Following adaptor removal, 240,429 reads remained, of which 68,908 were successfully mapped to one of the reference sequences using NanoPipe27. Of the 68,908 mapped reads, 59,849 mapped to the proxy host (Fig. 2), 9,031 mapped to bacterial sequences, 22 mapped to fungal sequences, and 6 mapped to viral sequences. The majority of the 9,031 bacterial reads (99.06%, 8,938 reads) (Fig. 2) mapped to the genome of Shigella flexneri, specifically to a predicted response regulator (GenBank Accession: CP045522.1, position 4,265,154–4,265,537) which shares protein homology with the two-component response regulator gene DpiA in Shigella boydii32. Further investigation of the viral and fungal mapped reads revealed these to be ambiguous, and no bacteriophages were detected.
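The breakdown of mapped reads reported above can be tallied as follows. This is a minimal illustrative sketch using only the counts stated in the text; the variable names are hypothetical.

```python
# Read counts as reported in the text.
total_reads = 240_429
mapped_by_group = {
    "proxy host": 59_849,
    "bacterial": 9_031,
    "fungal": 22,
    "viral": 6,
}

total_mapped = sum(mapped_by_group.values())
mapping_rate = 100.0 * total_mapped / total_reads

# Share of the mapped reads falling into each group.
share_of_mapped = {
    group: 100.0 * count / total_mapped for group, count in mapped_by_group.items()
}

print(f"Mapped reads: {total_mapped:,} ({mapping_rate:.2f}% of all reads)")
for group, share in share_of_mapped.items():
    print(f"  {group}: {share:.2f}% of mapped reads")
```

The four groups sum to the 68,908 mapped reads, with host-derived reads making up the large majority.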

Figure 2

Taxonomic assignment of the 59,316 reads which were successfully mapped. Host RNA is based on reads that mapped to the genome of yellow catfish. The species breakdown of the reads which mapped to bacterial sequences is also given.

Host COI analysis

Clean, unambiguous Sanger sequences were generated for all four widehead catfish samples, with all samples sharing the same haplotype. A BLAST search of the consensus sequence revealed the closest match to be a sequence for Bathybagrus tetranema (Accession number: HG803463)33 at 93.75% similarity. The consensus sequence showed a 91.89% match to an existing Clarotes laticeps sequence on GenBank (Accession number: HG803491)33.
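Similarity values such as those above are percent identities over a pairwise alignment. The sketch below shows one common way of computing such a value, as identical columns divided by alignment length with gap columns counted as mismatches. It is an assumption-laden illustration rather than the BLAST implementation, and the two short sequences are toy examples, not data from this study.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pairwise alignment.

    Both inputs must already be aligned (equal length, '-' for gaps).
    Gap columns count towards the alignment length but never as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Toy sequences for illustration only: nine identical columns out of ten.
toy_query = "ACGTACGTAC"
toy_subject = "ACGTACGTAT"
print(f"{percent_identity(toy_query, toy_subject):.2f}% identity")
```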

The maximum likelihood tree (Fig. 3) showed that whilst the host catfish sequence is not closely related to any existing catfish sequences, it does cluster within the family Claroteidae. The Galana widehead catfish show a deep split from the widehead catfish sequenced from Nigeria (Fig. 3), and show a basal split near the base of the Claroteidae cluster.

Figure 3

The ML tree generated using the catfish COI sequences. Highlighted in red are the COI sequences of the widehead catfish collected from the Galana river. Highlighted in green are COI sequences corresponding to widehead catfish collected in the Amambra river, Nigeria.

Discussion

Metagenomic approaches using second generation sequencing platforms have been used previously to identify foodborne pathogens directly from a source animal34; however, it would appear that this study represents the first time a metagenomic approach using nanopore sequencing has been used to detect a foodborne bacterial pathogen directly from an animal. Previous studies have shown the effectiveness of the MinION nanopore sequencer as a tool for sequencing foodborne bacterial pathogens35,36,37, but these studies have relied on bacteria that were isolated from a food source and then cultured prior to sequencing. It is estimated that approximately 99% of prokaryotes cannot be cultured in the laboratory with currently available methods38, which highlights the importance of developing culture-independent diagnostic workflows, such as the approach outlined in this study.

The identification of the presence of Shigella flexneri in the catfish in this study highlights the potential of nanopore metagenomics as a relatively simple and effective way to detect human pathogens in one test. The causative agent of human shigellosis, Shigella is a genus of gram-negative bacteria that cause diarrhoea and dysentery, being a major cause of moderate-to-severe diarrhoea in sub-Saharan Africa39. Annually, Shigella is estimated to cause 600,000 deaths from 80 to 165 million cases worldwide40. In the developing world, S. flexneri is the predominant cause of shigellosis, with S. sonnei being the predominant strain in industrialised countries40. The faecal-oral route is the primary way by which Shigella spreads, with transmission also documented via contaminated food, drinking water and flies41. The S. flexneri gene that the majority of the reads mapped to, DpiA, has been shown to interfere with plasmid maintenance when overexpressed, inducing the SOS response42, which is a bacterial response that promotes dormancy in unfavourable environmental conditions43. The overabundance of the DpiA gene could imply that the bacteria were in a dormant state, but as other genes known to be involved in the SOS response were not detected with this approach, this cannot be confirmed. It may be the case that DpiA is expressed at much higher levels than the other genes in the SOS response, and a deeper sequencing effort might reveal further genes being expressed. Additionally the overexpression of DpiA has been shown to be part of the bacterial defence following exposure to β-lactam antibiotics in E. coli44, which could imply the presence of antimicrobial resistance as seen with Shigella spp. in other water sources18.

It is unclear how the catfish became carriers of S. flexneri, so it can only be speculated how the bacteria may have been introduced to the fish based on the known biology of the pathogen. Although the Shigella RNA was extracted from the heart tissue, we believe that it was in fact present in the intestinal tract of the catfish. Whilst the fish were dissected in a sterilised tray, clean gloves were worn and utensils were flamed before each dissection, the intestines were significantly nicked in the process of initially opening the fish, meaning the heart samples were likely contaminated with material from within the intestines. The primary hosts of S. flexneri are humans and primates45, but it has also been detected in rabbits46, cattle47, pigs48 and chicken45. Previous studies have also shown that bacterial pathogens can be introduced to fish in water bodies via the faeces of cattle, birds and wild animals3,10,11,12,13,16. This could imply that S. flexneri was introduced to the river either via the large herds of cattle which are brought to the river to drink, or through the various wild animals that use the river, and was subsequently ingested in sediment by the bottom-feeding catfish. Wild species such as hippopotamus and elephant are known to be sources of other bacterial diseases, including anthrax, brucellosis, tetanus and salmonellosis (hippopotamus)49 and tuberculosis (elephant)50. In addition to the potential animal sources listed, a number of villages located approximately 150 km upstream of the sampling site, on the western side of Tsavo East National Park, are suspected to be primary sources of untreated sewage into the Galana river (John Byrne, University College Dublin, pers. comm.).

The sequence analysis of the host catfish COI also revealed that it was not the species Clarotes laticeps as suggested by morphology, although the phylogeny does suggest the fish belongs in the family Claroteidae. The species C. laticeps is found naturally in West Africa, the Nilo-Sudanese basin and East Africa8,26,51,52. The other C. laticeps sequence included in the analysis (Fig. 3) was collected in the Amambra river in Nigeria53, which is ~4000 km away from the Galana river, and the two rivers are separated by at least two major watersheds. According to both Okeyo54 and Seegers et al.26, and based on morphology, the only member of the family Claroteidae found in the Galana river is C. laticeps; however, Seegers et al.26 added the caveat “the taxonomic status of the Kenyan populations is uncertain and needs detailed study”. This would suggest that the fish examined in this study were in fact the fish that to date have been classified as C. laticeps based on morphology, but that the genetic data have now revealed this population of catfish to be a likely cryptic species. Based on these findings we tentatively suggest a new species name Clarotes kambare, or Kenyan widehead catfish, with “kambare” being the Swahili word for catfish, although further in-depth morphological and genetic work will be needed to clarify the unique taxonomic status of this population of fish. Further studies are also required to determine the full geographic range of this species and its conservation status.

In conclusion, this study has highlighted how a shotgun metatranscriptomic approach can be used to identify human pathogens in wild caught fish, and how it may highlight the transmission potential of enteropathogens in non-host animals. Further research is needed going forward to explore the potential benefits of this approach over the presence/absence results of conventional PCR assays or 16S sequencing, as while we did highlight the presence of one gene involved in the bacterial SOS response, there are multiple genes involved which were not detected. Furthermore, the potential identification of the host as a cryptic species highlights the need for further populations of wild harvested fish to be characterised genetically, as unknowingly managing a species complex as one species may lead to severe threats to local cryptic species, in addition to masking potential disease risks in the cryptic species.