The usefulness of nanopore sequencing in whole-genome sequencing-based genotyping of Listeria monocytogenes and Salmonella enterica serovar Enteritidis

ABSTRACT Bacterial genotyping through whole-genome sequencing plays a crucial role in disease surveillance and outbreak investigations in public health laboratories. This study assessed the effectiveness of Oxford Nanopore Technologies (ONT) sequencing in the genotyping of Listeria monocytogenes and Salmonella enterica serovar Enteritidis. Our results indicated that ONT sequences, generated with the R10.4.1 flow cell and basecalled using the Dorado 0.5.0 Super Accurate 4.3 model, exhibited accuracy comparable to Illumina sequences and effectively discriminated among bacterial strains from outbreaks. These findings suggest that ONT sequencing is a promising tool for rapid whole-genome sequencing of bacterial pathogens in public health laboratories for epidemiological investigations.

IMPORTANCE This study shows that Oxford Nanopore Technologies sequencing, by itself, has the potential to serve as a whole-genome sequencing-based genotyping tool in public health laboratories, enabling routine subtyping of bacterial isolates for disease surveillance and outbreak investigations.

utility in molecular subtyping of bacterial isolates for disease surveillance and outbreak investigation, where sequence accuracy is exceptionally desired.
Studies have suggested that base modifications can contribute significantly to basecalling errors in ONT sequencing (9, 10). Recent developments by ONT, including the introduction of new flow cells (R10.4.1) and chemistry (SQK-NBD114.24), have demonstrated raw read accuracy exceeding 99.1% (11). Despite this progress, challenges in sequencing prokaryotic organisms for bacterial genotyping were noted in a study by Lohde et al. (9). In that study, the researchers indicated that the accuracy of sequences generated using ONT R10.4.1 flow cells and refined with the tools available at the time was inadequate for bacterial isolate genotyping in outbreak tracing. Nevertheless, our current study illustrates that ONT sequencing, employing R10.4.1 flow cells and the Dorado 0.5.0 Super Accurate (SUP) 4.3 model, yields sequences with accuracy comparable to Illumina sequencing in the core genome multilocus sequence typing (cgMLST) and whole-genome single nucleotide polymorphism (wgSNP) analysis of Listeria monocytogenes isolates and Salmonella enterica serovar Enteritidis isolates from outbreaks.

Bacterial isolates
Twelve L. monocytogenes and 23 S. Enteritidis isolates were included in this study. L. monocytogenes isolates were recovered from sporadic listeriosis cases in hospitals in Taiwan between 2019 and 2020. The L. monocytogenes isolates belonged to eight sequence types: ST1, ST5, ST87, ST101, ST155, ST378, ST1081, and ST1532 (see Table S1). Our previous study indicated that 5 of the 12 isolates have numerous base modification-mediated errors in the sequences generated using ONT R9.4 flow cells and the Rapid Barcoding Kit (SQK-RAD004) (10). The S. Enteritidis isolates were recovered from six foodborne disease outbreaks in the laboratories of the Taiwan Centers for Disease Control (Taiwan CDC) between 2014 and 2022 and genotyped using the standardized PulseNet PFGE protocol (Table S2) (12). These bacterial isolates were collected through a series of disease surveillance projects, all of which obtained ethical approval from the Institutional Review Board of the Taiwan CDC, Ministry of Health and Welfare. These projects were registered under IRB numbers 110109 and 110111.

Genomic DNA extraction
DNA of bacterial isolates was extracted for WGS using the Qiagen DNeasy blood and tissue kit (Qiagen Co., Germany), following the protocol provided by the manufacturer.

Phylogenetic tree
Phylogenetic trees (tanglegrams) were constructed with cgMLST or wgSNP profiles using the single linkage clustering algorithm and the dendextend toolkit (23). The degree of correlation between phylogenetic trees was measured by Baker's Gamma index (BGI) (24).

WGS analysis of S. Enteritidis
WGS analysis was conducted on 23 S. Enteritidis isolates from six outbreaks using ONT R10.4 flow cells with the Rapid Barcoding and Ligation-duplex (Native Barcoding) kits. Basecalling was performed using Dorado 0.5.0 SUP4.2 and SUP4.3 models, with or without modification-mediated error correction using Modpolish.

Phylogenetic analysis
Clustering analysis of cgMLST and wgSNP profiles was conducted to assess the similarity between phylogenetic trees constructed with Illumina and ONT sequences, measured by BGI values ranging from −1 to 1. In the case of L. monocytogenes, exceptionally high BGI values (0.9999 and 1) were observed for both cgMLST and wgSNP trees constructed using the ONT R10.4 sequences from SUP4.2 and SUP4.3 basecalling, as well as the ONT R10.4 sequences refined with Modpolish (see Table S3). For S. Enteritidis isolates, the cgMLST trees constructed with ONT R10.4_RAP and ONT R10.4_LIG sequences from SUP4.3 basecalling exhibited significantly higher BGI values than those constructed with sequences from SUP4.2 basecalling (BGI, 0.9983 and 0.9975 vs 0.3575 and 0.5638; Table S3). wgSNP analysis yielded even higher similarity between trees, as indicated by elevated BGI values. Notably, the application of Modpolish enhanced the BGI values for cgMLST and wgSNP trees constructed with ONT sequences from SUP4.2 but not SUP4.3 basecalling.
In intra-outbreak analyses, differences in cgMLST profiles generated from Illumina sequences ranged from 1 to 3 loci among isolates from an outbreak, while those from ONT R10.4_LIG SUP4.3 sequences ranged from 1 to 4 loci (Fig. 1). In wgSNP profiles, Illumina sequences displayed differences of 0-4 SNPs among isolates from an outbreak, while ONT sequences displayed differences of 0-8 SNPs. Similarly, ONT R10.4_RAP SUP4.3 sequences displayed a high degree of similarity, as evidenced by BGI values of 0.9983 for the cgMLST trees and 0.9984 for the wgSNP trees (see Table S3).
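The intra-outbreak ranges reported above are minimum and maximum pairwise differences between profiles. A minimal Python sketch of that calculation for cgMLST profiles follows. It assumes that each profile is a list of allele numbers in a fixed locus order and that loci with a missing call (None) in either isolate are skipped. The isolate names, allele values, and the rule for missing calls are hypothetical and are not taken from the pipeline used in this study.

```python
from itertools import combinations


def allele_distance(profile_a, profile_b):
    """Count loci whose allele calls differ, skipping loci missing in either profile."""
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a is not None and b is not None and a != b
    )


def pairwise_range(profiles):
    """Return (min, max) allele distance over all isolate pairs in one outbreak."""
    distances = [
        allele_distance(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    ]
    return min(distances), max(distances)


# Hypothetical five-locus profiles for three isolates from one outbreak.
outbreak = {
    "iso1": [1, 1, 1, 1, 1],
    "iso2": [1, 1, 2, 1, 1],
    "iso3": [1, 3, 2, 1, None],
}

lowest, highest = pairwise_range(outbreak)
print(lowest, highest)
```

The same counting applies to wgSNP profiles if each list holds the base called at each variable position instead of an allele number.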

DISCUSSION
Our findings indicate that the accuracy of ONT R10.4 sequences from L. monocytogenes and S. Enteritidis isolates, when basecalled using the Dorado 0.5.0 Super Accurate model 4.3, closely approximates the accuracy observed in Illumina sequences. Additionally, the accuracy of L. monocytogenes ONT sequences can be further improved through correction with the Modpolish toolkit. Notably, the accuracy of S. Enteritidis ONT sequences, generated from both Rapid and Ligation methods, closely parallels that of Illumina sequences. This improvement suggests that ONT sequencing is a promising tool for rapid WGS-based genotyping of bacterial strains, thereby greatly contributing to disease surveillance and outbreak investigation practices. In our earlier investigation, we showed that the sequences generated with the ONT R9.4 device lacked the necessary accuracy for WGS-based genotyping of L. monocytogenes (10). The primary cause of the inaccuracy was errors resulting from base modification (10). While the Modpolish toolkit effectively corrected most of these errors, the polished sequences still failed to meet the required accuracy for WGS-based genotyping (10). In the present study, we demonstrate a notable improvement with the implementation of the ONT R10.4 device along with the Dorado basecaller, effectively eliminating errors attributed to base modifications observed in ONT R9.4 sequences (Table 1). In addition, we demonstrate further improvement in the accuracy of ONT sequences of L. monocytogenes through the application of the Modpolish toolkit.
Our data indicate that ONT R10.4 sequences of S. Enteritidis, generated through both Rapid and Ligation methods and basecalled using the Dorado SUP4.3 model, exhibit an accuracy comparable to the Illumina sequences. In our initial analysis, we basecalled the ONT R10.4_RAP and ONT R10.4_LIG sequences from S. Enteritidis using the Dorado SUP4.2 model, resulting in sequences that did not match the accuracy of Illumina sequences (Table 2). Subsequent re-analysis, following the release of SUP4.3, demonstrated a substantial improvement in the accuracy of the ONT R10.4 sequences. Notably, the ONT sequences differ from the Illumina sequences by an average of seven loci in the cgMLST profiles and 1-2 SNPs in the wgSNP profiles (Table 2). The phylogenetic trees constructed for the 23 S. Enteritidis isolates using ONT R10.4_RAP and ONT R10.4_LIG sequences closely align with those built using the Illumina sequences (Fig. 1; Table S3). These findings suggest that ONT sequencing alone may serve as a reliable tool for WGS-based genotyping of bacterial strains in public health laboratories for disease surveillance and outbreak tracing.
We demonstrate the superiority of the Dorado SUP4.3 model over SUP4.2 in basecalling ONT R10.4 reads. The refined ONT sequences from both L. monocytogenes and S. Enteritidis exhibit accuracy comparable to Illumina sequences, as evident in the phylogenetic analysis and outbreak identification (Fig. 1). Because our study was conducted on only two bacterial species, a crucial consideration is whether the SUP4.3 model is applicable to other species. In a separate assessment, the performance of the SUP4.2 and SUP4.3 models was evaluated using 12 standard genomes representing bacterial species including Campylobacter jejuni, Campylobacter lari, Escherichia coli, Listeria ivanovii, L. monocytogenes, Listeria welshimeri, S. enterica, Vibrio cholerae, and Vibrio parahaemolyticus (https://rrwick.github.io/2023/12/18/ont-only-accuracy-update.html). This assessment demonstrates that the SUP4.3 model significantly improved both read accuracy and assembly accuracy compared to SUP4.2, thereby extending its potential applicability across diverse bacterial species.
In conclusion, ONT sequences generated from R10.4 flow cells and basecalled using the Dorado 0.5.0 SUP4.3 model exhibited accuracy comparable to Illumina sequences in WGS-based genotyping of L. monocytogenes and S. Enteritidis isolates. However, to comprehensively assess the effectiveness of this method in disease surveillance and outbreak investigation, it is imperative to conduct further investigations involving more isolates from diverse bacterial species.

FIG 1 Dendroscope tanglegram comparison between cgMLST trees (A) and wgSNP trees (B), constructed with Illumina contigs and ONT contigs for S. enterica serovar Enteritidis isolates from six outbreaks.

TABLE 1
Comparison of cgMLST and wgSNP profiles of L. monocytogenes isolates generated from Illumina and ONT sequences
a Sequences were generated using the Illumina MiSeq platform or ONT R9.4 and R10.4 flow cells with the ligation method. ONT sequences were basecalled using Dorado Super Accurate models 3.3, 4.2, and 4.3 with or without correcting modification-mediated errors using Modpolish. LIG, Ligation method; SUP, Dorado Super Accurate model; MOD, Modpolish.