Genomic Surveillance Enables Suitability Assessment of Salmonella Gene Targets Used for Culture-Independent Diagnostic Testing

Salmonella is a highly diverse genus consisting of over 2,600 serovars responsible for high-burden food- and waterborne gastroenteritis worldwide. Sensitivity and specificity of PCR-based culture-independent diagnostic testing (CIDT) systems for Salmonella, which depend on a highly conserved gene target, can be affected by single nucleotide polymorphisms (SNPs), indels, and genomic rearrangements within primer and probe sequences. This report demonstrates the value of prospectively collected genomic data for verifying CIDT targets.

reported as causal agents of NTS, including some that are unique to Australia (2,3). The prevalence of Salmonella serovars differs significantly between continents. For example, S. enterica subspecies enterica serovar Enteritidis is the most commonly detected serovar in Europe and North America, whereas S. Typhimurium predominates in Australia and New Zealand (4–7).
The public health, food safety, and trade implications of foodborne Salmonella are such that a highly sensitive and specific surveillance system is required to ensure rapid detection and characterization of foodborne outbreaks. The introduction of culture-independent diagnostic testing (CIDT) as the front-end method of detecting Salmonella in stool samples, often as a complete replacement for traditional culture, has presented several challenges to public health laboratory surveillance (5, 8–10). The reliance of all current CIDT platforms on a single gene target for the detection of a diverse pathogen such as Salmonella has raised important questions regarding the ability of these CIDTs to detect uncommon serovars or emerging variants. As CIDTs become more widely used for Salmonella diagnosis, they become, in some cases, the only signal of a potential outbreak and of the need for reflex culture. Therefore, any loss in CIDT sensitivity can have serious consequences for the diagnosis of individual cases as well as for the recognition and management of outbreaks.
Although PCR-based techniques are considered extremely sensitive, the accumulation of mutations within PCR targets has been documented for other pathogens. Mutations or indels can occur within CIDT primer and probe target regions, and depending on the location of these mutations, CIDT sensitivity can be diminished, in some cases resulting in false-negative results. The sole reliance on CIDT to diagnose the highly conserved bacterium Chlamydia trachomatis illustrates the shortcomings of CIDT-only testing algorithms (11). In 2006, it was reported that a C. trachomatis variant was circulating and that this variant harbored a deletion in the PCR primer target region contained within the cryptic plasmid (11). This highly conserved multicopy plasmid had been thought to be the ideal diagnostic target, as it offered increased sensitivity for detecting C. trachomatis. However, emergence of a variant harboring a deletion within the cryptic plasmid caused a complete loss of sensitivity in the majority of commercial CIDTs, resulting in the ongoing transmission of C. trachomatis and an increased number of severe sequelae after unrecognized and prolonged C. trachomatis infections (12). These false-negative results for patients with C. trachomatis variant infections were noted only after a significant decrease in the incidence of C. trachomatis triggered an investigation by public health authorities. The response to this CIDT system failure was to design an additional C. trachomatis PCR target. The vulnerability of CIDT to false-negative results due to nucleotide dissimilarity has led to the recommendation that a two-target system be used for all frontline infectious disease CIDT assays (13). Despite this, the current Salmonella CIDTs are often reliant on a single PCR target region.
Whole-genome sequencing (WGS) provides the ultimate resolution to correctly identify outbreak clusters and detect food or environmental pathogen reservoirs (14,15). This ability triggered the rapid uptake of WGS as the preferred approach for public health surveillance of salmonellosis, resulting in accumulation of genome sequence data. These genomic data provide an opportunity to examine the variability of molecular targets employed by different CIDT platforms in the context of locally circulating Salmonella serovars. In this study, we examined the variability in CIDT targets and the robustness of current CIDT systems utilizing a comprehensive Salmonella genome collection spanning more than two summer seasons in New South Wales (NSW), Australia.

MATERIALS AND METHODS
Isolate collections. Salmonella isolates included in this study represent all isolates referred to the NSW Enteric Reference Laboratory, Institute of Clinical Pathology and Medical Research (ICPMR), NSW Pathology, that underwent whole-genome sequencing as part of routine public health outbreak investigations in NSW, Australia, between October 2015 and December 2018 (n = 3,256). In addition, WGS was performed on 43 isolates collected from historical outbreaks of salmonellosis where the causal Salmonella serovar had previously been described as native to Australia (3).
Nucleic acid extraction and library preparation. A single colony was used for DNA extraction; extraction was performed using the Geneaid Presto genomic DNA bacterial kit (Geneaid, Taiwan) per the manufacturer's instructions for Gram-negative bacteria. DNA extracts were treated with 1 U of RNase. DNA libraries were prepared with the Nextera XT library preparation kit, using 1 ng/µl of DNA in accordance with the manufacturer's instructions. Multiplexed libraries were sequenced using paired-end 150-bp chemistry on the NextSeq 500 system (Illumina, Australia).
CIDT gene targets were extracted from contigs using in-house Perl scripts and BLAST+ (24). CIDT target genes from the closed reference genome Salmonella enterica subsp. enterica serovar Typhimurium strain LT2 (GenBank accession no. AE006468.2) were used as a reference. All CIDT nucleotide sequences extracted from isolate contigs were aligned to the reference using MAFFT (25). Single nucleotide polymorphisms (SNPs) were called by comparison to the reference genome using SNP-sites (26). SNP differences and the length of PCR target genes were used to calculate the dissimilarity index (number of SNPs per kilobase pair) for each gene (gene lengths: invA, 2,057 bp; spaO, 919 bp; and ttrA, 3,062 bp). Entropy at each position within PCR target genes was measured from the PCR gene target alignments.
Primer and probe design and assessment. The specific nucleotide sequences of the primers and probes used by the commercial CIDTs investigated in this study are proprietary; however, the genes they target have been reported. To investigate diversity within the primer and probe sequences, we performed an in silico prediction of primer and probe targets for each gene. High-homology regions of the PCR target genes (ttrA, spaO, and invA) were extracted from reference genomes of 10 common medically relevant serovars associated with salmonellosis (S. Typhimurium, S. Enteritidis, S. Newport, S. Saintpaul, S. Virchow, S. Infantis, S. Heidelberg, S. Montevideo, S. Javiana, S. Muenchen, and S. Braenderup). The reference Salmonella enterica genomes for these serovars were downloaded, and target gene sequences were extracted as outlined in the bioinformatic analyses described above (the NCBI accession numbers for reference genomes are AE006468.2, CP045063, NC_011080.1, NC_011083.1, NC_011294.1, NC_020307.1, NZ_CP007530.1, NZ_CP017727.1, NZ_CP022490.1, NZ_CP025094.1, and NZ_LN649235.1).
Regions of homology between all sequences >300 bp in length were employed to find the best primer and probe targets using the PrimerQuest tool (Integrated DNA Technologies). The predicted oligonucleotides were then extracted from the genomes used in this study to investigate polymorphisms within the primer and probe sequences. The impact of any polymorphisms detected was based on both the number of SNPs in each oligonucleotide and the position of the SNPs within the oligonucleotide. SNPs within the last 5 bp of the 3′ end of the oligonucleotide were considered to have a moderate effect on PCR sensitivity, as were more than two SNPs within the same oligonucleotide. Significant sensitivity losses are predicted to occur when more than three SNPs are detected within the same oligonucleotide or more than one SNP within the last 5 bp at the 3′ end of each oligonucleotide. Limited sensitivity losses were predicted for single SNPs outside the last 5 bp at the 3′ end of the primer or probe (27).
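The impact rules above can be sketched as a small Python function. This is only an illustration of the stated thresholds, not the scripts used in the study: the function name, the category labels, and the example sequences are hypothetical, and both sequences are assumed to be given in the 5′-to-3′ orientation of the oligonucleotide. The text does not say how to treat exactly two SNPs that both lie outside the 3′ window; the sketch assumes these count as limited.

```python
def classify_oligo_impact(oligo, target, three_prime_window=5):
    """Classify the predicted sensitivity impact of SNPs in one oligonucleotide.

    oligo  : designed primer/probe sequence, written 5'->3'
    target : matching region extracted from a genome, same length and orientation
    """
    if len(oligo) != len(target):
        raise ValueError("oligo and target must have the same length")

    # Positions (0-based, from the 5' end) where the genome differs from the oligo.
    mismatches = [i for i, (a, b) in enumerate(zip(oligo.upper(), target.upper()))
                  if a != b]
    total = len(mismatches)

    # Count SNPs that fall within the last `three_prime_window` bases at the 3' end.
    window_start = len(oligo) - three_prime_window
    in_window = sum(1 for i in mismatches if i >= window_start)

    if total == 0:
        return "none"
    if total > 3 or in_window > 1:
        return "significant"
    if total > 2 or in_window >= 1:
        return "moderate"
    # Assumption: one or two SNPs, all outside the 3' window, are treated as limited.
    return "limited"


# Hypothetical 20-mer; the second sequence carries one SNP at the 3' terminal base.
print(classify_oligo_impact("ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGA"))
```

Checking the significant category before the moderate one matters, because an oligonucleotide with two SNPs in the 3′ window satisfies both conditions.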
Statistical analysis. The diversity of NTS serovars in Australia between 2009 and 2017 was investigated using the National Notifiable Diseases Surveillance System Public Salmonella data set (http://www9.health.gov.au/cda/source/pub_salmo.cfm). The total number of NTS serovars reported and case notifications of salmonellosis each year were used to calculate a Simpson's index of diversity.
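As an illustration of the diversity calculation, the following Python sketch computes a Simpson's index of diversity from per-serovar case counts for a single year. The text does not state which form of the index was used; the sketch assumes the common finite-sample form 1 − Σ n(n − 1) / (N(N − 1)), where n is the number of notifications for one serovar and N is the total for the year. The function name and the example counts are hypothetical.

```python
def simpsons_diversity(counts):
    """Simpson's index of diversity (1 - D) from per-serovar notification counts.

    Assumed form: 1 - sum(n_i * (n_i - 1)) / (N * (N - 1)).
    Values near 0 mean one serovar dominates; values near 1 mean high diversity.
    """
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total < 2:
        raise ValueError("at least two notifications are needed")
    return 1.0 - sum(c * (c - 1) for c in counts) / (total * (total - 1))


# Made-up counts for one year: one dominant serovar plus several rarer ones.
example_year = [700, 130, 30, 20, 10, 5, 5]
print(round(simpsons_diversity(example_year), 3))
```

Applying the function to each year's counts in turn gives a series that can be plotted against time alongside notification totals.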
Data availability. The assembled genomes for all 3,165 isolates and individual gene target sequences ttrA, spaO, and invA were deposited in the NCBI (BioProject no. PRJNA596817). Associated metadata, including genome assembly statistics, multilocus sequence typing (MLST) results, and serovar prediction, can be found in Data Set S1 in the supplemental material.

RESULTS
Phylogenomic comparison of Salmonella genomes. A total of 3,165 Salmonella isolates were included in this study. Isolate genomes that failed to meet sequence quality metrics or were sequenced multiple times were removed (n = 134). The core genome shared by study isolates consisted of 3,270 genes with a total length of 2,963,964 bp. Maximum-likelihood core genome phylogeny was constructed using 98,254 polymorphic sites (median SNP distance, 13,588; range, 0 to 98,254) within core genes and demonstrated branching largely corresponding to serovar differentiation (Fig. 1). The core genome included the PCR gene targets investigated in this study: ttrA, spaO, and invA. Assembly statistics for Salmonella genomes included in the study are outlined in Table S1. The genomes investigated represented 52 different serovars and 79 MLST types. The most common serovar was S. Typhimurium (n = 2,175; 69%), including the monophasic S. Typhimurium variant I with the antigenic structure 4,[5],12:i:− (n = 212; 7%), followed by S. Enteritidis (n = 406; 13%) and S. Saintpaul (n = 97; 3%). A detailed list of the serovars and MLST types of all isolates is presented in Table S2.
Entropy of CIDT target genes. Entropy of CIDT target regions indicated that nucleotide dissimilarity was dispersed across the length of the sequence (Fig. 3). Clear regions of conservation were seen at the 3′ end of the invA and the 5′ end of the spaO gene sequences. However, polymorphisms were present throughout the ttrA nucleotide sequence. In addition, 16 genomes had truncations of the 5′ and 3′ ends of the ttrA nucleotide sequence (median length, 1,337 bp [range, 60 to 3,035 bp]). However, these genomes had lower-quality assembly metrics than the entire genome collection. The median number of contigs in the eight assemblies that contained a truncated ttrA gene sequence was 99.5 (34 to 135), in contrast to the median number for all genomes analyzed, 78 (24 to 169). N50 values for genomes containing a truncated ttrA nucleotide sequence, i.e., 139,974 (66,538 to 261,187), were slightly lower than those for all genomes analyzed, i.e., 172,206 (52,145 to 518,076). Therefore, these truncations may have resulted from de novo assembly errors.
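The two per-gene measures used here, the dissimilarity index and the per-position entropy, can be sketched in Python as follows. This is a minimal illustration rather than the study's pipeline. The entropy formula is not preserved in this text, so the sketch assumes Shannon entropy in bits over the bases A, C, G, and T in each alignment column, with gaps and ambiguous bases ignored. The function names and the toy alignment are hypothetical; the gene lengths are those given in Materials and Methods.

```python
import math
from collections import Counter

# Gene lengths (bp) as reported in Materials and Methods.
GENE_LENGTHS_BP = {"invA": 2057, "spaO": 919, "ttrA": 3062}


def dissimilarity_index(n_snps, gene):
    """Number of SNPs per kilobase pair for one target gene."""
    return n_snps / (GENE_LENGTHS_BP[gene] / 1000.0)


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column; assumed form, gaps ignored."""
    bases = [b for b in column.upper() if b in {"A", "C", "G", "T"}]
    if not bases:
        return 0.0
    total = len(bases)
    return sum(-(c / total) * math.log2(c / total)
               for c in Counter(bases).values())


def alignment_entropy(aligned_seqs):
    """Per-position entropy for equal-length aligned sequences."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [column_entropy("".join(col)) for col in zip(*aligned_seqs)]


# Toy alignment: the first three columns are conserved, the last one varies.
toy = ["ACGT", "ACGT", "ACGA", "ACGC"]
print(alignment_entropy(toy))
print(round(dissimilarity_index(12, "ttrA"), 2))
```

Conserved columns score 0, and a column with all four bases at equal frequency scores the maximum of 2 bits, so stretches of near-zero entropy mark candidate primer and probe regions.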
Nucleotide diversity in primer and probe targets. Optimal primer and probe sequences were predicted in silico using regions of PCR target genes homologous among 10 common Salmonella enterica serovars. Predicted gene sequences and oligonucleotide parameters are specified in Table 1. Entropy across the reverse transcription-PCR (RT-PCR) amplicon for each gene target is depicted in Fig. 3. When the entropy of the CIDT target was compared across all genomes in this study, the RT-PCR target regions remained in areas of low nucleotide variability. The specific oligonucleotide sequences were extracted from each genome, with only 2% (60/3,165) of genomes containing SNPs within proposed oligonucleotide sequences. Four, 12, and 44 genomes contained oligonucleotide mutations in the invA, ttrA, and spaO RT-PCR oligonucleotides, respectively. Most of these genomes (48/60) carried single SNPs in the forward primer, reverse primer, or probe oligonucleotides, which were expected to have a small impact on assay sensitivity. However, the remaining 12 genomes had multiple SNPs within a single oligonucleotide or contained a single mutation in multiple oligonucleotides. Multiple mutations in single oligonucleotides or mutations across all oligonucleotides of an RT-PCR are likely to lead to significant sensitivity losses and in some cases false-negative results. The genomes harboring multiple mutations (12/60) were all contained within cgGroup 4, which contained the most divergent core genomes (Table 2).
Temporal trends in Australian nontyphoidal Salmonella serovar diversity. Based on the Australian National Notifiable Diseases Surveillance System public Salmonella data set, more than 200 serovars were identified in association with 14,000 salmonellosis cases each year from 2009 to 2017 in Australia. A gradual increase in serotype diversity and reported incidence of nontyphoidal salmonellosis was detected between 2009 and 2017 (Fig. 4).
CIDTs for Salmonella diagnosis were introduced widely across Australia in late 2013, and notable increases in the Simpson's diversity index were observed after 2015. Subspecies II to IV account for around 1% of salmonellosis cases reported each year in Australia.

DISCUSSION
WGS is being increasingly utilized to investigate outbreaks of Salmonella and to accurately identify food and environmental sources of infection. This high-resolution approach has greatly enhanced the speed and accuracy of public health interventions ensuring food safety. Genome sequencing data collected as part of prospective public health surveillance also enable the ongoing sensitivity of foodborne outbreak surveillance to be determined and monitored, as described in the present study. Public health authorities have raised concerns that the increasing reliance on CIDTs as the only tool to detect pathogens responsible for foodborne disease will significantly reduce the sensitivity of public health surveillance systems, as serotyping and WGS require Salmonella to be isolated by solid medium culture (8,9,28). The replacement of conventional culture with CIDT reduces the number of isolates available for WGS and therefore the ability of public health laboratories to identify and control outbreaks of salmonellosis. In Australia, it has been reported that 12 to 18% of Salmonella-positive stool samples are identified based solely on CIDT testing (8,29). In North America, Salmonella-positive stool samples are often reflexively cultured at public health laboratories after a positive Salmonella CIDT in an attempt to boost the number of Salmonella isolates available for WGS. Australia has not yet widely adopted reflex culture after a positive Salmonella CIDT; however, it may be required to maintain the current WGS-based surveillance systems (5,10).
CIDT platforms offer highly automated "swab-to-result" systems with high throughput, rapid turnaround time, and reduced testing costs. However, most commercial assays offer a single PCR target for Salmonella detection, with the specific primer and probe target sequences being difficult to uncover, as they are commonly proprietary. In this study, we systematically interrogated three genes used as PCR targets by different CIDTs, i.e., ttrA (Roche LightMix), spaO (BD Max), and invA (a common in-house Salmonella PCR target). While our genomic analysis based on WGS data allowed the assessment of dissimilarity of CIDT targets, the proprietary nature of PCR primers and probes made it difficult to accurately estimate the extent of the potential sensitivity losses. We were, however, reassured that significant changes in the spectrum and diversity of Salmonella serovars do not appear to have occurred despite the recent introduction of CIDT systems for detection of bacterial enteropathogens in stool samples from patients with gastroenteritis in Australia. The Australian market has been dominated by commercial systems, namely, the BD Max enteric bacterial panel PCR assay and the Roche LightMix fecal PCR screening kit. Other panels, including the BioFire FilmArray GI panels, for which the gene targets are not disclosed, as well as noncommercial PCR CIDT, have only recently been implemented within the Australian diagnostic testing landscape, and hence, their impact on Salmonella surveillance is yet to be determined. If proprietary CIDT primer and probe nucleotide sequences are not made freely available, increased diligence from companies and pathology providers may be required, with expanded evaluation of CIDT targets and ongoing monitoring of diversity within CIDT targets.
Without this reassurance from test providers, tracking trends in the diversity of Salmonella serovars is important for monitoring the impact of CIDT, particularly as an array of different enteric CIDT panels is added to the diagnostic capacity in Australia.
Dissimilarity of PCR targets occurred within each of the 52 Salmonella serovars investigated; the highest levels of dissimilarity were noted among the three genomes falling outside S. enterica subsp. enterica (subspecies I). Although Salmonella subspecies I isolates predominate as the causal agents of foodborne outbreaks in Australia and elsewhere, it should be noted that current CIDTs may have diminished sensitivity to detect the less common S. enterica subspecies with potential for emergence as human pathogens, especially isolates from the predominantly zoonotic subspecies IV and VI (24). Among the individual genes, entropy across PCR targets suggests that the spaO gene may contain a more conserved PCR target region, while other targets may be affected by dissimilarity across the entire gene. In particular, truncations of the ttrA gene region were present in a small number of genomes. The ttrA gene encodes tetrathionate reductase subunit A and is part of the ttrRSBCA operon, which is required for tetrathionate respiration and located in close proximity to Salmonella pathogenicity island 2. The integration of virulence factors or other evolutionary pressures may increase the variability in this particular CIDT target, but these truncations will need to be validated in other data sets to exclude bioinformatic assembly errors.
Truncations aside, the effect of polymorphisms in oligonucleotides on assay sensitivity is difficult to assess and can be affected by a number of parameters. The largest sensitivity losses are seen with mutations at the 3′ end of primer sequences (30). However, the composition of the mismatched base also plays a role, as purine-pyrimidine mismatches are generally less detrimental than purine-purine/pyrimidine-pyrimidine polymorphisms (27,31,32). More general considerations are also important, including the master mix composition and the number of multiplexed primer and probe combinations. Generally, the more primer pairs are included in a multiplex reaction, the greater the loss of sensitivity for the mismatched primer set, particularly in cases involving polymicrobial intestinal infections. These observations are especially relevant and important, as prediction models indicate that even a small loss of PCR sensitivity could be detrimental to outbreak detection.
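The point about mismatch composition can be made concrete with a short Python helper that labels the pairing between a primer base and the template base it faces in the duplex. This is an illustrative sketch only: the function name and labels are hypothetical, and it reports the mismatch class without attempting to quantify any effect on amplification.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def mismatch_class(primer_base, template_base):
    """Label the pairing of a primer base with the template base opposite it."""
    p, t = primer_base.upper(), template_base.upper()
    valid = PURINES | PYRIMIDINES
    if p not in valid or t not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if (p, t) in WATSON_CRICK:
        return "match"
    if p in PURINES and t in PURINES:
        return "purine-purine"
    if p in PYRIMIDINES and t in PYRIMIDINES:
        return "pyrimidine-pyrimidine"
    # One purine facing one pyrimidine without Watson-Crick pairing (e.g., G opposite T).
    return "purine-pyrimidine"


print(mismatch_class("G", "T"))  # purine-pyrimidine
print(mismatch_class("A", "G"))  # purine-purine
```

Combined with the position of the SNP within the oligonucleotide, such a label gives a rough first-pass ranking of which polymorphisms deserve laboratory confirmation.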
In conclusion, the growing volumes of genomic surveillance data allow ongoing reassessment and validation of CIDT targets representing prevalent and emerging serotypes of public health significance. Nucleotide dissimilarity of CIDT targets in different serovars of Salmonella subspecies IV and VI may affect the public health surveillance of nontyphoidal salmonellosis in areas where they are endemic. If CIDT systems are to become the primary screening and diagnostic tool for laboratory diagnosis of salmonellosis, ongoing monitoring of the local genomic diversity in PCR target regions is warranted. A two-gene target detection system is also recommended and would limit the potential for false-negative CIDT results for Salmonella.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. SUPPLEMENTAL FILE 1, XLSX file, 0.5 MB. SUPPLEMENTAL FILE 2, PDF file, 0.1 MB.