Genomic analysis of Salmonella enterica serotype Paratyphi A during an outbreak in Cambodia, 2013–2015

In 2013, an unusual increase in the number of Salmonella enterica serotype Paratyphi A (Salmonella Paratyphi A) infections was reported in patients in Phnom Penh, Cambodia, and in European, American and Japanese travellers returning from Cambodia. Epidemiological investigations did not identify a common source of exposure. To analyse the population structure and genetic diversity of these Salmonella Paratyphi A isolates, we used whole-genome sequencing on 65 isolates collected from 1999 to 2014: 55 from infections acquired in Cambodia and 10 from infections acquired in other countries in Asia, Africa and Europe. Short-read sequences from 80 published genomes from around the world and from 13 published genomes associated with an outbreak in China were also included. Pulsed-field gel electrophoresis (PFGE) was performed on a subset of isolates. Genomic analyses were found to provide much more accurate information for tracking the individual strains than PFGE. All but 2 of the 36 isolates acquired in Cambodia during 2013–2014 belonged to the same clade, C5, of lineage C. This clade has been isolated in Cambodia since at least 1999. The Chinese outbreak isolates belonged to a different clade (C4) and were resistant to nalidixic acid, whereas the Cambodian outbreak isolates displayed pan-susceptibility to antibiotics. Since 2014, the total number of cases has decreased, but there has been an increase in the frequency with which nalidixic acid-resistant C5 isolates are isolated. The frequency of these isolates should be monitored over time, because they display decreased susceptibility to ciprofloxacin, the first-choice antibiotic for treating paratyphoid fever.


Introduction
Salmonella enterica serotypes Typhi (Salmonella Typhi) and Paratyphi A (Salmonella Paratyphi A) are Gram-negative bacteria confined to human hosts. They can invade the bloodstream, causing typhoid and paratyphoid fever (known together as 'enteric fever'). These infections have become rare in Western countries, but they continue to be responsible for a considerable burden of disease in lowresource settings (Bhan et al., 2005). Historically, enteric fever was mostly caused by Salmonella Typhi. However, the number of Salmonella Paratyphi A infections has been increasing steadily over the last 20 years, and it is now estimated that there are six million cases of paratyphoid fever annually (Newton et al., 2016). This increase is most apparent in Asia, where Salmonella Paratyphi A now accounts for 14 % (Indonesia) to 64 % (South-East China) of enteric fever cases (Ochiai et al., 2005). Furthermore, international travellers returning from countries in which Salmonella Paratyphi A is endemic are increasingly being infected with this bacterium, which accounted for 31 % of imported cases of enteric fever in the European Union in 2011 (ECDC, 2013).
During 2011-2013, an unusual increase in the number of Salmonella Paratyphi A infections was reported among patients attending the Sihanouk Hospital Centre of HOPE (SHCH), Phnom Penh, Cambodia (Vlieghe et al., 2013). This increase coincided with an obse rved increase in the number of cases of Salmonella Paratyphi A infection among travellers returning to Europe, the USA, New Zealand or Japan from Cambodia (Tourdjman et al., 2013;Saitoh et al., 2016;Judd et al., 2015). No similar increase in Salmonella Typhi infections was reported. Epidemiological investigations identified no common source of exposure and the route of disease transmission remained unknown. The analysis of population structure and genetic diversity among outbreak isolates is an important component of epidemiological investigations. Such information can provide clues to the origin and transmission of the bacterial pathogens. Molecular epidemiology analysis was carried out on only 22 Salmonella Paratyphi A isolates from Japanese travellers infected during 2012-2014 (Saitoh et al., 2016). PFGE with a single enzyme (XbaI) identified a predominant profile (18/22, 81.8 %) among the outbreak isolates, suggesting that the outbreak was caused by a single strain. However, PFGE, the previous gold standard for subtyping Salmonella spp., has proved unsuitable for the analysis of highly clonal serotypes like Salmonella Paratyphi A (Tien et al., 2011). In this study, our main aim was to use accurate genetic information, from wholegenome sequencing data in particular, to analyse the population structure and genetic diversity of the Salmonella Paratyphi A isolates collected during the 2013-2015 outbreak in Cambodia.

Impact Statement
The bacterium Salmonella Paratyphi A can cause bloodstream infections (paratyphoid fever) and is transmitted via food or water contaminated with waste products from infected individuals. Accurate genetic information about bacterial isolates is crucial to support public health investigations, for tracing transmission and identifying the source, in particular. Classical typing methods, such as pulsed-field gel electrophoresis (PFGE), do not perform well on this genetically monomorphic bacterial population. We used whole-genome sequencing and comparative genomics for the retrospective investigation of an outbreak of Salmonella Paratyphi A infections in Cambodia. The greater discriminatory power and portability of genomic analyses, compared with PFGE, made it possible to identify the isolates associated with the outbreak unambiguously, and to assess their genetic relationships with isolates from sporadic cases, historical isolates and isolates from another outbreak in China.

Methods
Surveillance systems -Cambodia. The Cambodian isolates were collected at the SHCH, a 40-bed non-governmental referral hospital for adults in Phnom Penh.  Table S1 (available in the Supplementary Material). They originated from the collections of the FNRC-ESS (n=31), the SHCH (n=30), and the Enteric and Leptospira Reference Laboratory at the Institute of Environmental Science and Research Limited, Wallaceville, New Zealand (n=4). The isolates were obtained from blood (n=59), stools (n=4), urine (n=1) or other sources (n=1). Conventional methods and serotyping at the FNRC-ESS, as previously described (Grimont & Weill, 2007), confirmed that all 65 isolates were Salmonella Paratyphi A.
Identification of gyrA mutations associated with decreased susceptibility to ciprofloxacin. The quinolone-resistance-determining region (QRDR) of the gyrA gene encoding the DNA gyrase A subunit was sequenced in five nalidixic acid-resistant Salmonella Paratyphi A isolates obtained from travellers returning from Cambodia during 2014-2015, as previously described (Weill et al., 2006). The nucleotide and deduced amino acid sequences were analysed and compared with sequences available from the National Center for Biotechnology Information website (NCBIhttp://www.ncbi.nlm.nih.gov/last accessed 16 May 2016).
PFGE. PulseNet standard PFGE of XbaI-digested chromosomal DNA was carried out for a selection of 51 Salmonella Paratyphi A isolates (Ribot et al., 2006). BlnI-PFGE was also used to type a subsample of 24 isolates.
Whole-genome sequencing. High-throughput genome sequencing was carried out at the genomics platform of the Pasteur Institute, on the HiSeq2500 platform (Illumina) generating 138 to 151 bp paired-end reads, yielding a mean of 169-fold coverage (minimum 74-fold, maximum 407fold).
Read alignment and SNP detection. For each isolate, we aligned the paired-end reads with the Salmonella Paratyphi A ATCC 9150 reference genome (McClelland et al., 2004) (GenBank accession no. NC_006511), using Bowtie with default parameters (Langmead & Salzberg, 2012). SAMtools (Li et al., 2009) was then used to build a genome index and to identify SNPs from the Bowtie alignments. Several criteria were used to filter the resulting SNPs: a minimum coverage (number of reads mapped to the reference genome) of 20 and a minimum quality score of 25 for each SNP. The SNPs retained were concatenated to generate a multiple alignment of all SNPs by an in-house Perl script. The resulting sequences were further filtered to remove all SNPs present in insertion sequences identified by ISfinder (Siguier et al., 2006). Other repetitive regions were identified by a self-self BLAST analysis (Altschul et al., 1990) of the reference sequence, using the following parameters: megablast (word size 28), identity percentage >95 % and match length >400 bp. This masked 3.3 % of the reference genome. Finally, clusters of SNPs introduced via horizontal sequence transfer were detected and removed with Gubbins (Croucher et al., 2015).
Phylogenetic and temporal analyses. SNP alignments were analysed with Bayesian evolutionary analysis sampling trees (BEAST) (Drummond & Rambaut, 2007) version 1.8.2, which was used to reconstruct temporal phylogenetic trees and to estimate divergence times. The input consisted of a continuous alignment of 4810 non-repetitive, nonrecombinant SNPs from 159 strains, together with the numbers of invariant A, C, T and G nucleotides. The concatenated SNP alignments were subjected to multiple BEAST analyses with both constant-size and Bayesian skyline population size-change models, in combination with strict, log-normal or exponential relaxed clock models, to identify the best-fit model. For the BEAST analysis, the GTR+G substitution model was selected, and tip dates were defined as the year of isolation. For all model combinations, three independent chains of 100 million generations each were run to ensure convergence, with sampling every 1000 iterations. Convergence and effective sample size values were inspected with Tracer (Drummond & Rambaut, 2007) version 1.5. A marginal likelihood estimation was carried out, with path sampling and stepping stone sampling for each run that had converged, to compare the different combinations of clock and tree models (Baele et al., 2012(Baele et al., , 2013. We then used marginal likelihood estimation to determine which model gave the best fit, by calculating the Bayes factor. The relaxed (uncorrelated exponential) clock model, which allows evolutionary rates to vary between the branches of the tree, and the skyline demographic model provided the best fits to the data (Table S2) The parameter and tree estimates of the three runs were combined with LogCombiner (Drummond & Rambaut, 2007) version 1.7.5, with the first 20 % of states in each chain removed as burn-in. Maximum clade credibility trees were generated with TreeAnnotator (Drummond & Rambaut, 2007)  Genetic analyses. The presence and type of antibioticresistance genes (ARGs) were determined with ResFinder version 2.1 (Zankari et al., 2012). The presence of mutations in the QRDR of the DNA gyrase and topoisomerase IV genes was assessed by the visual examination of de novo assembled sequences.

Epidemic curves
During the 2008-2012 period, enteric fever in patients attending the SHCH and in travellers returning from Cambodia to France was caused mostly by Salmonella Typhi (Fig. 1). During this period, 12 cases of Salmonella Paratyphi A infection were identified among patients attending the SHCH, and only 1 such case was reported in a traveller returning from Cambodia to France. In 2013, a sharp increase in the number of cases was observed both at the SHCH and among travellers returning from Cambodia to France, with a total of 72 and 29 cases, respectively. In 2014 and 2015, the annual number of cases decreased, but remained higher than for the period preceding 2013. During this outbreak, the number of Salmonella Typhi cases remained relatively stable.
In total, during the 2013-2015 period, 178 Salmonella Paratyphi A cases were reported among SHCH patients, and 49 cases were reported among travellers returning from Cambodia to France. The median age of the affected patients at the SHCH during this period was 26 years (interquartile range 22-31). Male and female patients were equally affected (48.9 % of the cases were male); none of these patients died. The median age of the affected travellers returning to France was 33 years (interquartile range 27-49), and 43 % were male; none of these patients died.

Phylogenetic and temporal analyses
We performed short-read genomic sequencing (Illumina) on 65 Salmonella Paratyphi A isolates (Table S1), including 55 from patients in or returning from Cambodia, obtained between 1999 and 2014. Thirty of these fifty-five Cambodian isolates were collected in Phnom Penh, between 2008 and 2013, the other 25 being collected from travellers returning to France or New Zealand from Cambodia (including seven who had also travelled to neighbouring countries, such as Vietnam, Laos, Thailand and Malaysia) or migrants from Cambodia. The other ten isolates sequenced were obtained between 2013 and 2014, from French travellers who had not visited Cambodia but had travelled to other countries in South-East Asia, Southern Asia or Africa, and from a case where the patient had not travelled abroad (laboratory contamination). Short-read sequences from 80 published Salmonella Paratyphi A genomes (Zhou et al., 2014) were selected and included in our study, to provide the population structure framework of this serotype, with its seven lineages, A to G. Short-read sequences from 13 published Salmonella Paratyphi A genomes associated with a large-scale community outbreak in Southern China during the 2010-2011 period were also included (Yan et al., 2015). The assembled genome of ATCC 9150a laboratory strain of unknown origin (McClelland et al., 2004) was used as the reference genome. The genomic analysis was carried out on a final set of 159 Salmonella Paratyphi A genomes, including 63 linked to Cambodia during the 1958-2014 period. ML (Fig. S1) and BEAST (Fig. 2) phylogenetic analyses were performed on 4810 chromosomal SNPs from the nonrepetitive non-recombinant core genome of 159 Salmonella Paratyphi A genomes (Table S3). Both approaches confirmed the presence of the seven lineages, A to G, described by Zhou et al.  ML and BEAST analyses revealed that all but 3 of the 47 isolates collected in Cambodia or obtained from travellers returning from Cambodia between 2010 anin clade C5 (Table 1 and S1, Figs 2 and S1). During the 2013-2014 outbreak period, all but 2 of the 36 isolates acquired in Cambodia belonged to clade C5. The two remaining isolates belonged to genetic groups A and C2. Certain SNPs were identified exclusively within clade C5. Thirty-two of these specific SNPs were found in more than 70 % of the sixtyfour C5 isolates, including three found in all the C5 isolates (Table S4). The mean intra-clade pairwise SNP variation within C5 isolates was 23 (minimum 1-maximum 82), whereas it was 7 (1-29) if only the C5 isolates collected between 2010 and 2014 were taken into account. No particular geographical or temporal clustering was observed for the patients hospitalised at the SHCH and infected with such C5 isolates. This outbreak strain was derived from a strain that had been circulating locally and regionally, as C5 isolates had already been isolated in Cambodia from 1999, and in Vietnam from 1963 to 2005 (Table S1). The C5 strain responsible for the outbreak was also isolated from travellers returning from Nepal (one case), Thailand (one case) and Vietnam (one case) in 2013 (Table S1, Fig. S1). However, the Cambodian outbreak strain was different from that responsible for the Chinese outbreak in 2010-2011 (Yan et al., 2015), which belonged to clade C4 (Table 2, Table S1, and Fig. 2). The C4 Chinese isolates differed from the C5 isolates by 77 SNPs on average (48-91), and the mean intra-clade pairwise SNP variation within C4 isolates was 11 (3-33).
We also assessed the genetic diversity of the Salmonella Paratyphi A isolates by PulseNet standardized PFGE. A subsample of 51 Salmonella Paratyphi A isolates was first analysed by XbaI-PFGE (Table 3 and Fig. 3). A predominant profile, XParA_001 was observed in 35/41 (85.4 %) C5 isolates analysed. However, three other XbaI-PFGE profiles were seen in the remaining C5 isolates. The XParA_001 profile was also found in 75 % (3/4) of the C2 and C3 A maximum clade credibility tree generated by BEAST 1.8 (exponential clock rate and Bayesian skyline population models) is shown, providing information about the geographical origin, antimicrobial resistance phenotype (AMR type), gyrA mutation encoding resistance to nalidixic acid and lineages of the genomes. The different clades (C1 to C5) of lineage C are also indicated. The tips of the tree are coloured according to the geographical origin of the genomes. The abbreviations used for the antibiotics are as follows: AMP, ampicillin; NAL, nalidixic acid; STR, streptomycin; SXT, trimethoprim/sulfamethoxazole. AMP R indicates resistance to AMP. NAL R indicates resistance to NAL, which was associated with decreased susceptibility to ciprofloxacin (MIC in the range 0.125-1 mg l À1 ).
isolates analysed, but not in C4 isolates, which were phylogenetically closer to C5 (Fig. 2). Similar findings were obtained for analyses with a second enzyme, BlnI, on 24 of the 51 isolates (Fig. S2). A predominant profile, BParA_001, was observed in 14/15 C5 isolates analysed. However, another profile was seen in the remaining Cambodian outbreak C5 isolate. The BParA_001 profile was also found in all other C2 and C3 isolates tested, but not in C4 isolates.

Antibiotic-susceptibility testing and determination of resistance mechanisms
The Cambodian outbreak strain was pan-susceptible to antibiotics, with the exception of one isolate that was resistant only to streptomycin. This outbreak strain was susceptible to quinolones (i.e. nalidixic acid) and fluoroquinolones (i.e. ciprofloxacin), in particular. By contrast, the two non-C5 isolates collected during the outbreak in Cambodia were resistant to nalidixic acid (MIC>256 mg l À1 ) with decreased susceptibility to ciprofloxacin (MIC 0.09-0.38 mg l À1 ). This resistance was mediated principally by mutations in the QRDR of gyrA, the chromosomal DNA gyrase A gene. We therefore checked for such mutations within this QRDR. We identified mutations leading to a serine to tyrosine substitution at codon 83 (Ser83Tyr) in the lineage A isolate from 2014, and to an aspartic acid to asparagine substitution at codon 87 (Asp87Arg) in the clade C2 isolate from 2013. Quinolone-resistant Salmonella Paratyphi A bacterial populations were also present in Cambodia before the outbreak period, as we identified three nalidixic acid-resistant strains isolated there during the 2000-2006 period. They belonged to lineage A, and clades C2 and C5 of lineage C, and they carried the gyrA mutations Ser83Phe, Asp87Asn and Ser83Phe, respectively (Table 1). The C5 nalidixic-acid resistant isolate was collected in 2000, 11 years before the  outbreak. The Chinese outbreak was also caused by a strain resistant to nalidixic acid with decreased susceptibility to ciprofloxacin (Yan et al., 2015). Within the genomic sequences of the Chinese isolates, we identified the very common gyrA mutation Ser83Phe (Table 2, Table S1).
Overall, various gyrA mutations were acquired at least 10 times in the 159 global Salmonella Paratyphi A genomes studied, but only in lineages A and C (Fig. 2). The gyrA mutation leading to Ser83Phe substitution was the most frequently observed.

Pan-genome analysis
The pan-genome analysis (Fig. S3) carried out by Roary identified a total of 7009 genes, including 3342 for the core genome, in the 159 genomes studied. We detected no gains of particular accessory genes nor losses of common genes in the Cambodian outbreak strain (Table S5).

Recent evolution in Cambodia
The epidemic curve based on FNRC-ESS data showed a sharp decrease in the number of Salmonella Paratyphi A infections among travellers returning to France from Cambodia between 2013 (n=25) and 2015 (n=5). However, the number of nalidixic acid-resistant isolates increased from 0 % (0/25) in 2013 to 12.5 % (2/16) in 2014 and 80 % (4/5) in 2015. The only nalidixic acid-resistant isolate sequenced here, 201400552, belonged to lineage A and had the gyrA mutation leading to a Ser83Tyr substitution. The question was, therefore, whether the other five recent nalidixic acidresistant isolates belonged to lineage A or to the outbreak C5 clade, with the acquisition of a mutation affecting the QRDR of gyrA. As these five isolates were collected very recently (after the whole-genome study), we developed a SNP genotyping assay to determine whether these isolates belonged to lineage A or clades C2 and C5 of lineage C, which are known to contain nalidixic acid-resistant isolates.
In this SNP assay, the five recent nalidixic acid-resistant isolates were found to belong to the C5 clade and to have the gyrA mutation leading to the Ser83Phe substitution.

Discussion
The analysis of whole-genome sequencing data from isolates unambiguously identified a single predominant strain (C5 clade) among the isolates collected during the recent outbreak of paratyphoid fever in Cambodia, and made it possible to assess the phylogenetic relationships between this strain and other strains circulating in Cambodia or elsewhere in Asia at or before this time. The inclusion of many other publicly available genomes, including, in particular, 80 representative genomes from a global collection of 149 Salmonella Paratyphi A isolates obtained between 1917 and 2009, increased the discriminatory power of the genomic epidemiology analysis, making it possible to define and trace back the outbreak strain.
The genomic analysis showed that the C5 clade was not confined to Cambodia, but was also found in travellers returning from Nepal, Thailand and Vietnam in 2013, suggesting that the outbreak strain may have been circulating beyond the borders of Cambodia. However, these countries did not experience similar outbreaks of Salmonella Paratyphi A infections during the same period. An outbreak of Salmonella Paratyphi A infections was reported in China in 2010-2011, but we found that this outbreak was caused by a strain from a different clade, C4. However, the C4 and C5 clades, separated by a mean of 77 SNPs, shared a common ancestor around 1950.
By contrast, PFGE, the widely-used gold standard method for subtyping Salmonella spp. to the strain level, did not attain the same discriminatory power, despite the presence of a predominant PFGE profile among the outbreak isolates. The presence of other PFGE profiles in the C5 outbreak isolates made the epidemiological analysis harder to interpret. For example, the 11 C5 isolates collected at the SHCH in Phnom Penh during 2012-2013 presented three different PFGE profiles (the main profile, XParA_001, in seven isolates, XParA_011 in three isolates and XParA_010 in one isolate). Furthermore, the predominant XbaI-and BlnI-PFGE profiles, XParA_001 and BParA_001, were also observed in C3 isolates from West Africa. In addition to the weak discriminatory power of this low-throughput fingerprinting method, the PFGE data had low portability. Despite using the same standardized protocol, the figures showing PFGE data in the papers by Saitoh et al. (2016) and Yan et al. (2015) do not contain a molecular size ladder, precluding any comparative analysis between these isolates and our isolates. By contrast, the increasing availability in public databases of short-read sequences from published genomes or of sequences generated during the routine surveillance of Salmonella infections (Ashton et al., 2016) has strongly improved the identification and investigation of outbreaks, particularly those of an international nature.
Unlike the strain responsible for the Chinese outbreak in 2010-2011, the Cambodian outbreak strain was pan-susceptible to the antibiotics tested. This antibiotic susceptibility is of particular importance, as a clear trend towards the acquisition of antibiotic resistance in Salmonella Paratyphi A isolates from Asia has been observed, despite the lack of data from South-East Asia (Gu et al., 2015;Teh et al., 2014). The susceptibility of the outbreak strain to quinolones, and the isolation of a nalidixic acid-resistant C5 strain in Cambodia 11 years before the current outbreak, suggest that neither quinolone resistance nor resistance to any other antibiotic was an important bacterial factor in this outbreak, at least before 2014. Despite a decreasing number of cases between 2013 and 2015, both at the SHCH and in travellers returning from Cambodia to France (Fig. 1), the increase in the isolation of nalidixic acid-resistant C5 isolates from 2014 is worrying, as these isolates also had decreased susceptibility to ciprofloxacin, the first-line treatment in adults. We now need to monitor Salmonella Paratyphi A infections and their susceptibility to antibiotics to determine whether there is a real trend towards an increase in nalidixic acidresistant strains over time. These recent C5 isolates are also being investigated by whole-genome sequencing, to determine whether they form a tight cluster or are distributed evenly throughout the C5 clade, and whether they are derived from the 2013-2014 outbreak strain or from more ancestral C5 populations.
In their comprehensive comparative genomics analysis of a global collection representative of the population structure of this serotype, Zhou and co-workers concluded that the crucial genomic elements of Salmonella Paratyphi A responsible for causing paratyphoid fever were already present in the most common recent ancestor dating from the mid-15th century. Furthermore, they found that almost all genetic changes (gene acquisitions or losses and mutations) since that time had been transient (i.e. removed by purifying selection) (Zhou et al., 2014). These findings are indicative that environmental and/or human behavioural factors enhancing transmission to naive hosts are more likely to explain outbreaks of paratyphoid fever than the recent emergence of a particularly virulent Salmonella Paratyphi A strain. Such environmental and/or human behavioural risk factors for paratyphoid fever have been investigated in casecontrol studies in Asia. In Nepal, different routes of infection were identified for Salmonella Typhi and Salmonella Paratyphi (Karkey et al., 2013). The significant identified risk factors for Salmonella Paratyphi A infections were the consumption of street food during the 2 weeks preceding the onset of illness and residence in the capital city, Kathmandu, for less than 2 years. In Indonesia, Salmonella Typhi and Salmonella Paratyphi A were also found to follow different routes of infection (Vollaard et al., 2004). The risk factors associated with a Salmonella Paratyphi A infection were identified as the consumption of food from street vendors and flooding. In South China, during a community outbreak of 600 cases, the main risk factor was the consumption of uncooked vegetables from street stalls, local markets or small restaurants (Yan et al., 2015). The primary source was a vegetable patch irrigated with untreated hospital wastewater.
These findings highlight the need for solid public-health investigations and interventions, together with food-safety regulations, in Cambodia. Significant steps in terms of food safety were taken in Cambodia in 2015, when a draft of the country's first food law was released.
For future public health investigations, the use of wholegenome sequencing data in real-time, together with classical epidemiological methods, would be advantageous, as demonstrated here by the unequalled discriminatory power of genomic data for tracking individual strains (i.e. distinguishing outbreak-associated isolates from those of contemporary sporadic cases and assessing the true genetic relationships between isolates) from this genetically monomorphic bacterial population. The collection of epidemiological and molecular data should be a collaborative effort, to maximise the gain from combining the two types of data. Efforts should also be made, during initial investigations, to isolate strains from suspected water and/or food sources and from identified carriers. This should make it possible to optimise the use of high resolution provided by wholegenome sequencing and to demonstrate a clear link with the suspected source, as demonstrated by Yan et al. (2015) during the outbreak in China in 2010-2011.