Phylogenomic analysis of Clostridium perfringens identifies isogenic strains in gastroenteritis outbreaks, and novel virulence-related features

Clostridium perfringens is a major enteric pathogen known to cause gastroenteritis in human adults. Although major outbreaks are frequently reported, only a limited number of Whole Genome Sequencing (WGS)-based studies have been performed to understand the genomic epidemiology and virulence gene content of C. perfringens outbreak-associated strains. We performed genomic and phylogenetic analyses on 109 C. perfringens strains (human and food) isolated from disease cases in England and Wales between 2011 and 2017. Initial findings highlighted the enhanced discriminatory power of WGS in profiling outbreak C. perfringens strains, compared with the current Public Health England reference laboratory technique of Fluorescent Amplified Fragment Length Polymorphism (fAFLP). Further analysis identified that isogenic C. perfringens strains were associated with nine distinct care home-associated outbreaks over a 5-year interval, indicating either a potential common source linked to these outbreaks or transmission over time and space. As expected, the enterotoxin gene cpe was encoded in all but 4 isolates (96.4%; 105/109), with cpe-encoding virulence plasmids (particularly pCPF5603- and pCPF4969-family plasmids) extensively distributed (82.6%; 90/109). Genes encoding accessory virulence factors, such as beta-2 toxin, were commonly detected (46.7%; 50/109). Genes encoding phage proteins were also frequently identified, and additional analysis indicated that they contribute to increased virulence determinants within the genomes of gastroenteritis-associated C. perfringens. Overall, this large-scale genomic study of gastroenteritis-associated C. perfringens suggested that 3 major sub-types underlie these outbreaks: strains carrying (1) the pCPF5603 plasmid, (2) the pCPF4969 plasmid, or (3) cpe on the transposable element Tn5565 (usually integrated into the chromosome).
Our findings indicate that further studies will be required to fully probe this enteric pathogen, particularly in relation to developing intervention and prevention strategies to reduce food poisoning disease burden in vulnerable patients, such as the elderly.


Introduction
been reported in elderly patients, especially those residing in care homes in the North East of England between 2012 and 2014 (83% of the outbreaks were reported from care homes) (10). Although fatality due to C. perfringens diarrhoea is uncommon and the hospitalisation rate is low, enterotoxigenic C. perfringens is reported to cause ~55 deaths/year in England and Wales according to the UK Food Standards Agency (15,16).
The newly expanded and revised toxinotyping scheme classifies C. perfringens into 7 toxinotypes (types A-G) according to the combination of typing toxins produced, and this classification is used throughout this article (17). Human cases of C. perfringens diarrhoea are primarily caused by type F strains (formerly classified as enterotoxigenic type A), which produce enterotoxin (CPE), encoded by the cpe gene (18). This potent pore-forming toxin is reported to disrupt intestinal tight junction barriers, which is associated with intestinal disease symptoms (19). C. perfringens and its encoded toxins have been extensively studied with respect to disease pathogenesis, with a strong focus on animal infections (20-24). Recent studies analysing a range of diverse C. perfringens strains (from both animal and human-associated infections) indicate a plastic and divergent pangenome, with a significant proportion of accessory genes predicted to be involved in virulence mechanisms and metabolism, linked to enhanced host colonisation and disease initiation (3,25). However, studies describing human outbreak-associated C. perfringens infections are limited, and to date only one recent study (58 isolates) has used Whole Genome Sequencing (WGS) data to probe the genomic epidemiology of associated strains (3,26).
We have applied in-depth genomic and phylogenetic analyses to the whole genome sequences of 109 newly sequenced C. perfringens isolates associated with outbreaks or incidents of C. perfringens diarrhoea in England and Wales, either foodborne or non-foodborne. We also identified the distribution of known virulence-related determinants, including toxin and antimicrobial resistance (AMR) genes, and virulence-associated plasmid contents within food and
rhierbaps v1.0.1 (44,45). The pangenome was visualised in Phandango, while plots were generated using the associated R scripts in the Roary package (46).

Profiling virulence and plasmid-related sequences
Screening of toxin and AMR gene profiles, IS elements and plasmid tcp loci was performed via ABRicate. Minimum cutoffs of 90% identity and 90% coverage were used to infer identical genes, based on a custom toxin database and the CARD database v2.0.0 (AMR), as described previously (47,48). ARIBA v2.8.1 was used as a secondary approach to confirm detections of both toxin and AMR genes in raw sequence FASTQ files (49). Heat maps were generated in R using the gplots function heatmap.2 (50,51).
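The 90% identity / 90% coverage screening described above can be sketched in Python. The sketch filters ABRicate's tab-separated report and then builds a genome-by-gene presence table of the kind drawn as a heat map. This is a minimal illustration, not the authors' pipeline. The column names (`#FILE`, `GENE`, `%COVERAGE`, `%IDENTITY`) are assumed from recent ABRicate reports and should be checked against the version in use. The helper names and the isolate/file names are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Assumed ABRicate report column names; verify against your ABRicate version.
FILE_COL, GENE_COL, COV_COL, ID_COL = "#FILE", "GENE", "%COVERAGE", "%IDENTITY"


def filter_hits(tsv_text, min_id=90.0, min_cov=90.0):
    """Keep only hits that meet BOTH the identity and the coverage cutoff."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row
        for row in reader
        if float(row[ID_COL]) >= min_id and float(row[COV_COL]) >= min_cov
    ]


def presence_table(hits):
    """Map each screened genome to the set of genes called present in it."""
    table = defaultdict(set)
    for row in hits:
        table[row[FILE_COL]].add(row[GENE_COL])
    return dict(table)


# Hypothetical report: one passing hit, one failing on coverage, one on identity.
report = (
    "#FILE\tGENE\t%COVERAGE\t%IDENTITY\n"
    "isolateA.fasta\tcpe\t100.00\t99.80\n"
    "isolateA.fasta\tcpb2\t85.00\t99.00\n"
    "isolateB.fasta\tcpe\t100.00\t88.50\n"
)
print(presence_table(filter_hits(report)))  # {'isolateA.fasta': {'cpe'}}
```

A genome with no passing hit (isolateB above) drops out of the table entirely. A heat map built from it would therefore need that genome added back as an all-absent row.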

In silico plasmid analysis
Sequencing reads were used for computational plasmid prediction with PlasmidSeeker v1.0 (52). Plasmid prediction was based on 8 514 plasmid sequences available in the NCBI Reference Sequence database (RefSeq; including 35 C. perfringens plasmids, see Table S4). All reads were searched for matching k-mers at a k-mer length of 20, with a screening cutoff of P = 0.05 based on FASTQ reads. The top predicted plasmid in each 'cluster' (highest k-mer identity, with a minimum k-mer percentage of ≥80%) was extracted as the predicted plasmid (Table S5) (Table S6).
Annotated GenBank files were submitted manually to the PHASTER web server, and the annotated data were parsed with in-house scripts. The detection of phage was based on the scoring method and classification as described previously (56).
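The per-cluster k-mer screening can be illustrated with a simplified sketch. The sketch uses canonical 20-mers and reports the best plasmid in each cluster of related references if at least 80% of its k-mers occur in the reads. It is not PlasmidSeeker's actual algorithm, which also models k-mer coverage statistically to obtain its P-value cutoff. It only shows the k-mer percentage idea. All function and reference names are hypothetical, and the sequences are random toy data.

```python
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def kmer_set(seq, k=20):
    """All canonical k-mers of a DNA sequence."""
    return {canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)}


def kmer_percentage(plasmid_seq, read_kmers, k=20):
    """Percentage of a plasmid's distinct k-mers that also occur in the reads."""
    plasmid_kmers = kmer_set(plasmid_seq, k)
    if not plasmid_kmers:
        return 0.0
    return 100.0 * len(plasmid_kmers & read_kmers) / len(plasmid_kmers)


def predict_plasmids(reads, plasmids, clusters, k=20, min_pct=80.0):
    """For each cluster of related reference plasmids, report the best-matching
    plasmid if its k-mer percentage reaches min_pct."""
    read_kmers = set()
    for read in reads:
        read_kmers |= kmer_set(read, k)
    predicted = []
    for cluster in clusters:
        scores = {name: kmer_percentage(plasmids[name], read_kmers, k) for name in cluster}
        best = max(scores, key=scores.get)
        if scores[best] >= min_pct:
            predicted.append((best, round(scores[best], 1)))
    return predicted


# Toy data: reads tile reference refA completely; refB is unrelated.
rng = random.Random(0)
plasmid_a = "".join(rng.choice("ACGT") for _ in range(500))
plasmid_b = "".join(rng.choice("ACGT") for _ in range(500))
reads = [plasmid_a[i:i + 100] for i in range(0, 401, 50)]
print(predict_plasmids(reads, {"refA": plasmid_a, "refB": plasmid_b}, [["refA", "refB"]]))
# [('refA', 100.0)]
```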

Whole-genome based phylogenetic analysis reveals potential epidemiological clusters
Initially, we analysed the population structure of all sequenced strains. We defined general food poisoning isolates as Food Poisoning (FP, n = 74) and care home-specific isolates as Care Home (CH, n = 35) (Fig. 1A-B). The quality of the draft genome assemblies was also assessed (Fig. 1C; Table S2): >70% of the isolate assemblies comprised <200 contigs.
Separate analysis of CH isolates indicated four distinct phylogenetic lineages relating to care home outbreaks (Fig. 2A). Lineage I contained the reference genome NCTC8239, a historical cpe-positive isolate (originally isolated from salt beef) associated with an FP outbreak, and three newly sequenced strains (7). The remaining isolates clustered within three lineages (II, III and IV) that were divergent from lineage I. This indicates that these CH isolates might be genetically distinct from typical FP isolates (Fig. 1A). Further analysis indicated that 18 closely related strains, obtained from 9 different outbreaks that occurred in North East England between 2013 and 2017, clustered within the same sub-lineage, IVc (Fig. 2B). SNP analysis of these IVc isolates determined within-sub-lineage pairwise genetic distances of <80 SNPs (29.9 ± 16.6 SNPs; mean ± S.D.; Table S7), suggesting a close epidemiological link. Isolates associated with specific outbreaks within sub-lineage IVc (i.e. outbreaks 2, 7, 8, 9 and 10) showed very narrow pairwise genetic distances of <20 SNPs (6.6 ± 6.6 SNPs; mean ± S.D.; Fig. 2C). This suggests potential involvement of an isogenic (genetically highly similar) strain within these individual care home outbreaks, although a number of genetically dissimilar strains were also isolated from outbreaks 1, 2, 3, 6, 7 and 8 (Fig. 2A).
This WGS analysis was also shown to have greater discriminatory power than the currently used fAFLP.
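Within-lineage summaries such as "29.9 ± 16.6 SNPs; mean ± S.D." come from pairwise distances between isolates in a core-genome alignment. The following is a minimal sketch of that calculation. It assumes that gaps (`-`) and ambiguous calls (`N`) are not counted as differences, and that the reported S.D. is the sample standard deviation. The text does not state either convention. The toy alignment and isolate names are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev

MISSING = set("-N")  # alignment gaps and ambiguous calls are not counted as SNPs


def snp_distance(seq_a, seq_b):
    """Number of aligned positions where both isolates have a base call and they differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1 for x, y in zip(seq_a, seq_b)
        if x != y and x not in MISSING and y not in MISSING
    )


def pairwise_summary(alignment):
    """Return all pairwise SNP distances plus their mean and sample S.D.
    Requires at least two isolates in the alignment."""
    distances = {
        (a, b): snp_distance(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }
    values = list(distances.values())
    sd = stdev(values) if len(values) > 1 else 0.0
    return distances, mean(values), sd


toy = {"i1": "ACGTACGTAC", "i2": "ACGTACGTTC", "i3": "ACGAACGTTN"}
dists, m, sd = pairwise_summary(toy)
print(dists)  # {('i1', 'i2'): 1, ('i1', 'i3'): 2, ('i2', 'i3'): 1}
```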
The fAFLP typing (type CLP 63, yellow-coded) failed to discriminate isolates from 6 different outbreaks (CH outbreaks 2-7; Fig. 2A), while SNP analysis clearly distinguished these strains (Fig. 2B) (59).
Analysis of FP isolates indicated clear separation between lineages (Fig. 3A), particularly between lineage I and the remaining lineages II-VII. Pairwise mean SNP distances were as follows: lineage I vs lineages II-VII, 35165 ± 492 SNPs; within lineage I, 5684 ± 2498 SNPs; within lineages II-VII, 13542 ± 8675 SNPs. Isolates from three individual foodborne outbreaks within lineage VII appeared to be highly similar even though they were geographically heterogeneous (Fig. 3A). Further analysis indicated that two different outbreaks that occurred in London (2013) were related to, but somewhat distinct from, isolates obtained from outbreaks in North East England (2015) (Fig. 3B). This suggests that a common ancestor separated geographically at an earlier time point, and may also indicate widespread distribution of a genetically related strain.
Isolates from individual FP outbreaks also appeared to be clonal and isogenic, with pairwise genetic distances of 0-21 SNPs (mean genetic distance: 2.6 ± 2.7 SNPs; Table S8). By comparison, SNP distances between outbreaks within the same lineage were >1200 SNPs (Fig. 3D). In addition, outbreak-associated food isolates were not distinguishable from human clinical isolates (genetically similar; pairwise SNP range: 0-16 SNPs) in 7 individual FP outbreaks (Fig. 3A). These findings are consistent with the hypothesis that contaminated food is the main source of these C. perfringens food poisoning outbreaks. All implicated foodstuffs were meat-based, e.g. cooked sliced beef, lamb, chicken curry, cooked turkey and cooked meat (Table S1).
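The contrast between within-outbreak distances (0-21 SNPs) and between-outbreak distances (>1200 SNPs) is what makes SNP-threshold grouping effective. The following is a hypothetical illustration, not a method described in this study. It applies single-linkage clustering, implemented with union-find, to a pairwise distance table. Any isolates connected by a chain of distances at or below the threshold share a cluster. The 21-SNP threshold is borrowed from the upper bound reported above and is not a validated outbreak cutoff. The distances are invented toy values.

```python
def snp_clusters(isolates, distances, threshold):
    """Single-linkage clustering: isolates are grouped when a chain of pairwise
    SNP distances <= threshold connects them (union-find implementation)."""
    parent = {i: i for i in isolates}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d <= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    groups = {}
    for isolate in isolates:
        groups.setdefault(find(isolate), []).append(isolate)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical distances (not data from this study).
pairs = {("A", "B"): 3, ("B", "C"): 15, ("A", "C"): 18,
         ("A", "D"): 1290, ("B", "D"): 1295, ("C", "D"): 1300}
print(snp_clusters(["A", "B", "C", "D"], pairs, threshold=21))  # [['A', 'B', 'C'], ['D']]
```

Single linkage can chain distant isolates together through intermediates. Stricter schemes, such as complete linkage, avoid this at the cost of splitting some genuine outbreaks.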

Specific plasmid-associated lineages and potential CPE-plasmid transmission
The CPE toxin is responsible for the symptoms of diarrhoea in both food poisoning and non-foodborne illness, the latter usually lasting >3 days and up to several weeks (2, 66). Genetically, strains with chromosomally encoded cpe are primarily linked to food poisoning (67, 68), whereas non-foodborne diarrhoea is usually associated with plasmid-borne cpe strains (66, 69, 70). We performed in-depth plasmid prediction on our datasets. This analysis indicated that CH isolates predominantly harboured pCPF5603 plasmids (34/35 isolates; 97%) encoding the cpb2 and cpe genes. FP isolates, by contrast, primarily carried pCPF4969 plasmids (45/75 isolates; 60%) encoding cpe but not cpb2 (Fig. S1J). To confirm these findings, we also performed a genome-wide search for plasmid-specific sequences, including IS1151 (pCPF5603), IS1470-like (pCPF4969) and the tcp genes of the plasmid conjugative system (Fig. 4A-B) (71-73).
To further examine and confirm the predicted plasmids, we extracted plasmid sequences (complete unassembled single contig) from three isolates per CH or FP group, and compared them with reference plasmids (Fig. 5A-C). The extracted plasmid sequences closely resembled the respective reference plasmids in nucleotide identity (>99.0%), plasmid size and GC content (Fig. 5B-C; Table S9). This supports the presence of these two intact plasmids (pCPF4969 and pCPF5603) in these isolates.
Although chromosomal-cpe strains are considered the primary strain type associated with food poisoning, our dataset demonstrated that plasmid-cpe C. perfringens strains were predominantly associated with food poisoning (82.6%; 90/109), with only 17.4% of FP isolates encoding a copy of cpe on the chromosome (no plasmid detected).
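The marker-based reasoning above (IS1151 for pCPF5603, IS1470-like for pCPF4969, and tcp conjugation genes for a conjugative plasmid) can be expressed as a simple rule set. The rules assign each genome to one of the three sub-types named in the abstract. The following is a hypothetical heuristic, not the authors' classification procedure. The marker labels are assumed to match whatever names the screening database reports. A genome carrying both IS markers is labelled pCPF5603-like purely because of rule order.

```python
def classify_cpe_context(genes):
    """Hypothetical rule set: infer the likely genetic context of cpe
    from the set of marker genes detected in one genome."""
    if "cpe" not in genes:
        return "cpe-negative"
    has_tcp = any(g.startswith("tcp") for g in genes)  # conjugative plasmid locus
    if has_tcp and "IS1151" in genes:
        return "pCPF5603-like"
    if has_tcp and "IS1470-like" in genes:
        return "pCPF4969-like"
    if not has_tcp:
        return "chromosomal (Tn5565-like)"
    return "plasmid, unassigned"


for markers in ({"cpe", "cpb2", "IS1151", "tcpF", "tcpH"},
                {"cpe", "IS1470-like", "tcpF"},
                {"cpe", "IS1470", "IS1469"}):
    print(classify_cpe_context(markers))
```

Marker rules like these are only a first pass. The study itself confirmed plasmid identity by comparing extracted contigs with reference plasmids.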
Plasmid transfer may have occurred among the CH outbreak 7 isolates (n=4; PH030, PH031, PH032, PH033). Two of these isolates reside within lineage IV, whilst the other two nest within the genetically distant CH lineage I (genetic distance >10 000 SNPs), yet all 4 isolates harboured plasmid pCPF5603 (Fig. 4A). CH outbreaks 1 and 8 also had dissimilar strains (nested within separate lineages) carrying identical plasmids. This analysis suggests that multiple distinct strains carrying the same cpe-plasmid may be implicated in these CH outbreaks. Previous work has shown in vitro conjugative plasmid transfer among C. perfringens strains (from cpe-positive to cpe-negative strains) (74).
Previous studies have demonstrated that C. perfringens strains with chromosomally encoded cpe are genetically divergent from plasmid-cpe carriers. Within the FP phylogeny there was a distinct lineage of isolates (lineage I; n=17) that appeared to encode cpe chromosomally. These isolates had significantly smaller genomes than isolates outside this lineage (2.95 ± 0.03 Mb vs 3.39 ± 0.08 Mb, n=93; P<0.0001; Table S10). They were also most similar to reference genome NCTC8239 (ANI ≥99.40%) and appeared to lack plasmids. This was further evidenced by the historical chromosomal-cpe strain NCTC8239 nesting in this lineage with these newly sequenced strains (FP lineage I; Fig. 4) (70,72,75). To further investigate this hypothesis, the cpe-encoding region (complete single contig from high-coverage assemblies) was extracted from representative isolates in lineage I (n=6), and comparative genomics was performed (Fig. 5D).
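The outbreak 7 argument combines two data sources: isolates from one outbreak that are far apart in the phylogeny (>10 000 SNPs) but carry the same predicted plasmid. The following is a hypothetical helper that flags such pairs from a pairwise distance table and a per-isolate plasmid call. The isolate names, distances and the 10 000-SNP threshold in the example are illustrative only. A flagged pair is consistent with horizontal plasmid transfer but does not prove it.

```python
from itertools import combinations


def candidate_plasmid_sharing(outbreak_isolates, plasmid_of, distances, min_snps):
    """Return (isolate, isolate, SNPs) for pairs from one outbreak that differ by
    more than min_snps yet carry the same predicted plasmid."""
    pairs = []
    for a, b in combinations(sorted(outbreak_isolates), 2):
        d = distances.get((a, b), distances.get((b, a)))
        same_plasmid = plasmid_of.get(a) is not None and plasmid_of.get(a) == plasmid_of.get(b)
        if d is not None and d > min_snps and same_plasmid:
            pairs.append((a, b, d))
    return pairs


# Hypothetical outbreak: iso1 is distant from iso2/iso3, but all carry pCPF5603.
isolates = ["iso1", "iso2", "iso3"]
plasmids = {"iso1": "pCPF5603", "iso2": "pCPF5603", "iso3": "pCPF5603"}
dist = {("iso1", "iso2"): 12000, ("iso1", "iso3"): 11000, ("iso2", "iso3"): 5}
print(candidate_plasmid_sharing(isolates, plasmids, dist, min_snps=10000))
# [('iso1', 'iso2', 12000), ('iso1', 'iso3', 11000)]
```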
These consistently smaller (~4.0-4.3 kb) contigs were almost identical in nucleotide sequence (>99.9% identity) to the cpe-encoding region of chromosomal-cpe strain NCTC8239. This confirms that these isolates possess the same cpe genomic architecture as NCTC8239, namely the transposable element Tn5565 (Fig. 5E). In addition, PH029 was the only isolate within the lineage I outbreak cluster in which cpe was not detected, despite its clonal relationship with PH028, PH104, PH105 and PH107 (FP outbreak 4; Fig. 4B). This suggests that Tn5565 may have been lost during extensive sub-culturing, which is supported by initial PCR results for this isolate being cpe-positive (Table S1). Analysis also indicated that cpe was closely associated with IS1469 regardless of where it was encoded: this insertion sequence was detected in all cpe-encoding genomes and only in those genomes (100%; Fig. 4A-B).
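The statement that IS1469 was found in every cpe-encoding genome, and only in those, amounts to an empty off-diagonal in a 2x2 presence/absence table. The following is a minimal sketch of that check. The function names and genome profiles are hypothetical.

```python
def cooccurrence(profiles, gene_a, gene_b):
    """2x2 presence/absence counts for two genes across genomes.
    Keys are (gene_a present, gene_b present)."""
    table = {(True, True): 0, (True, False): 0, (False, True): 0, (False, False): 0}
    for genes in profiles.values():
        table[(gene_a in genes, gene_b in genes)] += 1
    return table


def perfectly_linked(table):
    """True when neither gene is ever found without the other."""
    return table[(True, False)] == 0 and table[(False, True)] == 0


profiles = {  # hypothetical genomes
    "g1": {"cpe", "IS1469", "IS1151"},
    "g2": {"cpe", "IS1469", "IS1470"},
    "g3": {"pfo"},
}
t = cooccurrence(profiles, "cpe", "IS1469")
print(t[(True, True)], perfectly_linked(t))  # 2 True
```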

Accessory genome virulence potential
The 110-strain C. perfringens pangenome consisted of 6 219 genes (including NCTC8239; prophages could potentially contribute to virulence, given the plasticity of the genome. To explore this in more detail, we further analysed the accessory genomes, comparing different subsets of C. perfringens isolates. We first identified subset-specific genes using a bacterial pan-GWAS approach. These genes were further annotated based on NCBI RefSeq gene annotations and categorised into COG classes for three comparison groups: (1) CH vs FP; (2) FP outbreaks; (3) FP lineage I, FP lineage II-VII and CH-FP plasmid-CPE isolates (Fig. S3A-C).
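The pan-GWAS step can be sketched as follows. First, find genes present in at least one genome of a subset and absent from every genome of the comparison group. Then score each gene with a one-sided Fisher's exact test, computed here directly from the hypergeometric distribution. The text does not state which test or tool was used, so the choice of Fisher's exact test is an assumption. Correction for multiple testing across all genes would also be needed in practice and is not shown. The group sizes in the example mirror the CH (n=35) and FP (n=74) sets, but the gene names and carriers are hypothetical.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for enrichment in group 1.

    Table layout: [[a, b], [c, d]] where a/b = gene present/absent in group 1
    and c/d = gene present/absent in group 2. Returns P(X >= a) under the
    hypergeometric distribution with all margins fixed.
    """
    n1, n2, carriers = a + b, c + d, a + c
    total = comb(n1 + n2, carriers)
    upper = min(n1, carriers)
    return sum(comb(n1, x) * comb(n2, carriers - x) for x in range(a, upper + 1)) / total


def group_specific_genes(presence, group1, group2):
    """Genes carried by at least one group-1 genome and by no group-2 genome,
    returned with their group-1 carrier count and one-sided p-value."""
    results = {}
    for gene, carriers in presence.items():
        in1, in2 = len(carriers & group1), len(carriers & group2)
        if in1 > 0 and in2 == 0:
            p = fisher_one_sided(in1, len(group1) - in1, in2, len(group2) - in2)
            results[gene] = (in1, p)
    return results


ch = {f"CH{i:02d}" for i in range(35)}
fp = {f"FP{i:02d}" for i in range(74)}
presence = {"geneX": set(sorted(ch)[:26]), "geneY": ch | fp}  # hypothetical genes
res = group_specific_genes(presence, ch, fp)
print(sorted(res), res["geneX"][0])  # ['geneX'] 26
```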
Phosphotransferase system (PTS)-related genes (n=4) were encoded exclusively in CH isolates (present in 26/35 CH isolates; Fig. S3A and Table S11). These genes may contribute to the isolates' fitness to utilise complex carbohydrates (COG category G) in competitive niches such as the gastrointestinal tract (76). PTS genes have been linked to virulence regulation in other pathogens, including the foodborne pathogen Listeria monocytogenes (77). A heat-shock protein (Hsp70) DnaK co-chaperone was annotated in the FP-specific accessory genome (present in 57/74 FP isolates). It may be involved in capsule and pilus formation, which could facilitate host colonisation (78-80).
Accessory genes specific to each FP outbreak were variable (Fig. S3B-C (n=9) (L) and integrases (n=8) (L). It was evident that most genes were associated with phages, which appear to be a major source of mobile genetic transfer.
Correspondingly, fewer group-specific accessory genes were present compared with other isolates in lineages II-VII (Fig. S3C and Table S13). Notably, genes encoding small multidrug resistance (SMR) family transporters were detected exclusively in FP lineage I isolates. ABC transporters, by contrast, were more commonly encoded in plasmid-carrying isolates (virulence plasmids pCPF5603 and pCPF4969 carry various ABC transporter genes). The MATE efflux family protein gene was detected solely in lineage II-VII isolates.

Prophage genomes linked to enhanced C. perfringens fitness
Phages are important drivers of bacterial evolution and adaptation. The presence of prophages within bacterial genomes is often associated with enhanced survival and virulence, e.g. sporulation capacity and toxin secretion (81-83). Mining phages in foodborne C. perfringens genomes could therefore reveal insights into the role of bacteriophages in modulating diversity and pathogenesis traits (25). Using PHASTER, we identified a total of 7 prophages across the 109 genomes (Fig. S4A-B). We then explored virulence- and survival-enhancing genes encoded in these predicted prophage regions (Fig. S4C). These included the virulence-related enzyme sialidase NanH (promotes colonisation), the putative enterotoxin EntB, various ABC transporters (linked to multidrug resistance) and a toxin-linked phage lysis holin (probably linked to toxin secretion) (61, 84-86). No differences in the number of prophages carried were detected between CH and FP isolates (Fig. S4D-E). These data suggest that phages could contribute to increased accessory virulence within the genomes of food poisoning-associated C. perfringens. Further research, using both experimental and genomic approaches, is required to test this.

Discussion
C. perfringens is often associated with self-limiting or longer-term gastroenteritis. However, our knowledge of the genomic components that may link to disease symptoms, and of epidemiological relationships between outbreaks, is limited. In this study, WGS data and in-depth genomic analysis of a representative subset of 109 gastrointestinal outbreak-associated C. perfringens isolates revealed potential epidemic phylogenetic clusters linked to plasmid carriage. They also revealed specific virulence determinants strongly associated with outbreak isolates.
In the context of disease control, it is important to gain detailed genomic information to predict the transmission modes of pathogens.
Our analysis of care home isolates indicated that a specific persistent clone may have been responsible for up to 9 individual gastrointestinal outbreaks in North East England over the 2013-2017 period, which represents the majority of reported gastroenteritis outbreaks (>80%) in this area (10). Interestingly, a previous study indicated the presence of persistent identical C. perfringens genotypes within care home settings, with several individuals harbouring identical strains (as determined by PFGE profiling) throughout a 9-month sampling period; however, none of these isolates was positive for the cpe gene (87). Furthermore, although care home isolates were defined as 'non-foodborne' according to local epidemiological investigations as no food samples were identified as C.
Successful colonisation by invading C. perfringens is required for efficient toxin production, which ultimately leads to gastrointestinal symptoms. Through computational analysis, we determined that strains carrying plasmid-borne cpe (specifically plasmids pCPF4969 and pCPF5603) predominated among both non-foodborne and food poisoning outbreak-related C. perfringens isolates (~82%). These two virulence plasmids, pCPF4969 and pCPF5603, encode several important virulence genes, including an ABC transporter and an adhesin (the collagen adhesion gene cna). These genes could enhance the survival and colonisation potential of C. perfringens within the gastrointestinal tract (90). Plasmid pCPF4969 also contains a putative bacteriocin gene that may allow C. perfringens to outcompete other resident microbiota members, and thus overgrow and cause disease in the gut environment (73). Plasmid pCPF5603 encodes the important toxin genes cpe and cpb2, plus additional toxins, many of which are linked to food poisoning symptoms such as diarrhoea and cramping.
Interestingly, 4 of the 109 outbreak-associated strains were cpe-negative, suggesting that secondary virulence genes (e.g. pfo and cpb2) may be associated with C. perfringens gastroenteritis. A recent WGS-based study of FP C. perfringens outbreaks in France determined that ~30% of isolates were cpe-negative (13/42) (26), indicating that this gene may not be the sole virulence determinant linked to C. perfringens gastroenteritis. We observed fewer cpe-negative strains in our collection than that study did, which may be due to our targeted cpe-positive isolation strategy (standard practice at PHE). Determining the importance and diversity of cpe-negative strains in FP outbreaks will therefore require untargeted isolation schemes in the future.
Typical C. perfringens-associated food poisoning was previously thought to be caused primarily by chromosomal-cpe strains. This is linked to their phenotypic capacity to withstand high temperatures (via production of a protective small acid-soluble protein) and high salt concentrations during the cooking process. These strains also have a shorter generation time than plasmid-cpe carrying strains (68, 91). Previous studies have indicated that these strains commonly form distinct clusters that lack the pfo gene, which we also noted in the FP lineage I data from this study (26, 92-94). Nevertheless, strains carrying plasmid-borne cpe (pCPF4969 or pCPF5603) have also been associated with previous food poisoning outbreaks. One study indicated that pCPF5603-carrying strains (encoding IS1151) were associated with food poisoning in Japanese nursing homes (7 out of 9 isolates) (71).
However, these plasmid-cpe outbreaks have been described as relatively uncommon. It is therefore surprising that our findings indicate that most outbreak strains (81.6%; 89/109) carried a cpe-plasmid (60, 67). Plasmid-cpe strains can cause diverse symptoms, ranging from short-lived food poisoning to long-lasting non-foodborne diarrhoea, which implies that additional factors are involved in disease pathogenesis. The gut microbiome may be one such host factor. Previous studies have reported that care home residents have a less diverse and robust microbiota than those residing in their own homes (including individuals colonised with C. perfringens). Impaired 'colonisation resistance' may therefore allow certain C. perfringens strains to overcome these anti-infection mechanisms and initiate disease (64,87,95).
Chromosomal cpe is reported to be encoded on a transposon-like element, Tn5565 (6.3 kb, with flanking copies of IS1470). This element can form an independent and stable circular form in culture extracts (losing both copies of IS1470) (72, 74). Tn5565 was commonly thought to be integrated into the chromosome as a unit at a specific site. Our computational analysis failed to detect any cpe, IS1469 (cpe-specific) or IS1470 (Tn5565-specific) sequences in the high-coverage PH029 genome (317X sequencing depth). This suggests that Tn5565 can be lost or may be passed on to other C. perfringens cells. However, the flanking IS1470 copies of Tn5565 may not have been correctly assembled during genome assembly, owing to the repetitive nature of those sequences and the use of short-read sequencing.
WGS provides enhanced resolution to identify outbreak-specific clonal strains, and our study therefore highlights the importance of implementing WGS for C. perfringens profiling in reference laboratories, in place of the conventional fAFLP (92, 93, 96). Routine C. perfringens surveillance of the care home environment and staff could prove critical for vulnerable populations, as outbreaks can spread rapidly. This approach could potentially pinpoint the sources of contamination and eventually eliminate persistent cpe strains in the environment (87). C. perfringens cpe strains responsible for food poisoning outbreaks may be rapidly transmissible. Real-time portable sequencing approaches such as the MinION could therefore facilitate the rapid identification of outbreak strains, and have recently been reported to identify outbreak Salmonella strains in <2 h (97, 98).
Our data highlight the genotypic and epidemiological relatedness of a large collection of C. perfringens strains isolated from food poisoning cases across England and Wales. They indicate potential circulation of disease-associated strains and a potential impact of plasmid-associated cpe dissemination linked to outbreak cases. Further WGS phylogenetic and surveillance studies of diversely sourced C. perfringens isolates are required to fully understand the potential reservoir of food poisoning-associated strains. Such studies would allow intervention or prevention measures to be devised to prevent the spread of epidemiologically important genotypes, particularly in vulnerable communities, including older adults residing in care homes.