A comparison of fourteen fully characterized mammalian-associated Campylobacter fetus isolates suggests that loss of defense mechanisms contribute to high genomic plasticity and subspecies evolution

Abstract
Campylobacter fetus is currently classified into three main subspecies, but only two of these, C. fetus subsp. fetus and C. fetus subsp. venerealis, originate principally from ruminants, where they inhabit different niches and cause distinct pathogenicity. Their importance as pathogens in international trade and reporting is also different, yet the criteria defining these properties have never been fully substantiated nor understood. The situation is further compromised because differentiation between these two closely related C. fetus subspecies has traditionally been performed by phenotypic characterisation of isolates, using methods which are limited in scope, time-consuming, tedious, and often yield inconsistent results, thereby leading to isolate misidentification. The development of robust genetic markers that could enable rapid discrimination between C. fetus subsp. fetus and subsp. venerealis has also been challenging due to limited differences in the gene complement of their genomes, high levels of sequence repetition, the small number of closed genome sequences available and the lack of standardisation of the discriminatory biochemical tests employed for comparative purposes. To yield a better understanding of the genomic differences that define these C. fetus strains, seven isolates were exhaustively characterised phenotypically and genetically and compared with seven previously well characterised isolates. Analysis of these 14 C. fetus samples clearly illustrated that adaptation by C. fetus subsp. venerealis to the bovine reproductive tract correlated with increasing genome length and plasticity due to the acquisition and propagation of several mobile elements including prophages, transposons and plasmids harbouring virulence factors. [...] the CRISPR [...] regularly interspersed short palindromic [...] an impact. These data will facilitate future studies to better understand the precise genetic differences that underlie the phenotypic and virulence differences between these animal pathogens and may identify additional markers useful for diagnosis and sub-typing.

In an effort to better understand other aspects associated with CFF versus CFV, detailed comparisons of the genome organisation of CFF and CFV isolates have been undertaken using both draft and closed chromosome sequence information (Ali et al. 2012; Kienesberger et al. 2014; van der Graaf-van Bloois et al. 2014). These studies concluded that all mammalian-associated CFs share extensive genome synteny and vary primarily due to the presence of distinct pathogenicity islands dotted throughout the chromosome. While these studies clearly indicated the high concordance of the genome complement in all isolates, the organisation of this genetic information can be revealed only through comparison of closed genomes. Unfortunately, the closed genome comparisons reported to date have typically involved only one sub-species member in each case, and thus lack consideration of the range of genomic diversity within each sub-species. Challenges in generating complete, polished and annotated genomes for these organisms are due to their high AT content and large numbers of repetitive and mobile elements. As a result, most of the available genomic data for CF isolates are at the draft genome level only. This report describes the use of multiple genomics methods and tools to generate closed genome sequences for seven phenotypically well characterised C. fetus isolates representative of the two mammalian-associated subspecies and their biovars. Comparison of these genomes with an additional seven complete and polished genomes recovered from the NCBI database illustrates that, with a few exceptions, all have a similar gene complement. However, their level of genome [...]

[...] on an Illumina HiSeq 2000 sequencer (2 x 300, except for CFF-02A725 and CFV-08A1102, which were run earlier using 2 x 100 reagent kits) at the Michael Smith Genome Sciences Centre (BC, Canada). Assemblies were initially generated and reviewed using the Lasergene Genomics suite software v14 (DNASTAR Inc., Madison, WI) in an iterative process to fill gaps and remove small errors introduced by the PacBio sequencing; Illumina reads added > 100 average base coverage. Small gaps in coverage for isolates CFF-02A725 and CFV-08A1102 were filled by Sanger sequencing of PCR products. Final assemblies were confirmed by comparison with the optical maps. To further confirm these genome assemblies and facilitate the assembly of any plasmids harboured by these isolates, long read sequencing was performed on barcoded samples using a rapid sequencing kit run on a R9.4 MinION flow cell (Oxford Nanopore, Cambridge, MA). The combined sequence data from all platforms (Illumina paired-end, PacBio and Nanopore reads) were assembled using Unicycler (Wick et al. 2017). Briefly, this software employs the SPAdes tool for de novo assembly of the short-read data and then uses the long reads to bridge gaps in these assemblies. The bridged assemblies then underwent multiple rounds of short-read polishing.

Annotation and genome interrogation
Chromosomal and plasmid sequences were annotated using Prokka (Seemann 2014) with further refinement using an in-house script (Duceppe 2019) that uses a clustering algorithm to improve predicted annotation descriptions. Due to variation in annotation designations for some genes, the identification of comparable protein products in different isolates was, in some cases, performed by cross searching several isolates with the predicted protein products. CRISPR loci were identified using the web-based CRISPRFinder tool (Grissa et al. 2007). All genome alignments were generated using the progressive Mauve option of Mauve version 2.3.1 (Darling et al. 2004) using previously characterised sequences as reference.

Results
Phenotypic analysis
This study compares fourteen representative mammalian-associated C. fetus isolates, of which 12 originated from bovines while the remaining two CFFs are of human and ovine origin (Table 1). Seven of these isolates, which are described in this report for the first time, were all identified as C. fetus using a specific monoclonal antibody capture ELISA test and were found to possess serotype A heat stable lipopolysaccharide (LPS) antigens. Details of their provenance (Table 1) and the results of their phenotypic testing (Table S1) are presented. Four tests were applied: growth on media with >1% glycine, generation of hydrogen sulphide in cysteine-rich medium, reduction of selenite and growth at 42°C. Three isolates scored as positive in either three or four of these tests and were identified as CFF. Two isolates which scored as negative by all four tests were classified as CFV, while the remaining two isolates were identified as CFVi based on intermediate results that were shared with both CFV (inability to grow in broth with >1% glycine and at 42°C) and CFF (production of H2S and reduction of selenite). However, it is evident from the results shown in Table S1 that a single biochemical test is sometimes insufficient to yield an accurate sub-type classification. Phenotypic analysis supporting the subspecies and biovar classification of the additional seven C. fetus isolates, as summarised in Table 1, has been reported previously (van der Graaf-van Bloois et al. 2014). Since the focus of this report was to compare bovine isolates of the different sub-species, [...]

Genome sequencing and assembly
All seven newly characterised isolates were analysed by long (PacBio) and short (Illumina) read sequencing; all raw data, available in GenBank (Supplemental Table S2), were used for generation of closed chromosomal sequences as described.
To assist with and confirm the accuracy of these assemblies, each genomic sequence was used to generate an NcoI restriction map in silico for comparison with an experimentally generated optical map for each isolate (Supplemental Fig S1). These data indicate excellent concordance between both maps for all seven isolates, the non-aligned regions at the termini being a result of incomplete alignment of these circular chromosomes by a linear format. There was one small inconsistency in isolate CFF-09A980 (Supplemental Fig S1) which could not be resolved, though the sizes of the fragments in this region of the genome did not appear to differ significantly between the optical map and the sequence read data. The later addition of MinION sequence data (Supplemental Table S2) further confirmed these genome assemblies.

[...] Table 2. Overall, the genome size appeared to reflect the subspecies and biovar classification, with all CFF strains having the smallest genomes, of approximately 1.8 Mb, while the genomes of the CFV and CFVi strains were larger, generally between 1.9 and 2 Mb, with the exception of isolate CFVi-03-293, which was a little smaller at ~1.87 Mb. GC content was consistent at 33.2-33.3% except for two CFVi isolates which had a slightly higher content.

Chromosomal alignments were generated to compare the overall organisation of these 14 CF genomes. Fig. 1 shows a comparison of the overall genome organisation for three representative CFs, one each of CFF, CFV and CFVi, while further comparisons between members of each subspecies and biovar are shown in Figs. 2-4. The alignment presented in Fig. 1 indicates an overall similar genome organisation for all three CFs but with two main observations: a hypervariable region, located variously between bases 450,000 and 600,000, following a well conserved block of sequence, and increasing length of the genome from CFF to CFV and CFVi members due to multiple insertions.

The six CFF isolates (Fig. 2) exhibited a high level of synteny with significant differences limited to a few relatively small regions. The sequences corresponding to the region between residues 435,000 and 485,000 of the reference strain (CFF-04-554) were the most variable between all isolates and included significant rearrangements and inversions of blocks of sequence. There was also a variable sequence of ~32 kb (corresponding to the white/pale orange block at bases 1,100,000-1,113,200 of CFF-04-554) which contained elements representative of a prophage that appeared in different locations in each isolate. Additional variations were observed in specific isolates, including insertions of ~45.4 kb (1,203,385-1,248,755) in CFF-09A980 (described further below) and one of ~30 kb (650,000-680,000) in CFF-04-554 which encoded many hypothetical proteins and several different Fic protein alleles.

The alignment of five CFV genomes (Fig. 3) again identified a hypervariable region (corresponding to residues 532,000-616,675 of CFV-84-112) but it also revealed greater variability in other genomic regions amongst this group than was observed for the CFFs. Compared to the other CFVs, the CFV-84-112 genome included a transposition of a block of sequence of about 95 kb; the region affected (bases 142,200-237,000), which was bounded by rRNA small and large subunit genes, corresponded to residues 544,240-638,200 of the other genomes.
This genomic region encoded several membrane and periplasmic proteins, several of which had either signal transduction or transporter functions (including heavy metal transport), products involved in heavy metal resistance and several enzymes involved in multiple metabolic pathways. A large genomic segment of CFV-84-112 corresponding to residues 761,700-1,275,130 was inverted compared to the other genomes. A 22 kb sequence, representative of a genomic island harbouring Type IV secretion system genes (see below) and located close to the downstream terminus, was inverted in both CFV-97-608 and CFV-08A948 compared to the other three CFVs.

Yet additional variability was evident amongst the three CFVi isolates (Fig. 4). Indeed, when, as per convention, the dnaA gene is placed at the start of the sequence, CFVi-ADRI1362 exhibited an inversion of a large section of the genome compared to the other CFVi isolates such that genome co-ordinates were significantly altered, though in reality this difference could be visualised as an inversion of a smaller genome segment of ~405,000 bp. CFVi-ADRI545 exhibited a genome organisation closer to that of the reference genome but with several notable differences. The hypervariable region of this isolate was shifted downstream due to two insertions; indeed, this sample contained no fewer than six insertions relative to the reference as well as an inversion of residues 894,000-1,165,000 compared to the reference, a feature also shared by CFVi-ADRI1362. The features contributing to these differences are detailed further below.
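
The remark that a large inversion "could be visualised as an inversion of a smaller genome segment" follows from the circular topology of the chromosome: inverting one arc of a circle gives the same molecule as inverting the complementary arc, once strand and start position are ignored. The following is a minimal sketch of that equivalence on a toy sequence. The sequence, the breakpoint and all function names are illustrative assumptions and are not taken from the study.

```python
def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def invert_segment(genome: str, start: int, end: int) -> str:
    """Reverse-complement genome[start:end] in place (0-based, end exclusive)."""
    return genome[:start] + revcomp(genome[start:end]) + genome[end:]


def canonical_circular(genome: str) -> str:
    """Smallest string over all rotations of both strands.

    Two circular molecules are equivalent (same circle, either strand,
    any start point) exactly when their canonical forms match.
    Quadratic cost: fine for toy sequences, not for Mb-scale genomes.
    """
    candidates = []
    for strand in (genome, revcomp(genome)):
        doubled = strand + strand
        candidates.extend(doubled[i:i + len(strand)] for i in range(len(strand)))
    return min(candidates)


# Hypothetical 20 bp "chromosome"; the breakpoint at position 6 is arbitrary.
toy = "ATGGCGTACCTTAGGACTGA"
small = invert_segment(toy, 0, 6)          # invert the short arc (6 bp)
large = invert_segment(toy, 6, len(toy))   # invert the complementary arc (14 bp)

# The linear strings differ, but they describe the same circular molecule.
print(small != large)
print(canonical_circular(small) == canonical_circular(large))
```

Fixing dnaA as the first base selects one particular strand and rotation, which is why the same rearrangement can appear as either the larger or the smaller inversion in a linear alignment view.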

Hypervariable region
Annotation information for all 14 genomes facilitated comparison of their gene complement, including a detailed comparison of the hypervariable region described above and recognised previously as a locus encompassing multiple sap gene alleles (Tu et al. 2003). Comparison of the organisation of this region is shown in Fig. 5 for 13 CFs, CFV-08A948 and CFV-08A1102 being virtually identical. This region includes two sets of gene groupings retained in virtually all isolates but varied in order and orientation. One set includes the genes tolC (encoding an outer membrane protein), prs and prsD, which encode products involved in the Type 1 secretion system, and a presumed peptidase. Notably, this gene group was absent in the single ovine isolate CFF-NCTC10842, but it is unknown if this is typical of ovine isolates. The second set of 11 genes, present in all cases, is bounded by the ssrA gene encoding a tmRNA and the ybeY gene that encodes an endoribonuclease, except for CFV-84-112 in which the ssrA gene is missing. Between these two loci are genes encoding three transferases, a rhodanese, the acid membrane antigen A and four genes (mlaB, mlaD, mlaE and mlaF) all believed to be involved in lipid transport. In CFF-82-40 the mlaE locus is identified as a pseudogene. Interspersed around these two gene groupings are multiple copies of the surface array protein genes, sapA for all 12 serogroup A isolates and sapB for the two serogroup B isolates, CFF-04-554 and CFF-NCTC10842. Copy numbers of the sap genes range from six to 10. Additional genes found in this region for some isolates encode an mRNA interferase toxin (yafQ also known as relA), a HipA domain protein (hipA), an AAA family ATPase, additional surface associated products (sap) and some of unknown function. Indeed, the high proportion of genes that encode proteins involved in functions at the bacterial surface indicates the important pathogenic nature of this hypervariable region for all CFs.

Selected gene complement
Many of the genes of the hypervariable region, together with several additional genes presumed to be important with respect to virulence or phenotype, are detailed in Supplemental Table S3. In most cases these genes have been identified in all members of the Campylobacter genus (Ali et al. 2012) and are presumably necessary to support host cell adherence, invasion and immune evasion. As such, in general they were retained in all 14 CFs, though with significant variation in location for some isolates. This included three copies of the cytolethal distending toxin operon comprised of three genes (cdtA, cdtB and cdtC), with truncation of one or more alleles in some cases. In some isolates specific genes were lacking, including loci encoding a tyrosine recombinase (xerH), a filamentous hemagglutinin transporter protein (fhaC) or the twitching motility protein (pilT), but these variations did not respect the sub-species / biovar designations. However, other coding differences did respect some of the observed phenotypic variation. All five CFV isolates lacked two genes (tcyA and tcyC) of the L-cys ABC transporter operon while the 3'-terminal sequence of the remaining gene (tcyB) was modified; all CFF and CFVi isolates retained the complete unaltered operon. Three genes previously described as significant with respect to CF variation in lipopolysaccharide biosynthesis (Kienesberger et al. 2014) were also examined. The glf locus that encodes a UDP-galactopyranose mutase was found in all serotype A CFFs but not in serotype B CFFs or any CFV/CFVi isolates. In contrast, the mat1 gene that encodes a maltose O-acetyltransferase activity was present in all isolates except for the serotype A CFFs. A wcbK gene that encodes the enzyme GDP-mannose 4,6-dehydratase was found only in serotype B CFFs.
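
The presence/absence pattern just described for tcyA, tcyC, glf, mat1 and wcbK can be written down as a small lookup table. The sketch below only encodes what is stated for the 14 genomes in this comparison; it is not a validated diagnostic scheme, and the function and variable names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

# Loci whose presence/absence tracked subspecies, biovar or serotype in the text.
MARKERS = ("tcyA", "tcyC", "glf", "mat1", "wcbK")

# Loci PRESENT in each group, as summarised above for these 14 isolates only:
#   tcyA/tcyC: absent from CFV, retained in CFF and CFVi
#   glf:       serotype A CFF only
#   mat1:      every isolate except serotype A CFF
#   wcbK:      serotype B CFF only
PROFILES: Dict[str, FrozenSet[str]] = {
    "CFF serotype A": frozenset({"tcyA", "tcyC", "glf"}),
    "CFF serotype B": frozenset({"tcyA", "tcyC", "mat1", "wcbK"}),
    "CFV": frozenset({"mat1"}),
    "CFVi": frozenset({"tcyA", "tcyC", "mat1"}),
}


def consistent_groups(genes_present: Set[str]) -> List[str]:
    """Return the groups whose marker profile matches the annotated gene set.

    Genes outside MARKERS are ignored; an empty list means the profile does
    not match any pattern reported for these isolates.
    """
    observed = frozenset(genes_present) & frozenset(MARKERS)
    return [group for group, profile in PROFILES.items() if profile == observed]


# Example: an annotation containing mat1 and kefC but none of the other markers.
print(consistent_groups({"mat1", "kefC"}))
```

An exact-match lookup is used deliberately: with so few genomes behind each profile, returning no match for an unexpected combination is safer than guessing the nearest group.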
Finally, it was noted that the transporter gene (annotated as kefC in this study) corresponding to the nahE target used previously for C. fetus detection was present in all isolates.

Mobile genetic elements
The presence of two groups of mobile genetic elements, transposons and prophages, throughout these 14 CF genomes was highly variable (Table 3). Various copy numbers of two distinct versions of the IS200 family of insertion sequences, IS605 and IS607, were scattered throughout the genomes of all CFV and CFVi isolates and were frequently present in the hypervariable region. Indeed, while most of these isolates contained multiple copies of both types of transposon, CFV-84-112 contained two IS605 copies only and CFVi-ADRI1362 contained five copies of IS607 but no IS605. These elements were not present in any of the CFF genomes.

All CFs contained a prophage sequence ranging in length from 30 to 35 kb, but whereas the CFFs contained just one copy of this element, all of the CFV and CFVi genomes, with the exception of CFV-NCTC10354 and CFVi-03-293, contained multiple copies; CFVi-ADRI1362 had no fewer than nine copies. This element was typically bounded at either end by a copy of the lexA gene that encodes a prophage regulator, except for the prophage located in the hypervariable region of CFVi-ADRI1362 in which these loci were absent. The prophage sequence typically included genes encoding putative glutamine ABC transporter permease proteins, a mobile element, a modification methylase DpnIIA (dpnM), capsid and tape measure domain-containing proteins, an ATPase, phage and portal proteins, the GP27 locus and many hypothetical products.

CRISPR-cas complement
The highly variable number of mobile elements present in these genomes led to a review of the presence of genes known to be involved in preventing invasion of the cell by foreign nucleic acids.
This included a review of the components of the CRISPR-cas system. All 14 isolates contained at least one CRISPR locus and several of the CFFs contained two or more such loci (Table 4). Each locus contained 30-base direct repeats (DRs) of sequence GTTTGCTAATGACAATGTTTGTGTTGAAAC with occasional minor modification. Three isolates contained just one short locus of two (CFF-02A725), three (CFVi-03-293) and five (CFF-04-554) spacers respectively, while the other isolates retained loci comprising 20-26 spacers. There was also significant variation in the presence of CRISPR-associated proteins. The four CFFs having two or more CRISPR loci encoded a complete set of cas genes (cas1 to cas6) while the other two CFFs retained a modified cas6 gene only. All eight CFV/CFVi isolates also contained this modified cas6 gene but lacked cas1-5 genes. A cas9 pseudogene was identified in seven isolates representative of all three biotypes while this locus was not observed in the remaining isolates. With a few exceptions, most other CRISPR-associated genes were retained, though some had truncated ORFs.

Restriction-modification system genes
Review of the presence of restriction-modification (R-M) system genes, also capable of limiting invasion of the cell by foreign genetic material, indicated that most isolates retained a common but limited set of such genes, though the serotype B CFFs were somewhat distinct in this regard due to a lack of several Type 1 R-M loci (Supplemental Table S3). However, review of the complement of so-called orphan methyltransferases that are unassociated with restriction endonucleases revealed some interesting differences. While all isolates retained one copy of an adenine-specific methyltransferase (fokI), all but one (CFF-04-554) contained one or more copies of a DNA adenine methylase (dam) gene.
This gene appeared to be present as three distinct alleles which differ at their 5' termini and encode proteins ranging in length from 163 residues (allele 1) through 213 residues (allele 2) to 253 residues (allele 3). Alleles 1 and 2 were found only in CFF strains while between one and four copies of allele 3 were found in CFV and CFVi strains.

[...] (Table 5) and the structure of these genomic islands is illustrated (Fig. 6). A genomic island (T4SS GI 1) of just under 40 kb containing several Type IV SS genes, including virB2 to virB11 and virD4, as well as genes believed to provide ancillary functions (Fic proteins and a lytic transglycosylase) and genes frequently plasmid-associated, such as dnaG, topB and a tetracycline resistance ribosomal protection protein Tet(44), was identified in CFF-09A980. This sequence, corresponding to the insertion identified between 1,200,000 and 1,250,000 in this isolate, was not present in any other CFF. A GI similar in composition and organisation was also found in CFV-84-112 and CFV-97-608 while CFVi03-[...]

[...] source of infection. However, their plasmids did exhibit some coding differences. CFV-08A948P1 contained two copies of a rha family transcriptional regulator and a trbE gene not found in CFV-08A1102P1, as well as one additional IS605 transposon. Products encoded by CFV-08A1102P1 but not CFV-08A948P1 included a BRO phage repressor protein, a single stranded binding protein and an adenosine monophosphate protein transferase (vbhT). In addition to these differences some gene products, including those of trbF, trbL and trbJ, exhibited significant sequence differences.

[...] This report, which is the first to compare the organisation and gene complement of a significant number of complete closed C. fetus genomes, provides additional insights into the role that different elements may have played in the evolution of these genomes, particularly with respect to the emergence of CFV and CFVi. [...] Clearly the spread of these elements throughout the genome raises the possibility of gene disruption and function loss, though it is notable that many of these elements were not located at points of genome rearrangement and thus may not be a primary driver of large-scale genome plasticity.

In contrast, all CF isolates examined in this report contained prophage sequence. While the CFFs contained just one copy of this element, CFV/CFVi strains tended to harbour greater copy numbers of these elements and, very notably, these prophage sequences were often located at the boundaries of sequence rearrangements and inversions, supporting the importance of this element type to genome plasticity. The potential role of prophage mobility in altering the gene complement is clearly illustrated by the sequence of CFF-02A725 in which prophage translocation (residues 1,127,063-1,160,613) resulted in the loss of several components of the CRISPR-cas system. The presence of prophage sequence in the hypervariable region of CFV and CFVi isolates may well have contributed to its evolution, though its absence in that region of the CFFs suggests that other factors, including possibly homologous recombination involving the sap gene loci, may have contributed to the plasticity of this region (Tu et al. 2003). Indeed, the identification of rare recombinants having an AB serotype supports the high plasticity of this genomic region (Dingle et al. 2010).

Bacteria have a number of mechanisms to limit the extent to which foreign genetic material can invade a cell and become incorporated into the chromosome.
Restriction-modification systems of many bacteria facilitate degradation of invading sequences having specific sequence motifs; however, C. fetus appears to have limited capability in this regard, with no clear distinction detected between CFF and CFV subspecies. Another important process involves the CRISPR-cas system that is believed to confer an adaptive immune response to protect against invasion by mobile genetic elements (van der Oost et al. 2014). The CRISPR locus consists of a series of repeat sequences interspersed with spacer sequences derived from invading genetic material. Cas proteins are responsible for acquisition of these spacers as well as processing of RNA transcripts of these loci in a process that ultimately results in the degradation of DNA homologous to the spacer sequence. Three distinct CRISPR-cas systems with different mechanisms of action have been identified, but in each case the cas1 and cas2 genes are critical to the initial steps of the process that incorporate the spacer sequences into the CRISPR locus and that later serve as templates for subsequent recognition of matched sequence elements. The products of the cas6 and cas9 genes act as nucleases which process CRISPR transcripts in preparation for assembly into ribonucleoprotein particles used for surveillance and ultimately degradation of foreign DNA complementary to a spacer sequence (van der Oost et al. 2014). Variation in the CRISPR-cas system complement of C. fetus had been reported previously (Gilbert et al. 2016; van der Graaf-van Bloois et al. 2014) and this was further examined in this study. Four CFFs had two or more CRISPR loci and a virtually complete complement of cas and CRISPR-associated genes representative of a functional CRISPR-cas system.
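
The repeat-spacer layout of a CRISPR locus described above can be illustrated by splitting a locus on its direct repeat and counting what lies between consecutive copies. This is a minimal sketch, not the CRISPRFinder method used in the study. It assumes exact repeat matches, so it would miss the occasional modified repeats noted in the Results, and the spacer and flank sequences in the example are invented.

```python
import re
from typing import List

# 30-base direct repeat reported in the Results for these C. fetus CRISPR loci.
DIRECT_REPEAT = "GTTTGCTAATGACAATGTTTGTGTTGAAAC"


def extract_spacers(locus: str, repeat: str = DIRECT_REPEAT) -> List[str]:
    """Return the sequences lying between consecutive exact copies of the repeat."""
    starts = [m.start() for m in re.finditer(re.escape(repeat), locus)]
    spacers = []
    for left, right in zip(starts, starts[1:]):
        # A spacer runs from the end of one repeat to the start of the next.
        spacers.append(locus[left + len(repeat):right])
    return spacers


# Hypothetical two-spacer locus (three repeats), comparable in size to the
# shortest locus reported above; spacers and flanks are made up.
toy_locus = (
    "TTTTAAAA"
    + DIRECT_REPEAT + "AAAACCCCGGGGTTTT"
    + DIRECT_REPEAT + "ACGTACGTAC"
    + DIRECT_REPEAT
    + "CCCCGGGG"
)

spacers = extract_spacers(toy_locus)
print(len(spacers), spacers)
```

Each spacer records one past acquisition event, so a locus reduced to two to five spacers can recognise far fewer invading elements than one carrying 20 to 26.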
Two CFFs, CFF-04-554 and CFF-02A725, as well as all CFV and CFVi isolates, lacked cas1-cas5 genes but retained a modified cas6, cas10 (a member of the Type III CRISPR-cas system) and some other CRISPR-related genes. While these isolates may retain the ability to process spacer transcripts and thereby prevent invasion of elements containing these sequences, the loss of cas1 and cas2 would preclude addition of sequences to the CRISPR locus and thereby limit the scope of the system. In those isolates in which the spacer number is significantly reduced, the value of this system is clearly highly compromised. Limitations in the functioning of the CRISPR-cas system could provide various prophage and transposable elements the opportunity to successfully invade the cell and integrate into the chromosome. While this in itself would not explain the phenotypic changes that have accompanied the emergence of the CFV/CFVi strains, the possibility of random mobile element insertion into the bacterial chromosome and spread within the genome could result in loss of function due to gene deletion or changes in gene expression as well as increased genome instability. Consistent with our observations, it has been reported that the screening of a collection of 102 C. fetus isolates for the presence of cas1 failed to detect this gene in all 62 CFV samples (Kienesberger et al. 2014) and it was later suggested that disruption to the CRISPR-cas system could be a significant factor contributing to the emergence of CFV/CFVi strains (Calleros et al. 2017). We speculate that similar loss of CRISPR-cas functionality in CFF isolates, as found here, might initiate a series of genomic modifications to alter their pathogenicity, thus explaining their potential ability to produce infertility in cattle as discussed above.
Clearly more work would be needed to substantiate this possibility but, if true, it may further blur the distinction between these subspecies in pathogenicity and the role of the two subspecies in BGC.

One notable distinction found between CFF and CFV/CFVi isolates concerns differences with respect to their predicted dam gene complement. This gene, first identified and extensively characterised in Escherichia coli, encodes DNA adenine methylase responsible for post-replicative adenine methylation at GATC sites along the genome. DNA methylation is an important epigenetic process that can impact aspects of DNA replication, including mismatch repair, as well as gene expression (Adhikari & Curtis 2016; Casadesús & Low 2006). Distinct alleles of a gene predicted to have this activity were found in CFF and CFV/CFVi strains; furthermore, multiple copies of allele 3 of this gene were present in six of the eight CFV/CFVi isolates examined. The structural differences between these dam products and potential variation in their expression levels due to variable gene copy number could have significant impact on the extent of GATC methylation of these genomes and, in turn, effect differences in expression of other genes. This aspect of C. fetus biology is worthy of further investigation considering that dam methylation is believed to impact host-pathogen interactions through modification of virulence gene expression of several bacteria (Marinus & Casadesus 2009).

This report has also explored the plasmid composition of these 14 CF strains as these extrachromosomal elements can contribute genes that impart additional phenotypic features to the organism. Greater numbers of these elements were identified in CFV and particularly CFVi strains, again supporting the concept that these strains are deficient in their ability to limit cellular invasion by foreign DNA.
Apart from genes important for replication of these elements,