Markers for discriminating Campylobacter concisus genomospecies using MALDI-TOF analysis

Highlights
• Efficacy of commercial MALDI-TOF system for identifying C. concisus genomospecies evaluated.
• Use of cluster analysis of MALDI-TOF profiles to discriminate C. concisus genomospecies explored.
• MALDI-TOF-based markers helpful for identification of C. concisus genomospecies identified.

Some authors have proposed that these apparent conflicts regarding the pathogenic potential of the organism are explained by its extensive genetic diversity, which has been demonstrated in various studies (Aabenhus et al., 2005; Kirk et al., 2018; Liu et al., 2020; Matsheka et al., 2002). Critically, strains identified as C. concisus may in fact belong to closely related yet genetically distinct taxa referred to as "genomospecies", of which two appear predominant (Aabenhus et al., 2005; Liu et al., 2020; Mahendran et al., 2015; Vandamme et al., 1989). Multi-Locus Sequence Typing (MLST) generally supports a phylogenetic distinction between these two groups, although the separation is not perfect (Miller et al., 2012). The C. concisus genomospecies could be proposed as nomenclaturally distinct species if a clearly defined, readily determined phenotypic marker were identified; however, no such trait has yet been found. This hinders accurate attribution of the major C. concisus genomospecies in healthy and diseased individuals, and in domestic pets, which is needed to improve our understanding of the public health impact of these bacteria. Given how frequently C. concisus has been found in human faeces (Cornelius et al., 2012; Lastovica, 2009; Nielsen et al., 2013), this is an important question to resolve.
Microbial identification in routine laboratories has been transformed in many countries by the implementation of commercial platforms for Matrix-Assisted Laser Desorption/Ionisation Time-Of-Flight Mass Spectrometry (MALDI-TOF MS) (van Belkum et al., 2015). In this method, cells overlaid with a matrix are irradiated by a laser, which desorbs and ionises their molecules; the ions are then separated and detected according to their mass-to-charge ratio (m/z), measured as their time of flight. The resulting spectrum is compared with databases of spectra derived from known organisms, and an identification is made when a threshold similarity level is reached. This process is automated in commercial systems but relies on the relevant taxon being present in the database. This paper examines the performance of a commercial MALDI-TOF MS identification system for strains assigned to each of the two major C. concisus genomospecies, and explores the potential for enhanced discrimination using this method.
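As a worked illustration of how time of flight separates ions by m/z, the idealised linear time-of-flight relation t = L·√(m / 2zeU) can be sketched in Python. The accelerating voltage (20 kV) and drift length (1 m) are hypothetical round numbers, not the parameters of the instrument used in this study. Real instruments are calibrated against standards rather than relying on this idealised formula.

```python
import math

# Physical constants (SI units)
ELEMENTARY_CHARGE = 1.602176634e-19  # coulomb
DALTON = 1.66053906660e-27           # kilogram


def flight_time(mass_da, charge=1, accel_voltage=20_000.0, drift_length=1.0):
    """Idealised linear time of flight (seconds) of an ion.

    The ion gains kinetic energy z*e*U = m*v**2 / 2 in the accelerating field
    and then crosses a field-free drift region of length L, so
    t = L * sqrt(m / (2*z*e*U)): flight time grows with the square root of m/z.
    accel_voltage (U, volts) and drift_length (L, metres) are hypothetical
    illustration values, not the settings of any particular instrument.
    """
    mass_kg = mass_da * DALTON
    velocity = math.sqrt(2 * charge * ELEMENTARY_CHARGE * accel_voltage / mass_kg)
    return drift_length / velocity


# Singly charged ions at a few example m/z values
for mz in (4000.0, 7634.66, 16000.0):
    print(f"m/z {mz:9.2f}: {flight_time(mz) * 1e6:6.2f} us")
```

Because t scales with the square root of m/z, an ion of four times the mass arrives after twice the time. A doubly charged ion of the same mass arrives earlier, which is why spectra are read on an m/z axis rather than a mass axis.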

Strains examined
Strains examined and their sources are listed in Table 1 (… Vandamme et al., 1989). Where whole genome sequences and MLS types are available, these details are also given. According to AFLP profiling, all strains were distinct.
Table 1. Genomospecies designations and sources of Campylobacter concisus strains examined, and summary of their identification results with the commercial (Bruker) database. GenBank accession numbers for genome sequences and Multi-Locus Sequence (MLS) Types (Miller et al., 2012), where available, are also listed. All newly determined MLS types were unique.

MALDI-TOF MS analysis and routine identification of strains
Strains were cultured for 3 days on 5% blood agar under microaerobic conditions (80% N₂, 10% CO₂, 3% O₂, 7% H₂) in a dedicated workstation (Don Whitley Scientific, Bingley, UK). MALDI-TOF MS profiling was performed using a Flex Biotyper instrument (Bruker Diagnostics, Karlsruhe, Germany). Discrete samples of bacterial growth were smeared onto the steel analysis plate, and 1 μl of 70% formic acid was added before the matrix solution was applied; samples were air dried before analysis, as described previously (Werno et al., 2012). For four strains, 24 samples of bacterial growth were examined (the number recommended for generating reference spectra), and for the remaining 15 strains, duplicate samples were examined. The resulting spectra were then compared with existing data in a proprietary database (v.6903; Bruker Diagnostics) following the manufacturer's recommended guidelines, as described previously (Ge et al., 2017). The proprietary software scores the resemblance of each profile to its database entries (n = 6903 as of this study) and reports the best match, interpreted as follows: highly probable species identification (2.3–3.0); secure genus identification, probable species identification (2.0–2.299); probable genus identification (1.7–1.999); or no reliable identification (0–1.699). Samples were compared with the Bruker database on 7th October 2020.
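The score bands above amount to a simple threshold lookup. A minimal Python sketch follows. It assumes the bands are applied as half-open intervals, so a score of exactly 2.0 counts as a probable species identification. The function name is hypothetical and is not part of the Bruker software, and the replicate scores are invented for illustration.

```python
def interpret_score(score):
    """Map a Biotyper-style log(score) onto the interpretation bands in the text.

    Bands are treated as half-open intervals (assumption), e.g. a score of
    exactly 2.0 falls in the probable-species band.
    """
    if not 0.0 <= score <= 3.0:
        raise ValueError(f"score {score} is outside the 0-3.0 scale")
    if score >= 2.3:
        return "highly probable species identification"
    if score >= 2.0:
        return "secure genus, probable species identification"
    if score >= 1.7:
        return "probable genus identification"
    return "no reliable identification"


# Hypothetical replicate scores for a single strain (not data from this study)
for s in (2.41, 2.12, 1.95, 1.62):
    print(f"{s:.2f} -> {interpret_score(s)}")
```

Tallying these labels over all replicate samples of a genomospecies gives per-category percentages of the kind reported in the results.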

Cluster analysis and biomarker identification
Strain MALDI-TOF MS spectra were exported as text files and imported into BioNumerics 7.6 (Applied Maths, Kortrijk, Belgium) for analysis. Cluster analysis was performed using the peak-based Dice coefficient with the following parameters: minimum height 0%, peak matching with constant tolerance 1 and linear tolerance 500 ppm, shift factor 1, and the UPGMA (unweighted pair group method with arithmetic mean) algorithm.
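To make the clustering step concrete, the following Python sketch computes a peak-based Dice similarity between peak lists and groups strains by UPGMA (average linkage). It is an illustrative re-implementation, not the BioNumerics algorithm. It assumes the matching window is the constant tolerance (treated here as m/z units) plus the linear tolerance in ppm of the peak position, and it uses greedy one-to-one peak matching. It ignores the minimum-height and shift-factor settings. All function names and peak lists are hypothetical.

```python
from itertools import combinations


def peaks_match(mz1, mz2, const_tol=1.0, ppm_tol=500.0):
    # Assumed window: constant part plus a linear part proportional to m/z
    window = const_tol + ppm_tol * 1e-6 * max(mz1, mz2)
    return abs(mz1 - mz2) <= window


def dice_similarity(peaks1, peaks2, const_tol=1.0, ppm_tol=500.0):
    """Peak-based Dice coefficient: 2 * shared peaks / (n1 + n2)."""
    if not peaks1 and not peaks2:
        return 1.0
    unmatched = sorted(peaks2)
    shared = 0
    for mz in sorted(peaks1):
        # Greedy one-to-one matching: pair with the first unused peak in range
        for j, other in enumerate(unmatched):
            if peaks_match(mz, other, const_tol, ppm_tol):
                shared += 1
                del unmatched[j]
                break
    return 2 * shared / (len(peaks1) + len(peaks2))


def upgma(labels, similarity):
    """UPGMA (average linkage) on distances 1 - similarity.

    Returns the merges in order as (subtree, similarity at merge); the last
    subtree is the full dendrogram as nested tuples of labels.
    """
    clusters = {i: (label, 1) for i, label in enumerate(labels)}  # id -> (tree, size)
    dist = {(i, j): 1.0 - similarity(labels[i], labels[j])
            for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    merges = []
    while len(clusters) > 1:
        (i, j), d = min(dist.items(), key=lambda item: item[1])
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        del dist[(i, j)]
        for k in clusters:
            d_ik = dist.pop((min(i, k), max(i, k)))
            d_jk = dist.pop((min(j, k), max(j, k)))
            # Size-weighted mean distance from k to the merged cluster
            dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        merges.append(((tree_i, tree_j), 1.0 - d))
        next_id += 1
    return merges


# Hypothetical peak lists (m/z), not spectra from this study
spectra = {
    "A": [4000.0, 5000.0, 7634.7, 9000.0],
    "B": [4000.5, 5001.0, 7634.0, 9500.0],
    "C": [4300.0, 6000.0, 8000.0, 9000.0],
    "D": [4300.8, 6002.0, 8001.0, 9700.0],
}
names = list(spectra)
for tree, sim in upgma(names, lambda x, y: dice_similarity(spectra[x], spectra[y])):
    print(f"{sim:.1%} similarity: {tree}")
```

Cutting the resulting dendrogram at a chosen similarity level (for example 44%) yields the major clusters. For production work, a library routine such as SciPy's average-linkage clustering would normally replace the hand-written UPGMA loop.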
Potential biomarkers were identified using the matrix mining tool, following the BioNumerics tutorial "Peak matching and follow up analysis of spectra". Peak matching was performed using default settings (constant tolerance 1.9, linear tolerance 550 ppm, peak detection rate 10). All peak classes with p < 0.05 were initially selected and then assessed as potential biomarkers by visual inspection of the spectra.
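The text does not state which statistical test the matrix mining tool applies. As a plainly swapped-in illustration, the Python sketch below screens peak classes by comparing their presence or absence between genomospecies 1 (GS1) and genomospecies 2 (GS2) with a two-sided Fisher's exact test, keeping those with p < 0.05. The strain labels and peak classes are hypothetical. The group sizes (8 GS1 and 11 GS2 strains) follow the strain counts in this study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Here rows are peak present / absent and columns are GS1 / GS2 strains.
    The p-value sums the hypergeometric probabilities of all tables with the
    same margins that are no more probable than the observed table.
    """
    present, n_gs1, n = a + b, a + c, a + b + c + d

    def prob(x):  # P(x of the carriers are GS1 strains)
        return comb(n_gs1, x) * comb(n - n_gs1, present - x) / comb(n, present)

    p_obs = prob(a)
    lo, hi = max(0, present - (n - n_gs1)), min(present, n_gs1)
    # Small relative tolerance so tables tied with the observed one are kept
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def screen_peak_classes(carriers_by_peak, groups, alpha=0.05):
    """Return (peak_class, p) for peak classes whose presence differs by group.

    carriers_by_peak: {peak_class: set of strains showing that peak}
    groups: {strain: "GS1" or "GS2"}
    """
    gs1 = {s for s, g in groups.items() if g == "GS1"}
    gs2 = set(groups) - gs1
    hits = []
    for peak, carriers in sorted(carriers_by_peak.items()):
        a, b = len(carriers & gs1), len(carriers & gs2)
        p = fisher_exact_two_sided(a, b, len(gs1) - a, len(gs2) - b)
        if p < alpha:
            hits.append((peak, p))
    return hits


# Hypothetical strains and peak classes; group sizes match 8 GS1 / 11 GS2 strains
groups = {**{f"gs1_{i}": "GS1" for i in range(8)},
          **{f"gs2_{i}": "GS2" for i in range(11)}}
carriers = {
    "peak_A": {s for s, g in groups.items() if g == "GS1"},  # GS1 only
    "peak_B": {"gs1_0", "gs1_1", "gs1_2", "gs1_3",
               "gs2_0", "gs2_1", "gs2_2", "gs2_3", "gs2_4"},  # mixed
}
print(screen_peak_classes(carriers, groups))
```

With 8 and 11 strains, a peak found in every GS1 strain and no GS2 strain gives p = 1/C(19, 8), roughly 1.3 × 10⁻⁵. No multiple-testing correction is applied here, mirroring the simple p < 0.05 cut-off described, which is one reason visual confirmation of candidate peaks remains important.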

Identification of C. concisus strains using the proprietary Bruker database
The range of identification scores for the MALDI-TOF replicate samples of each strain is summarised in Table 1. Only the C. concisus type strain was identified to this species consistently and with high confidence (24/24 replicate samples with scores > 2.3, accounting for 40.6% of all genomospecies 1 samples tested); it was also the only strain tested that is present in the Bruker database. Samples from the remaining genomospecies 1 strains accounted for a further 45.7% (probable species identifications) and 13.5% (probable genus identifications) of all genomospecies 1 samples, in each case with C. concisus named as the most likely species. Genomospecies 2 strains yielded less definitive results: 55.1% of samples gave probable species identifications and 39.6% probable genus identifications, again with C. concisus named as the most likely species, while the remaining 5.1% of samples were not identified to any defined species (Table 1).

Cluster analysis of MALDI-TOF spectra of C. concisus genomospecies
Two major clusters were formed at the 44% similarity level (Fig. 1). The first contained all eight genomospecies 1 (GS1) strains plus one genomospecies 2 (GS2) strain (L104.93); the second comprised GS2 strains only. The low level of similarity among the strains indicates substantial diversity among the spectra. Table 2 lists the marker peaks, identified by the BioNumerics software and confirmed by careful inspection of the data, that discriminated between the two genomospecies. Peaks differing only slightly in m/z were not treated by the software as distinct, and they could not be differentiated in the cluster analysis either (data not shown). Fourteen markers were considered useful for differentiating the genomospecies, although only one (m/z 7634.66) provided clear discrimination, being found only among GS1 strains.
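Checking whether a candidate marker such as m/z 7634.66 is confined to one genomospecies reduces to counting, per group, the strains with a peak inside the matching window. The Python sketch below makes that check explicit. It assumes the window is the constant tolerance plus the linear (ppm) tolerance, borrowing the values 1.9 and 550 ppm from the peak-matching settings described in the Methods. The function names and peak lists are hypothetical and do not reproduce the data in Table 2.

```python
def carries_peak(peaks, target, const_tol=1.9, ppm_tol=550.0):
    """True if any peak lies inside the (assumed) matching window around target."""
    window = const_tol + ppm_tol * 1e-6 * target
    return any(abs(mz - target) <= window for mz in peaks)


def marker_counts(spectra, groups, target, **tolerances):
    """Per-group (carriers, strains) counts for one candidate marker peak."""
    counts = {}
    for strain, peaks in spectra.items():
        group = groups[strain]
        present, total = counts.get(group, (0, 0))
        hit = carries_peak(peaks, target, **tolerances)
        counts[group] = (present + int(hit), total + 1)
    return counts


def exclusive_to(counts, group):
    """A marker is exclusive to `group` if seen there and in no other group."""
    seen_here = counts.get(group, (0, 0))[0] > 0
    seen_elsewhere = any(c[0] > 0 for g, c in counts.items() if g != group)
    return seen_here and not seen_elsewhere


# Hypothetical peak lists (m/z): small shifts such as 7634.9 vs 7633.8 share one window
spectra = {
    "s1": [4365.2, 7634.9],
    "s2": [7633.8, 9120.0],
    "s3": [4365.0, 7690.0],
    "s4": [5012.3],
}
groups = {"s1": "GS1", "s2": "GS1", "s3": "GS2", "s4": "GS2"}
counts = marker_counts(spectra, groups, 7634.66)
print(counts, exclusive_to(counts, "GS1"))
```

The window width (about 6 m/z units at m/z 7634.66 under these assumptions) is also why peaks differing only slightly in m/z collapse into a single peak class.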

Discussion
The genotypic variability of the C. concisus genomospecies has been well documented (Aabenhus et al., 2005; Kirk et al., 2018; Liu et al., 2020; Matsheka et al., 2002) and likely underpins the substantial phenotypic diversity observed in this study, and indeed in others where whole-cell protein profiling has been used (Vandamme et al., 1989). This variation illustrates the difficulties encountered for many years in identifying a single marker that discriminates these two taxa. The absence of such a simple differential feature is the fundamental reason that they have not been formally described as distinct species, despite their whole-genome relatedness clearly indicating that they are (Vandamme et al., 1989). In this respect, it is encouraging that we detected a single marker (m/z 7634.66) present only among GS1 isolates. Further investigation and characterisation of this marker may provide an important step towards a simple test for genomospecies discrimination, and hence towards the description of novel species in accordance with minimal taxonomic standards.
As of 7th October 2020, the Bruker identification database contained 14 strains of C. concisus, including the type strain examined in our study. The genomospecies designations of the other strains are not known, and these strains clearly do not represent the full diversity of C. concisus phenotypes, since 18 of our 19 isolates did not achieve convincing identification scores. Performance could readily be improved by incorporating our data into the proprietary database, for which the manufacturer has a standard protocol.
The role of C. concisus genomospecies in gastrointestinal disease and potential zoonotic infection has long been difficult to resolve, given the problems in routinely identifying them. We hope this study provides some insight into the use of an increasingly available tool, MALDI-TOF MS, for achieving this aim.