Validation of MALDI-TOF MS Biotyper database optimized for anaerobic bacteria: The ENRIA project

Within the ENRIA project, several ‘expertise laboratories’ collaborated in order to optimize the identification of clinical anaerobic isolates by using a widely available platform, the Biotyper Matrix Assisted Laser Desorption Ionization Time-of-Flight Mass Spectrometry (MALDI-TOF MS). Main Spectral Profiles (MSPs) of well characterized anaerobic strains were added to one of the latest updates of the Biotyper database, db6903 (V6 database), for common use. MSPs of anaerobic strains nominated for addition to the Biotyper database are included in this validation. In this study, we validated the optimized database (db5989 [V5 database] + ENRIA MSPs) using 6309 anaerobic isolates. Using the V5 database 71.1% of the isolates could be identified with high confidence, 16.9% with low confidence and 12.0% could not be identified. Including the MSPs added to the V6 database and all MSPs created within the ENRIA project, the number of strains identified with high confidence increased to 74.8% and 79.2%, respectively. Strains that could not be identified using MALDI-TOF MS decreased to 10.4% and 7.3%, respectively. The observed increase in high confidence identifications differed per genus. For Bilophila wadsworthia, Prevotella spp., gram-positive anaerobic cocci and other less commonly encountered species more strains were identified with higher confidence. A subset of the non-identified strains (42.1%) were identified using 16S rDNA gene sequencing. The obtained identities demonstrated that strains could not be identified either due to the generation of spectra of insufficient quality or due to the fact that no MSP of the encountered species was present in the database. Undoubtedly, the ENRIA project has successfully increased the number of anaerobic isolates that can be identified with high confidence. We therefore recommend further expansion of the database to include less frequently isolated species as this would also allow us to gain valuable insight into the clinical relevance of these less common anaerobic bacteria.


The introduction of Matrix Assisted Laser Desorption Ionization Time-of-Flight Mass Spectrometry (MALDI-TOF MS) has had a great impact on the identification of anaerobic bacteria isolated from human clinical specimens, providing more timely and accurate identification compared with phenotypic methods [3,4,16]. Several studies have been performed on the performance of MALDI-TOF MS for the identification of anaerobic bacteria. The general conclusion is that the databases of the MALDI-TOF MS systems need optimization for the identification of anaerobic bacteria [4,9,17]. To accomplish this we formed the European Network for the Rapid Identification of Anaerobes (ENRIA). Within this project, seven European laboratories, with expertise in anaerobic bacteriology, collaborated in order to collect clinical isolates for addition to the MALDI-TOF MS database. As a quality requirement, the ENRIA group set itself the goal of obtaining at least five main spectral profiles (MSPs) for each species [16].
The ENRIA group felt that it was vital for the project to benefit the microbiology community worldwide and therefore we set up a collaboration with Bruker Daltonics, Bremen, Germany. At the time of going to press the first batch of MSPs of ENRIA strains has been added to the db6903 database (V6 database) of the Bruker MALDI-TOF MS Biotyper system. In this study we present the validation of this optimized database for the identification of anaerobic bacteria. Genera/species that require further optimization are also highlighted.

Bacterial strains
Over a period of six months all anaerobic bacteria isolated from human clinical specimens were identified using the db5989 database (V5 database) and the V5 database plus ENRIA MSPs, created and supplied by Bruker Daltonics. The latter consisted of two parts: the confirmed ENRIA MSPs (the ones included in the last update of the Biotyper V6 database) and all ENRIA MSPs (the confirmed ENRIA MSPs and the MSPs nominated for addition to the Biotyper database). An overview of these MSPs is shown in the supplemental data.

Identification
The MALDI-TOF MS Biotyper (Bruker Daltonics, Bremen, Germany) was utilized at each laboratory to perform the measurements as described previously [18]. Briefly, bacterial cells in the log phase of their growth were spotted twice on a stainless steel target using a toothpick. One spot was overlaid with 1 µl HCCA matrix (α-cyano-4-hydroxycinnamic acid in 50% acetonitrile/2.5% trifluoroacetic acid) and left to dry at ambient temperature. An on-target extraction was performed on the other spot by first overlaying the spot with 1 µl 70% formic acid. Immediately after drying at ambient temperature, the spot was overlaid with 1 µl HCCA matrix. Each laboratory performed the measurements as part of their daily routine, using the standard settings. Obtained log scores were interpreted as advised by the manufacturer. A log score of ≥2 was considered an identification with high confidence, a log score of ≥1.7 and <2 an identification with low confidence and a log score of <1.7 no reliable identification.
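The interpretation of log scores can be illustrated with a minimal Python sketch. It assumes the cut-offs are inclusive at the lower bound (a score of exactly 2 counts as high confidence and exactly 1.7 as low confidence); the function name `classify_log_score` is hypothetical and is not part of the Biotyper software.

```python
def classify_log_score(score: float) -> str:
    """Map a Biotyper log score to a confidence category.

    Assumed cut-offs (inclusive lower bounds):
      score >= 2.0        -> high confidence
      1.7 <= score < 2.0  -> low confidence
      score < 1.7         -> no reliable identification
    """
    if score >= 2.0:
        return "high confidence"
    if score >= 1.7:
        return "low confidence"
    return "no reliable identification"


# Example scores, chosen for illustration only
for s in (2.31, 1.85, 1.42):
    print(s, "->", classify_log_score(s))
```

Keeping the thresholds in a single function makes it straightforward to apply the same interpretation to the results obtained with each database.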

Data interpretation
The obtained identifications were divided into three groups: no reliable identification (or no identification), identification with low confidence and identification with high confidence. The latter two were interpreted as a reliable genus identification and a reliable species identification, respectively. For several species an identification to the subspecies level was given by the Biotyper, for example for Fusobacterium nucleatum and Actinomyces neuii. However, for the data interpretation only the identification to the species level was considered.

Table 1 The distribution of the different genera isolated by all expertise laboratories, used for the validation of the optimized database. Columns: Genus; No. of strains; % of total.

Furthermore, we are aware of the fact that several species of Veillonella are difficult to separate from each other by MALDI-TOF MS and 16S rDNA gene sequencing, namely, Veillonella dispar, Veillonella parvula, Veillonella denticariosi and Veillonella rogosae. Strains identified as any of these four species are named Veillonella spp., regardless of the log score obtained during the measurement. Some species of Bacteroides also cannot be differentiated from each other. Therefore strains identified as Bacteroides ovatus or Bacteroides xylanisolvens, Bacteroides thetaiotaomicron or Bacteroides faecis, and Bacteroides vulgatus or Bacteroides dorei were identified as either one of the two. MALDI-TOF MS also has difficulty differentiating Fusobacterium naviforme from Fusobacterium nucleatum. These two species were identified as F. nucleatum/naviforme. Furthermore, we noticed that it is also difficult to differentiate Porphyromonas asaccharolytica from Porphyromonas uenonis. Strains identified as either one of these species were designated as P. asaccharolytica/uenonis. In order to maximize the opportunity of gaining insight into the clinical relevance of rare anaerobic species, we did not differentiate between valid species and non-valid species.
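The grouping rules used for data interpretation can be sketched in Python as a simple lookup. This is an illustration under stated assumptions, not the software used in the study: the function name `reporting_name`, the rule of reducing a name to its first two words to drop the subspecies, and the combined labels for the Bacteroides pairs are all hypothetical. Only "Veillonella spp.", "F. nucleatum/naviforme" and "P. asaccharolytica/uenonis" are labels taken from the text.

```python
# Species that are reported under a combined name.
GROUPS = {
    "Veillonella dispar": "Veillonella spp.",
    "Veillonella parvula": "Veillonella spp.",
    "Veillonella denticariosi": "Veillonella spp.",
    "Veillonella rogosae": "Veillonella spp.",
    # Labels for the Bacteroides pairs are assumptions made for this sketch.
    "Bacteroides ovatus": "B. ovatus/xylanisolvens",
    "Bacteroides xylanisolvens": "B. ovatus/xylanisolvens",
    "Bacteroides thetaiotaomicron": "B. thetaiotaomicron/faecis",
    "Bacteroides faecis": "B. thetaiotaomicron/faecis",
    "Bacteroides vulgatus": "B. vulgatus/dorei",
    "Bacteroides dorei": "B. vulgatus/dorei",
    "Fusobacterium nucleatum": "F. nucleatum/naviforme",
    "Fusobacterium naviforme": "F. nucleatum/naviforme",
    "Porphyromonas asaccharolytica": "P. asaccharolytica/uenonis",
    "Porphyromonas uenonis": "P. asaccharolytica/uenonis",
}


def reporting_name(identification: str) -> str:
    """Return the name under which an identification is tallied.

    Subspecies are dropped by keeping only the genus and species epithet
    (an assumption about how the name is formatted), after which species
    that cannot be separated are mapped to their combined label.
    """
    species = " ".join(identification.split()[:2])
    return GROUPS.get(species, species)


print(reporting_name("Fusobacterium nucleatum subsp. polymorphum"))
print(reporting_name("Veillonella parvula"))
print(reporting_name("Finegoldia magna"))
```

A lookup table keeps the grouping explicit and easy to revise if later database updates allow some of these species to be separated.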

16S rDNA gene sequencing
A subset of the strains that could not be identified using MALDI-TOF MS was identified using 16S rDNA gene sequencing (193 of 458 [42.1%]). This was performed at the originating laboratory, using their own primers and methods [7,8,20].

Distribution of bacterial strains
The optimized MALDI-TOF MS database was validated using 6309 anaerobic strains isolated from human clinical specimens. The distribution of the genera is shown in Table 1. The most prominent genus was Bacteroides, which represented 14.8% of the total number of strains analyzed. The genera Cutibacterium (formerly Propionibacterium [15]) and Prevotella represented 10.3% and 9.2% of the total isolates, respectively. Other genera included in sizeable numbers were those belonging to the gram-positive anaerobic cocci (GPAC) (Finegoldia, Peptoniphilus, Parvimonas, Anaerococcus, Peptostreptococcus, Murdochiella and Peptococcus), which as a group accounted for 21.9% of all analyzed anaerobes.

Validation
All results are presented per genus in Table 2. Detailed results per genus are presented in the "Data in Brief" [19]. Of the 6309 analyzed strains, 4485 (71.1%) were identified with high confidence, 1064 (16.9%) with low confidence and 760 (12.0%) could not be identified, using the V5 database. Adding the confirmed ENRIA MSPs increased the number of strains identified with high confidence to 4718 (74.8%) and decreased the number of strains identified with low confidence and with no identification to 937 (14.9%) and 654 (10.4%), respectively. For 19.1% of the strains a higher log score was obtained. Adding all ENRIA MSPs increased the number of strains identified with high confidence even further, to 4999 (79.2%), whilst substantially decreasing the number identified with low confidence, to 852 (13.5%). The number of strains with no identification decreased to 458 (7.3%). Overall, a higher confidence of identification was obtained for 35.2% of the strains upon adding all ENRIA MSPs.
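The percentages reported above follow directly from the strain counts, as the short Python check below shows. The counts are those given in the text; the names `COUNTS` and `percentages` are introduced here for illustration only.

```python
TOTAL = 6309  # total number of analyzed strains

# (high confidence, low confidence, no identification) per database
COUNTS = {
    "V5": (4485, 1064, 760),
    "V5 + confirmed ENRIA MSPs": (4718, 937, 654),
    "V5 + all ENRIA MSPs": (4999, 852, 458),
}


def percentages(counts, total=TOTAL):
    """Convert category counts to percentages rounded to one decimal."""
    if sum(counts) != total:
        raise ValueError("category counts do not add up to the total")
    return tuple(round(100 * c / total, 1) for c in counts)


for database, counts in COUNTS.items():
    high, low, none = percentages(counts)
    print(f"{database}: high {high}%, low {low}%, no identification {none}%")
```

The check that the three categories sum to 6309 for each database guards against transcription errors when the figures are reused.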
The observed increase in genus/species identification differed by genus. Bilophila wadsworthia was only represented in the V5 database at the genus level as 'Bilophila spp'. The addition of MSPs of clinical isolates increased the number of strains with a log score ≥2 from 2 to 17. The addition of MSPs of two species of Lachnoanaerobaculum, namely Lachnoanaerobaculum orale and Lachnoanaerobaculum umeaenense, ultimately resulted in a high confidence identification of all tested strains, whereas previously only 5 of the 9 strains (55.6%) could be identified. Using the V5 database, 79.2% of the Prevotella strains were identified with high confidence. This increased to 85.7% and 91.0% by adding the confirmed ENRIA MSPs and all ENRIA MSPs, respectively, partly due to the addition of MSPs of species that were not represented in the database. Of the strains belonging to the GPAC genera, 69.2% were identified with high confidence using the V5 database. Adding confirmed ENRIA MSPs and all ENRIA MSPs resulted in a high confidence identification for 76.5% and 86.4% of the strains, respectively. This increase was due in particular to the addition of MSPs of species (valid and non-valid) not yet represented in the database, such as Peptoniphilus duerdenii, Peptoniphilus tyrrelliae, 'Peptoniphilus rhinitidis' and Peptostreptococcus stomatis. Increasing the number of MSPs of Anaerococcus vaginalis substantially increased the proportion of strains identified with high confidence, from 12.1% using the V5 database to 85.0% using all ENRIA MSPs. With the latter database, a higher log score was obtained for 100% of the A. vaginalis strains. The number of Porphyromonas strains identified with high confidence also increased, mostly due to the addition of P. asaccharolytica/uenonis ENRIA MSPs (from 53.6% using the V5 database to 80.0% by including all ENRIA MSPs).

16S rDNA gene sequencing
Of the 6309 analyzed strains, 458 were not identified using MALDI-TOF MS. To assess the reason for this, 193 of these strains were sequenced by the originating laboratories. Species for which no identification was obtained for more than one strain are shown in Table 3. A portion of these strains gave no spectrum during MALDI-TOF MS analysis; these were mostly species belonging to the genera Actinomyces and Propionibacterium/Cutibacterium (data not shown). For other species no identification was obtained because they are not represented in the MALDI-TOF MS database, e.g. Eisenbergiella tayi, Anaerovorax odorimutans and Akkermansia muciniphila. Other species are represented in the database but were not identified by MALDI-TOF MS, e.g. Dialister pneumosintes, Eggerthella lenta, Atopobium minutum and Anaerococcus prevotii. Species of which only one strain was encountered and which are not represented in the database included Anaerocolumna aminovalerica, 'Casaltella massiliensis', Clostridium tetanomorphum and Haloimpatiens lingqiaonensis.

Discussion
In this study, we validated the Bruker MALDI-TOF MS database optimized for anaerobic bacteria, using a large set of anaerobic strains isolated from human clinical specimens. Several European laboratories specializing in anaerobic bacteria collaborated as part of the validation, allowing us to include strains of species less commonly encountered in human infections from the collaborators' private collections. Identifications with high confidence increased by 3.7 percentage points, from 71.1% to 74.8%, using the V5 database and confirmed ENRIA MSPs. Using V5 and all ENRIA MSPs, identifications with high confidence increased by 8.1 percentage points, from 71.1% to 79.2%.
The three most prevalent genera were Bacteroides (13.5%), Cutibacterium (9.4%) and Prevotella (8.7%). For the first two genera,  [20] demonstrated that optimizing the Biotyper database for the identification of Prevotella species resulted in a high confidence identification of clinical isolates of Prevotella of 89.2%. More recently Gürsoy et al. [6] validated the identification of species of oral origin and found that all strains were correctly identified at the low confidence level (log scores 1.7 to 2) and to a high confidence species level for 88.6%. Similar rates of high confidence identification for Prevotella species were observed in this study. After optimization with confirmed ENRIA MSPs the level of high confidence identifications increased from 79.2% to 85.7% and finally to 91.0% when all ENRIA MSPs were added to the V5 database. Veloo et al. [17] demonstrated that the number of GPAC strains identified with high confidence increased from 53.6% to 82.1% when the Biotyper database was optimized for GPAC species [14].

Table 3 An overview of species, of which more than one strain was encountered, which could not be identified using MALDI-TOF MS. The identity of the strain was determined using 16S rDNA gene sequencing.

During our study, we observed difficulties differentiating P. asaccharolytica from P. uenonis, using both MALDI-TOF MS and 16S rDNA gene sequencing. This can be explained by the fact that the 16S rDNA sequence difference between these two species is less than 2% [5]. Differentiation of these two species using MALDI-TOF MS requires further investigation that is beyond the scope of this publication.
Certain species remain difficult to identify using MALDI-TOF MS. Of the strains in this category in our study, 18% belonged to the genus Actinomyces; these often gave no identification because no spectrum was obtained. This could be explained by the dry colony morphology of certain species, which hampers the spotting [18], or by the thick cell wall of these bacteria. In both cases, it is advised to perform a full extraction to overcome these problems.
A significant number of Veillonella strains were included in the validation. From the log scores obtained for the species V. dispar, V. parvula, V. denticariosi and V. rogosae we observed that MALDI-TOF MS has difficulty differentiating between these species. Marchandin et al. [10] described the phenomenon of micro-heterogeneity, which indicates scattered nucleotide differences between the different copies of the 16S rDNA gene present in one strain of Veillonella. This makes it difficult to differentiate V. dispar and V. parvula from each other using solely 16S rDNA gene sequencing. For the accurate speciation of strains from the genus Veillonella, it is advised to use three different housekeeping genes: dnaK, rpoB and gltA [1]. When MALDI-TOF MS is used for the identification of bacteria, mostly the ribosomal proteins are measured [11]. Since several Veillonella species cannot be differentiated from each other using 16S rDNA gene sequencing, it is not surprising that this is also the case when MALDI-TOF MS is used for the identification. Further research into the identification of Veillonella using MALDI-TOF MS is necessary.
Historically, the identification of anaerobic bacteria has been overlooked by diagnostic laboratories, largely due to the difficulties associated with the phenotypic identification systems designed specifically for anaerobes. There is also a perception that a specialized workflow is required, which would be more time consuming than for aerobic bacteria. Many new species have been proposed recently for which the phenotypic features are poorly described. Therefore, commercial identification systems where the identification tables are not kept up to date are unlikely to cover these species. In order to successfully monitor the clinical relevance of these often unvalidated species we chose to add these to the ENRIA database. An example is 'Fenollaria massiliensis', originally isolated from an osteoarticular sample [12]. This species is not validated and its clinical relevance is unknown. During our study we encountered seven strains of this species, which would previously have gone unrecognized in clinical laboratories. This example demonstrates that it is also important to have unvalidated species validated, especially if we wish to gain insight into the clinical relevance of such unusual new species. The importance of including less common species within the databases of MALDI-TOF MS systems is also emphasized by a study by Bernard et al. [2]. They encountered eight isolates of E. tayi, isolated from blood cultures, and concluded that this species is a potential pathogen. Creating MSPs for these strains ensured that this species can be identified using MALDI-TOF MS.
The validation of the MALDI-TOF MS V5 database, expanded using clinical isolates collected within the ENRIA project, clearly demonstrated that an increase in the percentage of strains identified was achievable. It should be noted that not all species can be differentiated from each other, especially certain species of the genus Veillonella. We recommend the continual expansion of the MALDI-TOF MS database with MSPs of less common species, valid and non-valid, in order to gain insight into their clinical relevance.