Beware of False “Type Strain” Genome Sequences

Ed. Note: The authors of the published articles did not respond.

With this letter, we warn users of bacterial DNA sequence data about recent cases in which the term “type strain” was misused in bacterial genome sequence reports, and we stress the importance of using the term in its correct context.

In recent articles published in the journal Genome Announcements (GA; now Microbiology Resource Announcements [MRA]) (1)(2)(3)(4)(5), the complete genome sequences of five strains, each belonging to a different bacterial species, were reported. The titles of the articles stated that the strains represent the respective "type strains" of the five species, although the articles give no details of how these strains were established as type strains. It is important to note that the concept and description of "type strain" are not arbitrary; the type strains of bacterial species are defined by rule 18a of the International Code of Nomenclature of Prokaryotes as follows: "The type strain is made up of living cultures of an organism, which are descended from a strain designated as the nomenclatural type" (6). Strains serving as nomenclatural type material are designated precisely, and this may be done in various ways (7); typically, a "holotype strain" is designated at the time of valid publication of a new species name (rule 18b), and the type strain must conform to defined conditions (rule 30). The five species named in the GA publications had been validly published previously, and their type strains were already defined, preserved, and publicly available from numerous culture collections. In contrast, the strains reported in the five GA articles had been isolated recently and independently. Because they are not descended from the already designated type strains, they cannot be the authentic type strains of these species.
To confirm these observations, we performed average nucleotide identity (ANI) analyses using JSpeciesWS (8,9) between the five genome sequences reported in the GA articles and the genome sequences of the documented type strains of the five species (Table 1). The ANI values range from 83.10 to 98.96% (percentage aligned, 75.56 to 91.18%). These values confirm that the reported strains are distinct from the type strains, since genome sequences derived from the same strain would be expected to share nearly 100% ANI. They also show that one of the strains, Lelliottia nimipressuralis SGAir0187, is misclassified and does not belong to this species: its ANI value to the type strain falls below the 95 to 96% boundary commonly used for species delineation.
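As a rough illustration of what an ANI value summarizes, the Python sketch below mimics the averaging step of the BLAST-based ANIb approach under simplifying assumptions. It assumes the query genome has already been cut into 1,020-nt fragments and that each fragment's best BLASTN hit against the reference genome is supplied as input. Hits are kept when their identity, recalculated over the whole fragment, exceeds 30% and they cover at least 70% of the fragment. The `FragmentHit` class, the `ani_summary` and `same_species` functions, and the way the aligned percentage is computed are hypothetical simplifications, not the JSpeciesWS implementation. The 95% cutoff is taken as the lower end of the commonly cited 95 to 96% species boundary.

```python
from dataclasses import dataclass
from statistics import mean

# Assumed parameters, approximating the ANIb filtering rule (not JSpeciesWS code).
FRAGMENT_LENGTH = 1020     # nt per query fragment
MIN_IDENTITY = 30.0        # % identity over the whole fragment that a hit must exceed
MIN_COVERAGE = 0.70        # minimum fraction of the fragment covered by the alignment
SPECIES_THRESHOLD = 95.0   # lower end of the commonly used 95-96% species boundary


@dataclass
class FragmentHit:
    """Hypothetical input: best BLASTN hit of one query fragment on the reference."""
    identity: float        # % identity over the aligned region
    aligned_length: int    # fragment nucleotides included in the alignment


def ani_summary(hits: list[FragmentHit], n_fragments: int) -> tuple[float, float]:
    """Return (ANI %, % of query aligned) from per-fragment hits (simplified sketch)."""
    kept = []
    for hit in hits:
        # Recalculate identity over the entire fragment, not just the aligned part.
        whole_fragment_identity = hit.identity * hit.aligned_length / FRAGMENT_LENGTH
        covered = hit.aligned_length >= MIN_COVERAGE * FRAGMENT_LENGTH
        if whole_fragment_identity > MIN_IDENTITY and covered:
            kept.append(hit)
    if not kept:
        return 0.0, 0.0
    # ANI is the mean identity of the retained fragment hits.
    ani = mean(hit.identity for hit in kept)
    # Simplified: share of all query nucleotides covered by retained hits.
    pct_aligned = 100.0 * sum(h.aligned_length for h in kept) / (n_fragments * FRAGMENT_LENGTH)
    return ani, pct_aligned


def same_species(ani: float, threshold: float = SPECIES_THRESHOLD) -> bool:
    """True if the ANI value reaches the assumed species-level cutoff."""
    return ani >= threshold


if __name__ == "__main__":
    # Toy data, not taken from Table 1: the third hit covers too little of its fragment.
    toy_hits = [FragmentHit(98.0, 1020), FragmentHit(96.0, 1000), FragmentHit(90.0, 500)]
    ani, aligned = ani_summary(toy_hits, n_fragments=3)
    print(f"ANI {ani:.2f}%, aligned {aligned:.2f}%, same species: {same_species(ani)}")
```

In this toy run the third hit is discarded for insufficient coverage, so the ANI is the mean of the two retained identities. A value above the cutoff indicates the same species, but only a value close to 100% would be consistent with the two genomes coming from the same strain.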
Presenting genome sequences of false type strains of a given species can lead to erroneous conclusions in future studies that rely on this published misinformation, because the sequences may be used as incorrect reference points. We encourage the authors of the five GA publications and the journal to publish corrigenda and to remove the words "type strain" from the titles of the publications. We also caution users of publicly available genome sequence data against uncritically accepting the metadata associated with genome sequences; recent studies have clearly demonstrated, using genome sequence identity criteria for species-level identification, that large numbers of genome sequences in different taxa are misclassified (10)(11)(12)(13). Public databases archiving genome sequence data are implementing new controls to catch sequence deposits that do not adhere to accepted taxonomic standards for identification. However, these databases can control only the descriptions of the sequence data submitted to them; they cannot control how authors describe the data in publications. We encourage users of reported and archived genome sequence data to perform relevant taxonomic controls, for instance, by comparing housekeeping gene sequences from the genomes with independently determined sequences of the documented type strains; where possible, these comparisons should use the data from the original species descriptions.
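A housekeeping gene comparison of the kind recommended above can be reduced, in its simplest form, to a pairwise identity calculation. The sketch below assumes the gene sequences have already been extracted from the genomes and aligned with an external tool. The function names `pairwise_identity` and `discordant_genes`, the dictionary input format, and the identity cutoff passed by the caller are hypothetical. An appropriate cutoff depends on the gene and the taxon and is not specified here.

```python
def pairwise_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned nucleotide sequences.

    Columns where either sequence has a gap ('-') are skipped; the remaining
    columns are compared case-insensitively.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def discordant_genes(
    genome_genes: dict[str, str],
    type_strain_genes: dict[str, str],
    min_identity: float,
) -> list[str]:
    """Hypothetical helper: shared genes whose identity to the type strain is below min_identity."""
    shared = genome_genes.keys() & type_strain_genes.keys()
    return sorted(
        gene
        for gene in shared
        if pairwise_identity(genome_genes[gene], type_strain_genes[gene]) < min_identity
    )


if __name__ == "__main__":
    # Toy, pre-aligned fragments; real housekeeping genes are far longer.
    genome = {"gyrB": "ACGTACGTAC", "rpoB": "ACGTACGTAC"}
    type_strain = {"gyrB": "ACGTACGTAC", "rpoB": "ACGTTTTTAC"}
    print(discordant_genes(genome, type_strain, min_identity=97.0))
```

A gene flagged this way does not by itself prove misclassification, but it marks a genome whose species label deserves closer scrutiny, for example by a genome-wide ANI comparison with the type strain.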
In conclusion, type strains serve as important "reference points" in bacterial taxonomy and systematics and are fundamental resources for microbiology. Sequences erroneously reported as type strains are a real example of "fake news" and are potentially harmful, as they can easily propagate errors into studies that rely on them. With these examples of the inaccurate and inappropriate use of the term "type strain," we emphasize the importance of confirming the designation. We further propose that public databases establish dedicated repositories for the genome sequences of the type strains of validly published species. In short, while genome sequence data represent an enormous resource, it is imperative to perform basic and relevant controls when reporting and using public genome sequences.