Systematic and Applied Microbiology
The importance of designating type material for uncultured taxa



Introduction
Culture-independent characterization of the microbial world has prompted calls for a unified nomenclature to describe uncultured organisms based primarily on sequence data [22,24,39]. The naming of cultured Bacteria and Archaea is governed by the International Code of Nomenclature of Prokaryotes (ICNP or the Code; [32]), which currently does not officially extend to uncultured microorganisms. Many uncultured prokaryotes have been named using the Candidatus status to indicate putative taxa [24], but the majority of these names are inconsistent with the ICNP. Apart from linguistic errors [26], common issues include failure to designate the nomenclatural type, naming of higher taxa without naming the type genus and species, absence of a description of the taxon, and absence of an etymology for the name. In light of a pending proposal to incorporate the nomenclature of uncultured prokaryotes into the Code [39], these issues must be corrected for names to obtain standing in nomenclature and priority. Implementing such nomenclature standards is urgent, because the number of metagenome-assembled genomes (MAGs) is rapidly outgrowing the number of species based upon type cultures. As examples, we provide descriptions of nomenclature type material, consistent with the ICNP, for four uncultured lineages based on high-quality MAGs.
In the description of any novel taxon, the ICNP requires the designation of type material (Section 4 of the Code; [32]), which is key for providing priority to the name. The combination of priority and type material ensures the uniqueness and stability of names by permitting only the name with priority to be used to denote the taxon that includes the type material. Likewise, the name cannot be used to denote a taxon that does not contain the type material. This rule implies a simple and objective criterion for type material: it must describe the taxon well enough to distinguish it from other taxa of the same rank. Delineation of species typically relies on genetic comparisons such as DNA-DNA hybridization, average nucleotide identity, average amino acid identity or similar criteria [11,22,38]. Delineation of genera and higher taxa is typically based on phylogenetic relationships inferred from 16S rRNA genes [15,41,42] or concatenated protein alignments [12,23,36]. The justification for a genome sequence to serve as type material rests on its ability to distinguish closely related species and has been outlined elsewhere [21,38,39]. MAGs and single amplified genomes (SAGs) have recently been proposed for the description of uncultured taxa [22], and strain-level resolution will likely become achievable in the near future using new techniques such as genome reconstruction based on DNA methylation profiles [6] or improvement of MAG assemblies with long-read sequencing platforms [10].
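The interplay between type and priority described above can be expressed as a simple rule: among all available names, only those whose type falls within the taxon's circumscription are eligible, and of these the earliest has priority. The Python sketch below illustrates that rule under simplifying assumptions: a taxon is modelled as a set of hypothetical genome identifiers, priority is reduced to a single publication date, and the names `Name` and `correct_name` are hypothetical rather than part of any existing tool. The Code's requirements for valid publication are not modelled.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Name:
    """A hypothetical published name with its nomenclatural type and priority date."""
    label: str
    type_material: str  # identifier of the type, e.g. a genome accession
    published: date     # simplification: priority reduced to a single date


def correct_name(circumscription: set[str], names: list[Name]) -> str | None:
    """Return the name with priority among those whose type lies within the taxon."""
    # A name whose type is outside the taxon can never denote it, however early it was published.
    eligible = [n for n in names if n.type_material in circumscription]
    if not eligible:
        return None  # no available name applies; a new name with new type material is required
    return min(eligible, key=lambda n: n.published).label


names = [
    Name("Name A", "genome-2", date(2010, 1, 1)),
    Name("Name B", "genome-3", date(2005, 1, 1)),
    Name("Name C", "genome-9", date(2001, 1, 1)),  # older, but its type lies outside the taxon
]
print(correct_name({"genome-1", "genome-2", "genome-3"}, names))  # Name B
```

If the taxon were later divided so that one part contains only genome-1 and genome-2, the same rule returns Name A for that part, illustrating how the type, rather than the circumscription, fixes the application of a name.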
In the Code, lower taxonomic ranks are the types for higher ranks (Fig. 1; Section 4 of the Code). Thus, the type of a genus is a species, the type of a family or an order is a genus, and the type of a class is an order. If the recommendation of Oren et al. [28] (addendum by Whitman et al. [40]) is followed, the type of a phylum will be a class. The logic of this system is simple: the types, and thus the names themselves, can all be traced back to a specific biological entity whose existence can be clearly demonstrated, i.e. a culture, a DNA sample or information such as a genome sequence. Use of type material to allocate names to taxa is a fundamental process in biological nomenclature under all three codes (International Code of Nomenclature for algae, fungi, and plants; International Code of Zoological Nomenclature; and International Code of Nomenclature of Prokaryotes). Even as taxonomic opinions change and new discoveries are made, the names of all taxa continue to refer to their types, thus keeping nomenclature distinct from taxonomic classification. This system allows flexibility in classification while maintaining stability in nomenclature.
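Because each rank is typified by a lower one, the type of any name can be followed down to a single biological entity. As a minimal sketch, the Python mapping below encodes this chain for the names proposed for phylum UBP10 in Case 1 (Candidatus prefixes omitted), with the MAG accession as the type of the species. The dictionary and function names are hypothetical, and the phylum-to-class link assumes that the pending proposal of Oren et al. [28] is adopted.

```python
from __future__ import annotations

# Each name maps to its nomenclatural type, one step down the hierarchy:
# phylum -> class -> order -> genus; family -> genus; genus -> species; species -> type material.
TYPE_OF = {
    "Binatota": "Binatia",              # phylum typified by a class (pending proposal [28,40])
    "Binatia": "Binatales",             # class typified by an order
    "Binatales": "Binatus",             # order typified by a genus
    "Binataceae": "Binatus",            # family typified by a genus
    "Binatus": "Binatus soli",          # genus typified by a species
    "Binatus soli": "GCA_002479255.1",  # species typified by type material (here a MAG)
}


def type_chain(name: str) -> list[str]:
    """Follow the chain of types from a name down to its type material."""
    chain = [name]
    while chain[-1] in TYPE_OF:
        nxt = TYPE_OF[chain[-1]]
        if nxt in chain:  # guard against accidental cycles in the mapping
            raise ValueError(f"cyclic type assignment at {nxt}")
        chain.append(nxt)
    return chain


print(" -> ".join(type_chain("Binatota")))
```

Both the order and the family resolve to the same genus, and hence to the same genome, which is why naming a single genus can anchor every higher name.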
Largely driven by culture-independent sequence analysis, it has become common practice to designate new phyla or other higher taxa without designating lower ranks [1,3,8,14,17]. This practice is in direct conflict with the ICNP mandate that names of higher taxa follow from the names of lower taxa that serve as their types (Rule 15 of the Code; Fig. 1), and it renders such names nomenclaturally unstable. Moreover, higher taxa are inherently ambiguous: while there is some consensus about what constitutes species and genera, there is less consistency in the designation of higher taxa [17], with estimates of the number of bacterial phyla ranging from ∼100 to ∼1350 [18,30,42]. Lastly, this practice assigns meaning to a taxonomic rank, which contradicts one of the central tenets of the Code, i.e. the independence of nomenclature from taxonomy (General Consideration 4). A direct consequence of ignoring the principles of hierarchy and typification is the inability to name taxa appropriately when taxonomic opinion changes (e.g. transfer, union or change in rank), leading to unconstructive debates on name authority and 'priority'. To maintain a well-defined taxonomic framework, it is crucial to follow the naming hierarchy and to provide type material for each taxon, starting from species.
Finally, names only gain standing in nomenclature after they have been announced in the International Journal of Systematic and Evolutionary Microbiology (IJSEM), the official journal of the International Committee on Systematics of Prokaryotes (ICSP) (Rules 24a and 27). Although Candidatus taxa currently lack standing, their names should be announced, either through publication in the IJSEM or through inclusion on the "List of Candidatus Names", which will soon be added to the lists periodically published in the IJSEM (see Appendix A1 of the Code). Authors and other individuals wishing to have new names and/or combinations included in the Lists should send an electronic copy of the published paper to the IJSEM Editorial Office. This procedure ensures that the name becomes widely known.

Selection of candidate phyla for type material designations
Candidate bacterial and archaeal phyla lacking designated type material were selected based on the classification of the Genome Taxonomy Database (GTDB), release 2-RS83, which includes only genomes that were available in National Center for Biotechnology Information (NCBI) RefSeq/GenBank release 83 (prior to July 2017) and that pass the GTDB genome quality filters (see Ref. [31]). The genome-based phylogenetic reconstructions on which the classifications are based are available from the download page of the GTDB website (gtdb.ecogenomic.org/downloads).

Selection of MAG-based type material
Once candidate phyla were selected, metadata for all representative genomes were obtained from a metadata file available from the GTDB download page. These included assembly statistics, genome quality (completeness and contamination estimates) and genome properties (e.g., GC content, estimated genome size, number of contigs). The number of tRNAs and the corresponding encoded amino acids were predicted with tRNAscan-SE 1.23 using the domain-specific settings for Bacteria and Archaea. Presence and length of 16S, 23S and 5S rRNA genes were verified using HMMER and domain-specific SSU/LSU HMM models as implemented in the 'ssu-finder' method of CheckM [29]. MAGs were considered candidates for type material for a given lineage if they met the following quality criteria: i) the MIMAG high-quality draft standard for genome reporting (>90% completeness; <5% contamination; multiple fragments where gaps span repetitive regions; presence of the 23S, 16S and 5S rRNA genes and of tRNAs for at least 18 of the 20 standard amino acids; Bowers et al. [7]); ii) a full-length or nearly full-length 16S rRNA gene, with a threshold of 1200 bp for both Bacteria and Archaea [33]; and iii) a draft assembly of no more than 100 contigs with fewer than 10 ambiguous bases. MAGs were ranked by a quality score defined as completeness − 4 × contamination, and the highest-ranking MAG meeting the above criteria was chosen to represent the given type. When multiple MAGs had equal or similar quality scores, we chose the one with the most complete 16S rRNA gene sequence.
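The selection procedure above amounts to a filter followed by a ranking. The Python sketch below is a minimal reimplementation under stated assumptions: the fields of the hypothetical `MagStats` record are not the column names of the GTDB metadata file, the MIMAG requirement that gaps span repetitive regions is not modelled, and the `tolerance` parameter is an illustrative stand-in for "similar" quality scores, which the procedure does not quantify.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MagStats:
    """Hypothetical per-MAG summary; field names are illustrative."""
    accession: str
    completeness: float    # %, e.g. as estimated by CheckM
    contamination: float   # %
    has_23s: bool
    has_5s: bool
    ssu_length: int        # 16S rRNA gene length in bp (0 if absent)
    trna_amino_acids: int  # standard amino acids covered by predicted tRNAs (0-20)
    contigs: int
    ambiguous_bases: int

    @property
    def quality(self) -> float:
        # Ranking score: completeness - 4 x contamination
        return self.completeness - 4 * self.contamination


def meets_criteria(m: MagStats) -> bool:
    """Criteria i) to iii); the assembly-gap requirement of MIMAG is not checked."""
    return (
        m.completeness > 90 and m.contamination < 5      # i) MIMAG high-quality draft
        and m.has_23s and m.has_5s                        # i) 23S and 5S rRNA genes present
        and m.trna_amino_acids >= 18                      # i) >= 18 of 20 amino acids
        and m.ssu_length >= 1200                          # ii) near full-length 16S (implies presence)
        and m.contigs <= 100 and m.ambiguous_bases < 10   # iii) assembly contiguity
    )


def select_type_mag(mags: list[MagStats], tolerance: float = 0.0) -> MagStats | None:
    """Pick the best-scoring eligible MAG; break (near-)ties by 16S rRNA gene length."""
    eligible = [m for m in mags if meets_criteria(m)]
    if not eligible:
        return None
    best = max(m.quality for m in eligible)
    top = [m for m in eligible if best - m.quality <= tolerance]
    return max(top, key=lambda m: m.ssu_length)


mags = [
    MagStats("MAG-A", 95.0, 1.0, True, True, 1500, 19, 60, 0),
    MagStats("MAG-B", 97.0, 1.5, True, True, 1400, 20, 40, 2),   # same score as MAG-A
    MagStats("MAG-C", 99.0, 0.5, True, True, 1540, 20, 150, 0),  # too many contigs
    MagStats("MAG-D", 98.0, 0.0, True, True, 900, 20, 30, 0),    # 16S rRNA gene too short
]
print(select_type_mag(mags).accession)  # MAG-A
```

MAG-A and MAG-B tie on the quality score (91), so the longer 16S rRNA gene decides, mirroring the tie-break described above.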

Results and discussion
Typifying the uncultured majority
The International Code of Nomenclature of Prokaryotes (ICNP, [32]), Rule 15, states: "For each named taxon of the various taxonomic categories (. . .), there shall be designated a nomenclature type." Rule 17 then adds: "The type determines the application of the name of a taxon if the taxon is subsequently divided or united with another taxon." The official prokaryotic nomenclature therefore recognises nomenclature types as the basis for naming organisms. The naming hierarchy is then built on the nomenclature types of each rank, ascending from lower to higher categories (Rule 15). Thus, the names of taxa above the rank of genus, up to and including class, are based upon the designation of a type genus and are derived from the stem of its name by addition of the corresponding rank suffix (Rules 8 and 9). To date, all described uncultured prokaryotes have been named without designation of formal nomenclature types. Some studies have designated type species for higher taxa [2,35], but many uncultured lineages are named only at the phylum level [1,3,8,14,17]. This situation has arisen because neither the rank of phylum nor uncultured prokaryotes are recognised in the official nomenclature [32]. Under a recent call to formalise the rank of phylum under the Code ([28]; addendum by Whitman et al. [40]), phylum names will be formed by adding the suffix -ota to the stem of the name of one of the subordinate classes. Because the class name is in turn based on the type genus of one of the contained orders, naming a single genus will suffice to provide the basis for all higher taxon names. Currently, this is not possible for uncultured prokaryotes, because only validly published names can serve as nomenclature types (Rule 20a).
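Once the stem of the type genus is known, forming the higher taxon names is mechanical. The Python sketch below appends rank suffixes to a stem supplied by hand, because deriving the stem correctly depends on Latin grammar (the genitive of the genus name) and is not attempted here. The family and order suffixes follow Rules 8 and 9; the class suffix -ia and the phylum suffix -ota follow the names proposed in the case studies and the pending proposals [28,40]. The function name is hypothetical.

```python
from __future__ import annotations

# Suffixes appended to the stem of the type genus name, ordered from lower to higher rank.
RANK_SUFFIXES = {
    "family": "aceae",  # Rules 8 and 9
    "order": "ales",    # Rules 8 and 9
    "class": "ia",      # as used for the class names proposed in the case studies
    "phylum": "ota",    # pending proposal [28], addendum [40]
}


def higher_taxon_names(stem: str) -> dict[str, str]:
    """Form family-to-phylum names from a manually supplied genus stem."""
    return {rank: stem + suffix for rank, suffix in RANK_SUFFIXES.items()}


# Stems of the four type genera proposed in the case studies
for genus, stem in [("Binatus", "Binat"), ("Hydrothermarchaeum", "Hydrothermarchae"),
                    ("Hadarchaeum", "Hadarchae"), ("Hydrothermus", "Hydrotherm")]:
    print(genus, higher_taxon_names(stem))
```

Keeping the stem as an explicit input, rather than truncating the genus name automatically, avoids silently producing ungrammatical names for genera whose genitive is irregular.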
However, the recent ability to characterize uncultured prokaryotes both genotypically and phenotypically at the species level has led to the proposal to expand type material for naming prokaryotes to genome sequences, including those derived from metagenomic or single-cell datasets [22,39]. In that case, official nomenclature practice could be applied to name uncultured microorganisms, which could either form the basis of a parallel nomenclature system or become part of a unified Code [22,39]. We support the latter proposal, as a unified nomenclature will ultimately be the most beneficial outcome for taxonomic classification and communication of results (see also [27]). An important consideration for proposing genome-based type material is the quality of the assembled genome sequence, which must reliably discriminate a new taxon from others, as is the case for cultured isolates. Below we discuss quality standards for genomes to be used as type material and propose type species for four candidate phyla based on high-quality MAGs.

Genome type material quality considerations
Quality criteria for genomes that will serve as type material are important for taxonomy and nomenclature, as the designated type must unambiguously identify the taxon (Section 4 of the Code; [11,39]). Low-quality type genomes, e.g. highly incomplete or chimeric assemblies, or assemblies missing marker genes, could lead to misclassifications and the unnecessary creation of names, thus contradicting Principle 1 of the Code. Several publications have proposed genome quality standards for isolate genomes [9,11,25] and for genomes from uncultured microorganisms [7], including one of the two recent proposals specifically addressing their use as type material for the description of uncultured taxa [22]. The 'high-quality draft' MIMAG/MISAG (Minimum Information about a Metagenome-Assembled Genome/Single Amplified Genome) standard uses a combination of estimated completeness (>90%) and contamination (<5%) with nearly complete complements of rRNA genes (23S, 16S, 5S) and tRNAs (≥18) as defining criteria [7]. We suggest that this standard should serve as a benchmark for MAG type material, with the following caveats. First, completeness estimates should be considered in the context of genome evolution. In particular, the completeness of small streamlined genomes, such as those of parasitic or symbiotic microorganisms, can be underestimated owing to the bona fide absence of marker genes [2,5,29]. In these cases, the number of contigs may be a better indication of whether a given genome is suitable as type material, with priority given to genomes with fewer contigs. Second, given the importance of a complete or nearly complete 16S rRNA gene for classifying microorganisms [11,22], care should be taken to confirm that this gene belongs to the MAG, especially if it occurs in isolation on a small contig.
Third, although the presence of tRNAs for at least 18 of the 20 standard amino acids is desirable, the correlation between the number of tRNAs and genome completeness is weak, which makes tRNA counts poorly suited for estimating genome completeness [30]. Furthermore, domain-specific features such as fragmented or split tRNA genes in Archaea [16] can lead to underestimation of the true number of encoded tRNAs. We therefore suggest that, while this criterion should still be reported, it should not be mandatory when considering a MAG as type material.
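The caveats above separate requirements that should disqualify a genome as type material from those that should only be reported. A minimal Python sketch of that distinction follows. The `reduced_genome` and `ssu_on_isolated_contig` flags are hypothetical inputs that would have to come from expert judgement, no numeric threshold for a "small" contig is implied, and the numeric thresholds are those of the MIMAG high-quality draft standard [7].

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Assessment:
    mandatory_failures: list[str] = field(default_factory=list)
    advisories: list[str] = field(default_factory=list)  # reported, but not disqualifying

    @property
    def acceptable(self) -> bool:
        return not self.mandatory_failures


def assess_type_genome(
    completeness: float,
    contamination: float,
    rrnas_present: bool,                   # 23S, 16S and 5S rRNA genes all detected
    trna_amino_acids: int,                 # standard amino acids covered by tRNAs
    reduced_genome: bool = False,          # hypothetical flag: streamlined genome suspected
    ssu_on_isolated_contig: bool = False,  # hypothetical flag: 16S alone on a small contig
) -> Assessment:
    a = Assessment()
    if contamination >= 5:
        a.mandatory_failures.append("contamination >= 5%")
    if not rrnas_present:
        a.mandatory_failures.append("23S, 16S or 5S rRNA gene missing")
    if completeness <= 90:
        if reduced_genome:
            # First caveat: completeness may be underestimated; judge on contig count instead
            a.advisories.append("completeness <= 90%, possibly underestimated; check contig count")
        else:
            a.mandatory_failures.append("completeness <= 90%")
    if ssu_on_isolated_contig:
        # Second caveat: verify that the 16S rRNA gene really belongs to the MAG
        a.advisories.append("confirm that the 16S rRNA gene belongs to this MAG")
    if trna_amino_acids < 18:
        # Third caveat: the tRNA criterion is reported but not treated as mandatory
        a.advisories.append("tRNAs cover fewer than 18 of 20 standard amino acids")
    return a


report = assess_type_genome(93.0, 1.2, True, 16)
print(report.acceptable, report.advisories)
```

Here a genome with tRNAs for only 16 amino acids remains acceptable but carries an advisory, whereas excess contamination or a missing rRNA gene would still disqualify it.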
Case studies of genome-based type material for uncultured prokaryotes
Major bacterial and archaeal lineages currently lacking cultured representatives were reviewed to select examples suitable for the proposal of MAG-based type material. For delineation and annotation of higher and lower ranks, including species and genera, we used a recently proposed standardised genome-based taxonomy (GTDB; [31]). Four MAGs representing two archaeal and two bacterial phyla were identified as suitable candidates for type material, as they conform to the MIMAG criteria for high-quality drafts outlined in Bowers et al. [7], with the caveats discussed above. We propose four species based on MAG type material (Table 1). The corresponding quality metrics of each MAG are summarised in Supplementary Table 1. Below we describe, from simple to more complex case studies, how the essential rules of official nomenclature practice can be applied to uncultured prokaryotes through the proposed use of MAG type material.

Case 1: proposal of Candidatus Binatus soli
Representatives of several novel bacterial and archaeal phyla have recently been recovered from metagenomic data deposited in public repositories [30]. Among these, the first genomic representatives of candidate bacterial phylum UBP10 were reconstructed from a soil metagenome. At present, UBP10 appears to be represented by a single class as annotated in GTDB, including MAGs identified in other studies. Here we propose Candidatus Binatus soli, based on the high-quality MAG UBA7539 (GCA_002479255.1), as the first named species of the UBP10 phylum. If the Oren et al. [28] proposal is accepted, this species can serve as the type for higher-ranking taxa up to and including the phylum, which would be named Binataceae (family), Binatales (order), Binatia (class) and Binatota (phylum).

Case 2: proposal of Candidatus Hydrothermarchaeum profundi
Candidate archaeal phylum Hydrothermarchaeota was recently proposed to accommodate the first genomic representatives of Marine Benthic Group E (MBG-E) [19], which had previously been identified from 16S rRNA gene sequences [37]. Although three MAGs representing this phylum were recovered, none was proposed as a Candidatus species or genus. Of these, JdFR-18 (GCA_002011125.1) is the highest-quality MAG meeting the requirements described above. We therefore propose JdFR-18 as the type material of a new species, Candidatus Hydrothermarchaeum profundi. The generic name was formed from the stem of the originally proposed higher taxon name, because the latter is not validly published and could only become valid under the Code once its nomenclature type is designated. However, with the proposal of Ca. H. profundi, stability of the phylum name can be achieved as follows: Hydrothermarchaeaceae (family), Hydrothermarchaeales (order), Hydrothermarchaeia (class) and Hydrothermarchaeota (phylum). Note that the originally proposed phylum name 'Hydrothermarchaeota' will need to be re-proposed as the phylum containing the genus Ca. Hydrothermarchaeum to be consistent with the Code.

Case 3: proposal of Candidatus Hadarchaeum yellowstonense
Originally proposed as a class-level lineage within the archaeal phylum Euryarchaeota [4] and later as a member of the superclass Stygia [1], the class Hadesarchaea (formerly known as SAGMEG, the South-African Gold Mine Miscellaneous Euryarchaeal Group) is currently considered a separate phylum based on taxonomic rank normalisation [31]. No named species or genera have been proposed for the Hadesarchaea; therefore, this class cannot be validated according to the Code (assuming unified rules or adoption of recent proposals). Here we propose Candidatus Hadarchaeum yellowstonense, based on the highest-quality MAG available, YNP 45 (GCA_001515205.2), as the first named species within the lineage, which can serve as the type for naming the higher taxa containing this species: Hadarchaeaceae (family), Hadarchaeales (order), Hadarchaeia (class) and Hadarchaeota (phylum).

Case 4: proposal of Candidatus Hydrothermus pacificus
Candidate bacterial phylum Hydrothermae was recently proposed to accommodate the first genomic representatives of the EM3 lineage [19], which was previously known only from 16S rRNA gene sequences [34]. According to taxonomic rank normalisation [31], the Hydrothermae are a class within a larger monophyletic unit comprising candidate phylum WOR-3 [4], which itself has been proposed to be named Candidatus Stahlbacteria after Dr. David Stahl, an environmental microbiologist and early proponent of 16S rRNA-based microbial ecology [13]. However, both Hydrothermae and Candidatus Stahlbacteria are examples of high-level lineages that lack designation of lower ranks and nomenclature types. Here we propose Candidatus Hydrothermus pacificus as the type and first named species of the group, based on the highest-quality MAG currently available, JdFR-72 (GCA_002011615.1), to conserve the link to the name Hydrothermae [19]. The Hydrothermae (EM3) lineage comprising this species could then use the stem of the genus name for higher ranks: Hydrothermaceae (family), Hydrothermales (order), Hydrothermia (class) and Hydrothermota (phylum). Similarly, if a type species based on a high-quality MAG is designated in the Candidatus Stahlbacteria (WOR-3) lineage using the original etymology (e.g. a genus named Stahlia or Stahlibacter after David Stahl), the name and the associated recognition can be salvaged without nomenclatural discord, as ensured by the use of type material. If both lineages have validly named classes, either can then be used to form the phylum name.

Concluding remarks
A substantial proportion of bacterial and archaeal diversity has been described solely using culture-independent techniques, including recovery of near-complete genomes from metagenomic data [1,3,8,14,43]. However, communication of this diversity is often complicated by uncultured prokaryotes not being recognised in the official nomenclature. As a result, a number of inconsistencies exist between the ways cultured and uncultured taxa are named. Here we demonstrate how some of these common inconsistencies, most notably the failure to designate nomenclatural types for higher taxa, can be addressed, and provide examples for two archaeal and two bacterial phyla. This is a necessary first step for candidate taxon names to obtain standing in nomenclature and priority if the nomenclatures of cultured and uncultured prokaryotes are unified following pending proposals [28,39]. Names proposed in this manner can be considered for inclusion in future validation lists. We strongly encourage authors to name at least one species and genus, based on MAGs and with due consideration of genome quality criteria, within any newly delineated uncultured lineage. Because the phylogenetic position of taxa, including newly discovered lineages, may be uncertain or disputed [31], designating type material ensures name stability as taxonomic opinion changes.
One further important consideration is that improved genomic representatives may become available for already designated type material. Recovery of improved MAG or SAG representatives of type species is highly likely, given the continuing rapid advances in sequencing and bioinformatic techniques [6,10,20]. Improved sequences may be helpful for making taxonomic decisions within the group. However, they should have no influence on the designation of type material: doing so only leads to nomenclatural instability by making priority conditional rather than absolute. Improvements are a taxonomic, not a nomenclatural, issue. Moreover, if sequences are shown to contain errors that preclude them from truthfully representing a biological entity, procedures within the Code already allow the designation of a neotype. Likewise, if a representative of a taxon is subsequently cultured, Rule 18f allows the culture to be designated as the type. This is, however, quite different from making incremental changes to the type sequence, which may ultimately change the original intention and is, in fact, unnecessary. We also suggest that genomes designated as types should be recognised by public repositories, such as those within the International Nucleotide Sequence Database Collaboration (INSDC), and carry a designation such as 'type genome' to facilitate comparison with other taxa.

The reduced genome size and previously inferred gene content (821) suggest that the genome has undergone streamlining. The inferred metabolic capabilities indicate oxidation of carbon monoxide, which may be coupled to the reduction of H2O or of nitrite to ammonia. The genome is also inferred to encode a variety of central carbon metabolism (C1 pathway) genes found in methanogens, which may be used for carbon fixation. The organism is inferred to be thermophilic.