Proposal of Carbonactinosporaceae fam. nov. within the class Actinomycetia. Reclassification of Streptomyces thermoautotrophicus as Carbonactinospora thermoautotrophica gen. nov., comb. nov.

Streptomyces thermoautotrophicus UBT1T has been suggested to merit generic status due to its phylogenetic placement and distinctive phenotypes among Actinomycetia. To evaluate whether 'S. thermoautotrophicus' represents a higher taxonomic rank, 'S. thermoautotrophicus' strains UBT1T and H1 were compared to Actinomycetia using 16S rRNA gene sequences and comparative genome analyses. The UBT1T and H1 genomes each contain at least two different 16S rRNA sequences, which are closely related to those of Acidothermus cellulolyticus (order Acidothermales). In multigene-based phylogenomic trees, UBT1T and H1 typically formed a sister group to the Streptosporangiales-Acidothermales clade. The Average Amino Acid Identity, Percentage of Conserved Proteins, and whole-genome Average Nucleotide Identity (Alignment Fraction) values were ≤58.5%, ≤48%, ≤75.5% (0.3) between 'S. thermoautotrophicus' and Streptosporangiales members, all below the respective thresholds for delineating genera. The values for genomics comparisons between strains UBT1T and H1 with Acidothermales, as well as members of the genus Streptomyces, were even lower. A review of the 'S. thermoautotrophicus' proteomic profiles and KEGG orthology demonstrated that UBT1T and H1 present pronounced differences, both tested and predicted, in phenotypic and chemotaxonomic characteristics compared to its sister clades and Streptomyces. The distinct phylogenetic position and the combination of genotypic and phenotypic characteristics justify the proposal of Carbonactinospora gen. nov., with the type species Carbonactinospora thermoautotrophica comb. nov. (type strain UBT1T, = DSM 100163T = KCTC 49540T) belonging to Carbonactinosporaceae fam. nov. within Actinomycetia.


Introduction
The genus Streptomyces Waksman and Henrici 1943, belonging to the order Streptomycetales, family Streptomycetaceae, within the class Actinomycetia (formerly Actinobacteria [1]), is one of the largest bacterial genera as currently defined, with more than 600 validly named species according to the List of Prokaryotic names with Standing in Nomenclature (https://lpsn.dsmz.de/genus/streptomyces). Members of the genus are highly significant because of their complex life cycles, including sporulation, which have made them important model organisms for studies of bacterial genetics and ecology [2][3][4], and because they are highly proficient producers of secondary metabolites of biomedical and biotechnological interest, notably antibiotics and anticancer compounds [5][6][7]. Given the size of the genus Streptomyces, attempts have been made to determine the internal structure of the genus through phylogenetic characterization of species groups, and its relationships to other taxa within the family Streptomycetaceae [8][9][10][11], although more comprehensive phylogenomic studies are required to resolve interspecies and suprageneric structure [6,11].
Genomic metrics have become the gold standard for defining taxonomic ranks, especially species designations among prokaryotes, as they provide a reproducible, reliable, and highly informative means to infer relatedness directly between genome sequences [12,13]. In particular, average nucleotide identity (ANI) using the BLASTn algorithm (ANIb) and the Genome BLAST Distance Phylogeny (GBDP)-based digital DNA-DNA hybridization (dDDH) methods have been widely used to determine species boundaries and confirm identification [12,14,15]. The use of phylogenetic analyses in addition to Average Amino Acid Identity (AAI), Percentage of Conserved Proteins (POCP), and whole-genome ANI (gANI) coupled with Alignment Fraction (AF) metrics has also been proposed to demarcate genera and higher taxa [13,16,17,18].
Streptomyces thermoautotrophicus Gadkari et al. 1991 has been suggested to merit generic status since it does not cluster within the Streptomyces sensu stricto clade in a tree inferred with GBDP using formula d5 [11] and was located apart from the clade containing members of the family Streptomycetaceae in a phylogeny based on 14 well-conserved proteins [19]. This species was described by Gadkari et al. [20] based on the characteristics of a single strain, UBT1 T , isolated from soil covering a charcoal burning pile. Strain UBT1 T is of interest as a sporulating aerobic thermophile, exhibiting growth at 40-68°C, likely reflecting its isolation source. It was claimed to be a CO- and H2-oxidizing obligate chemolithoautotrophic bacterium [20] and to produce a biochemically distinct, oxygen-insensitive nitrogenase [21,22]. Later, MacKellar et al. [19] isolated a second 'S. thermoautotrophicus' strain, H1, from another burning charcoal pile near an active coal seam fire. Multiple CO dehydrogenase gene clusters have been identified in the genomes of strains UBT1 T and H1; however, genes encoding nitrogenase enzymes seem to be absent. In addition, strains H1 and UBT1 T were unable to grow on Noble agar or to incorporate 15N2 into biomass, and both were able to grow heterotrophically on pyruvate.
As a result, MacKellar et al. [19] proposed the reclassification of 'S. thermoautotrophicus' as a non-diazotrophic, facultatively chemolithoautotrophic bacterium. Nevertheless, the chemolithoautotrophic metabolism of 'S. thermoautotrophicus' distinguishes it from members of the genus Streptomyces [11]. In addition, the number of biosynthetic gene clusters for secondary metabolites in the 'S. thermoautotrophicus' genomes (eleven in UBT1 T and nine in H1) [19] is relatively low compared to other members of Streptomyces [5], reflecting the comparatively small genome sizes (~5 Mb) of these two strains. Finally, the circularity of the H1 genome distinguishes it from Streptomyces sensu stricto, where most genomes are linear [23].
Although earlier studies have presented convincing evidence that 'S. thermoautotrophicus' should be reclassified into a novel genus [11,19], formal taxonomic proposals to achieve this have not been made due to concerns about the type strain availability and ambiguity concerning its suprageneric relationships. Here, we revisit the taxonomy of 'S. thermoautotrophicus' and propose its reclassification as Carbonactinospora thermoautotrophica gen. nov., comb. nov., within the Carbonactinosporaceae fam. nov.

16S rRNA sequence identity analysis
The 16S rRNA gene sequence data for the type strains within the class Actinomycetia were retrieved from the SILVA SSU r138.1 database [24] (link to the full license: https://creativecommons.org/licenses/by/4.0/legalcode). The search criteria in the Living Tree Project (LTP) dataset were set as follows: "Actinobacteria" in the taxonomy field, sequence length >1400 nucleotides, sequence quality >90, and type strains (search term "[T]" in the strain field). Sequences were downloaded as an alignment in FASTA format containing gaps. Because the SILVA database classifies Acidothermus cellulolyticus within the order Frankiales [25], its order was manually corrected to Acidothermales.
The 16S rRNA gene sequences of 'S. thermoautotrophicus' strains UBT1 T [20] and H1 [19] were extracted from the RefSeq genome assemblies GCF_001543895 and GCF_001543925, respectively, and subsequently aligned using SINA 1.2.11 [26]. A consensus alignment between the genomic 'S. thermoautotrophicus' and the SILVA alignments was obtained. Positions containing gaps were removed from the alignment and an identity matrix for the resulting 858 positions was computed using BioEdit v. 7.0.5.3. The Python library Seaborn v. 0.11.1 was used to build boxplots of the 16S rRNA identities for each order within the class Actinomycetia. Additionally, sequence identity was assessed by comparing the 16S rRNA sequences of the UBT1 T and H1 strains with the sequences from EzBioCloud [27], a quality-controlled 16S rRNA database.
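The gap removal and identity computation described above can be illustrated with a minimal sketch. It is not the BioEdit implementation; the function names and the toy sequences are hypothetical, and identity is assumed here to be the percentage of matching positions over the gap-free alignment length.

```python
def strip_gap_columns(seqs):
    """Drop every alignment column in which at least one sequence has a gap."""
    keep = [i for i in range(len(seqs[0]))
            if all(s[i] not in "-." for s in seqs)]
    return ["".join(s[i] for i in keep) for s in seqs]


def identity_matrix(seqs):
    """Pairwise percent identity over equal-length, gap-free sequences."""
    length = len(seqs[0])
    return [[100 * sum(a == b for a, b in zip(x, y)) / length for y in seqs]
            for x in seqs]


# Toy alignment (hypothetical sequences, all of equal aligned length)
aligned = ["ACG-TA", "ACGTTA", "ACGTTC"]
gapless = strip_gap_columns(aligned)
matrix = identity_matrix(gapless)
```

In this toy case the fourth column is removed because the first sequence carries a gap there, and the identities are then computed over the five remaining positions.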

Phylogenetic analyses
Genera within the class Actinomycetia were identified in the lineage file available at https://github.com/zyxue/ncbitax2lin (v. 2019-02-20), which was generated from the NCBI taxonomy dump. Subsequently, the type species of each genus were retrieved according to the names provided on LPSN using the script "get_type_genus.py" (available at https://github.com/fhsantanna/bioinfo_scripts). All the available proteomes for Actinomycetia type species in the NCBI Assembly RefSeq database were downloaded for further analyses. The proteomes of the Embleya and Trebonia type species were later included manually since these genera were not available in the lineage file. Lastly, the proteomes of UBT1 T and H1 were included in the final sequence dataset.
Two different approaches were carried out for the phylogenetic reconstruction based on the concatenated alignment of orthologous proteins. The first approach utilized the AMPHORA2 [28] pipeline for the identification of universal taxonomic markers in the Actinomycetia proteomes. For this purpose, the "phylogenomics-tools" scripts were utilized [29]. The markers dnaG, infC, nusA, pgk, pyrG, rplK, rpoB, rpsC, and smpB were excluded from the analyses as they were present either in multiple copies or at low representation among the type species. A total of 253 taxa were included in the analysis. Each marker protein was then aligned using MUSCLE [30] v. 3.8.31, and the alignments were concatenated. Positions containing gaps were excluded and the final alignment of 3184 amino acids was used as input for the phylogenetic reconstruction based on the Maximum Likelihood (ML) method in the PhyML 3.0 server [31]. LG + G + I was selected as the best substitution model based on the Akaike Information Criterion, with an estimated gamma shape parameter of 0.829 and an estimated proportion of invariable sites of 0.165. Branch support was assessed using aLRT SH-like [32]. The second approach was a protein-based core genome phylogeny using a de novo identification of phylogenetic markers. Core ortholog groups of the previously selected strains were identified using the bidirectional best hits (BBHs) algorithm implemented in the GET_HOMOLOGUES [33] pipeline build 31072020, excluding inparalogs and using minimal BLAST searches. Once the core proteins were identified, GET_PHYLOMARKERS [34] v. 2.2.8.1 was used with default parameters (-R 1 -t PROT options) to find optimal ortholog clusters for phylogenomic reconstruction. This approach is based on three main filters: exclusion of alignments containing recombinant sequences, removal of reconstructions that deviate from expectations of the multispecies coalescent, and elimination of poorly resolved gene trees.
Top-scoring gene alignments were concatenated into a supermatrix, which was utilized to estimate the species-tree with the ML method. The phylogenetic trees were processed with Newick utilities [35], whose functionalities include taxa renaming and tree pruning (i.e. removing clades and only keeping those of interest).
In order to obtain a genome tree using GBDP, the genome sequence data were uploaded to TYGS, the Type (Strain) Genome Server [36]. In brief, the closest type strain genomes were determined in two complementary ways. First, the UBT1 T and H1 genomes were compared against all type strain genomes available in the TYGS database via the MASH algorithm, a fast approximation of intergenomic relatedness [37], and the ten type strains with the smallest MASH distances were chosen for each 'S. thermoautotrophicus' genome. Second, an additional set of ten closely related type strains was determined via the 16S rDNA gene sequences. These were extracted from the UBT1 T and H1 genomes using RNAmmer [38] and each sequence was subsequently BLAST searched [39] against the 16S rDNA gene sequence of each of the currently 13,011 type strains available in the TYGS database. This was used as a proxy to find the best 50 matching type strains (according to the bitscore) for each 'S. thermoautotrophicus' genome and to subsequently calculate precise distances using the GBDP approach under the algorithm 'coverage' and distance formula d5 [40]. For the calculation, local-alignment programs are used to align a genome X against a genome Y, and vice versa, producing a set of high-scoring segment pairs (HSPs). These matches are then transformed to a single distance value d(X, Y) by applying the formula d5, which is calculated as two times the sum of identical base pairs over all HSPs (2 × I_XY) divided by the total length of all HSPs found in both genomes (H_XY + H_YX) [41,42], rescaled for phylogenetic inference and with branch support values based on resampling [43]. These distances were finally used to determine the 10 closest type strain genomes for each of the 'S. thermoautotrophicus' genomes. For the GBDP tree reconstruction, all pairwise comparisons among the set of genomes were conducted using GBDP under the algorithm 'trimming' and distance formula d5.
The resulting distances were used to infer a balanced minimum evolution tree with branch support via FASTME 2.1.4 including SPR postprocessing [44]. Branch support was inferred from 100 pseudo-bootstrap replicates each. The trees were rooted at the midpoint [45] and visualized with PhyD3 [46].
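As a worked illustration of the HSP-based ratio described above for formula d5, the following sketch computes 2 × I_XY / (H_XY + H_YX) from toy data. It is not the GBDP or TYGS implementation: the function names and HSP values are hypothetical, I_XY is assumed here to be summed over the HSPs of the X-against-Y search, and the subsequent rescaling for phylogenetic inference is deliberately left out.

```python
def hsp_totals(hsps):
    """Sum identical base pairs and alignment lengths over (identical, length) HSPs."""
    identical = sum(i for i, _ in hsps)
    length = sum(l for _, l in hsps)
    return identical, length


def d5_ratio(i_xy, h_xy, h_yx):
    """Return 2 * I_XY / (H_XY + H_YX), the ratio described in the text."""
    total = h_xy + h_yx
    if total == 0:
        raise ValueError("no HSPs found in either direction")
    return 2 * i_xy / total


# Hypothetical HSPs as (identical base pairs, HSP length)
hsps_xy = [(450, 500), (270, 300)]  # genome X searched against genome Y
hsps_yx = [(440, 500), (280, 300)]  # genome Y searched against genome X

i_xy, h_xy = hsp_totals(hsps_xy)    # assumption: I_XY taken from the X-vs-Y HSPs
_, h_yx = hsp_totals(hsps_yx)
ratio = d5_ratio(i_xy, h_xy, h_yx)
```

With these toy values, 720 identical base pairs over 800 + 800 aligned positions give a ratio of 0.9.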

Genome and proteomic metrics
Proteomic and genomic relatedness metrics were computed comparing 'S. thermoautotrophicus' to type species from the order Streptosporangiales, and the genera Acidothermus, Catenulispora, Frankia, Micromonospora, Pseudonocardia, Sporichthya and Streptomyces.
gANI and AF values were obtained by the Microbial Species Identifier (MiSI) method using ANIcalculator 2014-127 v. 1.0 (https://ani.jgi.doe.gov/html/home.php?page=introduction). gANI is calculated for a pair of genomes by averaging the nucleotide identity of orthologous genes identified as BBHs, which are the genes that show ≥70% sequence identity and ≥70% alignment of the shorter gene. AF is calculated as the sum of the lengths of BBH genes divided by the sum of the lengths of all genes in a genome [47].
POCP values were obtained with the script "POCP.sh" (available at https://figshare.com/articles/POCP_calculation_for_two_genomes/4577953/1), which was written based on Qin et al. [17]. For POCP calculation, the conserved proteins between a pair of genomes are determined by aligning all the protein sequences of a genome X against the protein sequences of a genome Y, using the BLASTP aligner. Conserved proteins are defined as presenting a match with an E value <1e-5, >40% sequence identity, and an alignable region covering >50% of the query protein sequence. The POCP (X, Y) % is calculated as [(C_X + C_Y)/(T_X + T_Y)] × 100, where C represents the number of conserved proteins and T represents the total number of proteins in the respective genome.
AAI analyses were performed using the script "aai.rb" implemented in the Enveomics Collection [48]. For AAI calculation, the conserved genes between a pair of genomes are determined by aligning all protein-coding sequences (CDSs) of a genome X against a translated database of genome Y, using the TBLASTN aligner. Conserved CDSs are defined as presenting >30% sequence identity at the amino acid level and an alignable region covering >70% of the query CDS sequence. The matching segment from the genomic sequence is extracted and a reverse search with BLASTX is used to determine the presumably orthologous fraction of conserved genes between the two genomes (two-way BLAST). The two-way AAI (X, Y) % is the average amino acid identity of all two-way BLAST conserved genes between the genomes, as computed by the BLAST algorithm [49]. To evaluate the AAI diversity between 'S. thermoautotrophicus', Streptomyces, and Streptomycetales type strains, all available proteomes from these taxa were used for AAI computation as described above. As a control, Streptomyces albus, the type species of the genus Streptomyces, was compared to the same taxa.
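The gANI and AF definitions given above can be sketched as follows. This is not the ANIcalculator code: the function name and input layout are hypothetical, the candidate pairs are assumed to be already identified as bidirectional best hits, both cut-offs are assumed to be inclusive 70% thresholds, and a plain (unweighted) mean of identities is assumed, whereas the actual tool may weight by alignment length.

```python
def gani_and_af(candidates, total_gene_length):
    """Compute (gANI, AF) from candidate BBH gene pairs.

    candidates: iterable of (percent_identity, aligned_length,
                query_gene_length, subject_gene_length).
    total_gene_length: summed length of all genes in the query genome.
    """
    bbh = [(ident, q_len) for ident, aligned, q_len, s_len in candidates
           if ident >= 70 and aligned / min(q_len, s_len) >= 0.70]
    if not bbh:
        return None, 0.0
    # Assumption: unweighted mean of the per-gene identities
    gani = sum(ident for ident, _ in bbh) / len(bbh)
    af = sum(q_len for _, q_len in bbh) / total_gene_length
    return gani, af


# Hypothetical candidate pairs; only the first passes both filters
candidates = [
    (80.0, 900, 1000, 950),   # kept: identity 80%, 900/950 aligned
    (60.0, 900, 1000, 1000),  # dropped: identity below 70%
    (75.0, 300, 1200, 1000),  # dropped: only 30% of the shorter gene aligned
]
gani, af = gani_and_af(candidates, total_gene_length=5000)
```

Reporting the two values together matters because a high gANI computed from a small fraction of the genome says little about overall relatedness.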
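The POCP criteria and formula above translate directly into a short sketch. It does not reproduce the POCP.sh script; the function names and the protein counts in the example are hypothetical.

```python
def is_conserved(evalue, identity, aligned_length, query_length):
    """Apply the conserved-protein criteria stated in the text to one BLASTP match."""
    return (evalue < 1e-5
            and identity > 40
            and aligned_length / query_length > 0.5)


def pocp(conserved_x, conserved_y, total_x, total_y):
    """POCP (%) = [(C_X + C_Y) / (T_X + T_Y)] * 100."""
    return (conserved_x + conserved_y) / (total_x + total_y) * 100


# Hypothetical counts: 1800 of 4000 and 1900 of 4200 proteins are conserved
value = pocp(1800, 1900, 4000, 4200)
```

For these illustrative counts the result is about 45.1%, which would fall below the 50% genus boundary proposed for POCP [17].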
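The two-way logic described above can be sketched with best-hit tables. This is not the aai.rb implementation: the data layout and function name are hypothetical, the identity and coverage filters are assumed to have been applied beforehand, and averaging the forward and reverse identities of each reciprocal pair is an assumption made for illustration.

```python
def two_way_aai(best_xy, best_yx):
    """Mean amino acid identity over reciprocal best hits.

    best_xy maps each query of genome X to (best subject in Y, percent identity);
    best_yx is the same for the reverse search.
    """
    identities = []
    for query, (subject, ident_fwd) in best_xy.items():
        back = best_yx.get(subject)
        if back is not None and back[0] == query:
            # Assumption: average the identities of the two search directions
            identities.append((ident_fwd + back[1]) / 2)
    if not identities:
        return None
    return sum(identities) / len(identities)


# Hypothetical best hits: only x1 <-> y1 is reciprocal
best_xy = {"x1": ("y1", 60.0), "x2": ("y2", 50.0)}
best_yx = {"y1": ("x1", 62.0), "y2": ("x3", 55.0)}
aai = two_way_aai(best_xy, best_yx)
```

Restricting the average to reciprocal pairs is what limits the metric to the presumably orthologous fraction of the two genomes.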
Scatter plots showing the relationship between AAI and POCP and between AAI and AF were generated using the Python library Seaborn v. 0.11.1.
Taxonomic profiling of proteomes
AAI-profiler, a web server dedicated to taxonomic identification [50], was employed to perform proteome-wide sequence searches using the 'S. thermoautotrophicus' UBT1 T and H1 genomes. AAI-profiler computes AAI between a query proteome and all target species in the UniProt database [50]. Each protein is binned considering the taxonomic attribution of the closest counterpart in the database. A taxonomic profile of the proteome of interest is built considering the counts of the target taxa, and these frequencies are weighted by the percent identity of the match to the query. Given that 'S. thermoautotrophicus' is already included in the AAI-profiler database, the taxonomic profile excluded hits of the top-ranked taxon, which are from 'S. thermoautotrophicus' itself.
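The identity-weighted binning described above can be illustrated with a small sketch. It does not reproduce AAI-profiler: the function name, the input format, and the choice to use identity/100 as the weight and to normalize the profile to 1 are assumptions made for illustration.

```python
from collections import defaultdict


def taxonomic_profile(best_hits, exclude=()):
    """Build an identity-weighted taxonomic profile.

    best_hits: iterable of (taxon of the closest database match, percent identity),
               one entry per query protein.
    exclude:   taxa to ignore, e.g. the query organism itself.
    """
    weights = defaultdict(float)
    for taxon, identity in best_hits:
        if taxon in exclude:
            continue
        weights[taxon] += identity / 100  # assumption: weight = fractional identity
    total = sum(weights.values())
    if total == 0:
        return {}
    return {taxon: weight / total for taxon, weight in weights.items()}


# Hypothetical hits for four query proteins
hits = [
    ("Streptomycetales", 80.0),
    ("Streptomycetales", 40.0),
    ("Streptosporangiales", 60.0),
    ("query organism", 100.0),
]
profile = taxonomic_profile(hits, exclude={"query organism"})
```

In this toy case two thirds of the weighted profile is attributed to Streptomycetales and one third to Streptosporangiales once the self hits are excluded.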

Results and discussion
Diversity of the 16S rRNA genes from 'S. thermoautotrophicus'
To evaluate the taxonomic position of 'S. thermoautotrophicus' within Actinomycetia, we first conducted a 16S rRNA gene sequence identity analysis of strains UBT1 T and H1. As previously reported [19], the genome of UBT1 T contains three 16S rRNA genes, two of which are identical to each other (locus tags TH66_RS04095 and TH66_RS03010) whilst the other is divergent (TH66_RS22860), presenting 94% identity to the other two. The genome assembly of H1 contains two 16S rRNA genes, one of which (LI90_RS08655) is identical to the TH66_RS04095/TH66_RS03010 pair, and the other (LI90_RS18525) is identical to the divergent copy TH66_RS22860.
The presence of multiple 16S rRNA gene copies within a single bacterial genome has been observed before. Indeed, bacteria can harbour more than 20 copies of this marker gene [52]. Intragenomic heterogeneity of 16S rRNA ≥6% was also reported in some thermophiles, such as the Firmicutes members Desulfotomaculum kuznetsovii DSM 6115 T and Thermoanaerobacter tengcongensis MB4 T , and the Actinobacteria member Thermobispora bispora DSM 43833 T [53,54]. This may constitute an ecological strategy [55][56][57] to adapt the bacterial cellular machinery to perform under different temperatures [58], with different copies being functional under different environmental conditions [59]. In addition to the biases introduced by PCR [60,61], the presence of multiple different 16S rRNA gene copies is another strong argument against relying only on 16S rRNA gene phylogeny for species delineation in the traditional polyphasic approach.
To identify 16S rRNA gene relatedness at the genus level, each copy from UBT1 T and H1 was compared to 2,792 16S rRNA sequences from type strains of Actinomycetia species available in the SILVA database. According to this analysis, TH66_RS04095/TH66_RS03010/LI90_RS08655 exhibit identities above the 94.5% genus circumscription threshold [62] with 16S rRNA sequences from 87 non-Streptomyces and six Streptomyces type strains (not including the type species Streptomyces albus), being closely related to A. cellulolyticus from the order Acidothermales with 96.7% identity (Fig. 1A). Sequences from representatives of Acidothermales, Frankiales, and Micromonosporales exhibit the highest identities to TH66_RS04095/TH66_RS03010/LI90_RS08655. The more divergent TH66_RS22860/LI90_RS18525 copies did not belong to any recognized phylotype at the genus level when compared with sequences from the Actinomycetia dataset (Fig. 1B), or even with the 65,797 entries in the EzBioCloud 16S rRNA database (Table S1). In both analyses, A. cellulolyticus stood out in presenting 93.7% identity to these divergent copies.
The current understanding of the evolutionary forces shaping the genomes of Actinomycetia is limited [63]; however, McDonald and Currie [64] analyzed 122 Streptomyces genomes and found that the acquisition and retention of genes through horizontal gene transfer (HGT) are surprisingly rare in this genus. Considering these findings, one of these 16S rRNA sequences can be assumed to be ancestral to strains UBT1 T and H1 while the other in each genome seems to be the product of a more recent duplication, rather than an HGT event.

Phylogenetic placement of 'S. thermoautotrophicus' within Actinomycetia
A multigene-based phylogenetic approach should be the choice for defining genera or higher taxa according to the minimal standards for the use of genome data for the taxonomy of prokaryotes [13]. Thus, to identify the current closest relatives of 'S. thermoautotrophicus' UBT1 T and H1, two different approaches were employed to reconstruct the evolutionary history of UBT1 T , H1 and an additional set of 251 type species of Actinomycetia with genomes/proteomes available. We first reconstructed an ML phylogenetic tree with the concatenated protein sequences from 22 conserved single-copy genes identified in the assemblies with the AMPHORA2 pipeline. In addition, we performed a de novo approach for the identification of nine ortholog genes/ubiquitous proteins in the Actinomycetia type species genomes which were appropriate for phylogenomic analysis, with only three of them, encoding proteins of the 50S ribosomal subunit, also present in the AMPHORA2 dataset. Both phylogenetic reconstructions (Fig. 2 and Figs. S1-S3) infer that the genus Streptomyces does not form a clade with UBT1 T and H1, and that the latter strains share a last common ancestor with A. cellulolyticus and members of the Streptosporangiales clade, thus belonging to a deeply branching lineage.
In the previous phylogenomic analysis that included 'S. thermoautotrophicus', MacKellar et al. [19] highlighted the unusual position of the UBT1 T and H1 genomes as being closely related to Acidothermus and Streptosporangiales (Streptosporangium, Thermobifida, Thermobispora, and Thermomonospora), and distinct from the clade containing the families Streptomycetaceae and Catenulisporaceae. Therefore, the authors proposed that UBT1 T and H1 do not belong to the genus Streptomyces and instead are nearer to families including Acidothermaceae and Streptosporangiaceae. The proposal of a generic status for 'S. thermoautotrophicus' was also supported by Nouioui et al. [11], in a tree inferred with GBDP formula d5 [11], where UBT1 T branched away from core Streptomyces before Kitasatospora and Streptacidiphilus, forming a sister group to the core Streptomyces-Kitasatospora-Streptacidiphilus clade. The position inferred by Nouioui et al. [11], however, conflicts with MacKellar et al. [19] and our phylogenetic reconstructions based on ML estimations (Fig. 2), where UBT1 T forms a sister group with Acidothermus and members of Streptosporangiales.
To obtain a current GBDP tree for 'S. thermoautotrophicus', the genome sequence data for UBT1 T and H1 were uploaded to TYGS (Fig. 3). The distance-based tree demonstrated that UBT1 T and H1 form a distinct group of Actinomycetia; however, due to the low branch support, the sister groups of the 'S. thermoautotrophicus' strains could not be delimited precisely. Phenetic or distance-based approaches, such as GBDP, try to fit a tree to a matrix of pairwise genetic distances, thereby reflecting the number of nucleotide or amino acid substitutions [65]. In contrast, phylogenetic approaches, namely Bayesian and ML approaches, measure distances based on variation in the nucleotide or amino acid sequences at each site, or the presence or absence of indels, under an implicit or explicit mathematical model describing the evolution [66]. As exemplified here, incongruence among different tree reconstruction methods is well known [67,68]. However, we note that the Genome Taxonomy Database (GTDB, release 06-RS202) tree places 'S. thermoautotrophicus' in the order Streptomycetales, thus being congruent with the GBDP tree (Fig. S4). The GTDB approach is based on genome trees inferred with FastTree from an aligned concatenated set of up to 120 single-copy marker proteins [69,70]. According to the phylogenies demonstrated here, strains UBT1 T and H1 have a distinct phylogenetic position within the class Actinomycetia, clearly belonging to a novel family. However, further studies are needed to resolve the ambiguity over the placement of the family, which may represent a novel order.

Genus delineation for UBT1 T and H1 using genomic and proteomic metrics
Despite the advancements in resolving species delineation and the use of genome data to reconstruct the phylogenetic relationship of microorganisms, there is no consensus on the incorporation of genomic metrics and cutoffs to demarcate genera and higher taxa. Nevertheless, different metrics that measure proteomic and genomic relatedness to demarcate genera have been proposed on the basis of AAI [16] and POCP [17]. Recently, Barco et al. [18] utilized the MiSI method [47] for genus delineation, and they verified that the gANI and AF mean values for genus inflection points in Bacteria are 73.1% and 0.333, respectively. Thus, we have applied these approaches to evaluate 'S. thermoautotrophicus' UBT1 T and H1, Streptomyces, Acidothermus, and Streptosporangiales genomes in detail within the taxonomic context of genus.
In the comparison of the closely related Actinomycetia to UBT1 T and H1, different Streptosporangiales genomes presented the highest POCP values while some Streptomyces genomes presented the highest AAI values (Fig. 4A). According to the AAI measure, Streptomyces megasporus NRRL B-16372 T is a closely related strain to H1 with 59.0% AAI and 45.1% POCP values, while Streptomyces vitaminophilus ATCC 31673 T is closely related to UBT1 T , presenting 58.9% AAI and 44.4% POCP. According to the POCP metric, Thermomonospora catenispora 3-22-3 T (Streptosporangiales) is closely related to both H1 and UBT1 T , presenting 46.2 and 48.0% POCP, and 58.3 and 58.5% AAI, respectively. The comparisons of 'S. thermoautotrophicus' with Acidothermus presented even lower values of ~57.3% AAI and 39.3% POCP. Nevertheless, none of the obtained values surpassed the recommended 65 to 72% [16] and 50% [17] thresholds for the delineation of genera using AAI and POCP metrics, respectively. As expected, S. albus was unambiguously grouped with Streptomyces sensu stricto, while the comparisons of UBT1 T and H1 strains to Streptosporangiales and Streptomyces appeared to be distinct from S. albus vs Streptosporangiales.
Given the proteomic similarity of some Streptomyces and Streptosporangiales genomes to 'S. thermoautotrophicus', we further explored the proteomic similarity of UBT1 T and H1 to 102 Streptosporangiales and 223 Streptomyces genomes. Comparing Streptomyces species to UBT1 T and H1, respectively, we found an AAI of 57.20 ± 0.49 (% mean ± SD) and 57.15 ± 0.5, and the number of common proteins to be 2471 ± 116 and 2315 ± 104. For Streptosporangiales, we found an AAI value of 55.8 ± 1.5 and 55.6 ± 1.5, and the number of common proteins to be 2403 ± 204 and 2255 ± 184. In this analysis, we also did not find any AAI values ≥65% to the 'S. thermoautotrophicus' strains.
In the gANI(AF) analysis (Fig. 4B), similarly to the POCP vs AAI correlation plot, S. albus was grouped with Streptomyces as expected, while the comparisons of UBT1 T and H1 to Streptosporangiales were intermixed, and the two strains are clearly distinct from Streptomyces. Although some type species from Streptosporangium, Catenulispora, Frankia, Micromonospora, Pseudonocardia, Sporichthya and Streptomyces present gANI values that surpass 73.1% in relation to UBT1 T and H1, these comparisons do not surpass the minimum AF requirement for genus definition, i.e., gANI and AF are inconsistent. While gANI represents the identity of orthologous genes identified as BBHs using similarity searches, the AF is a complementary measure of the minimum amount that genomes must overlap [47]. If the homologous regions are short with respect to the total length of the genomes, as might be seen following an HGT event, then ANI values may be high even though the bacteria are distantly related. The comparison of UBT1 T and H1 with A. cellulolyticus presented 73.0% (~0.15) gANI (AF) (Tables S2 and S3).
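The way the genus-level cut-offs discussed in this section combine can be made explicit with a small sketch. The cut-offs are those cited in the text (AAI 65%, the lower bound of the 65 to 72% range; POCP 50%; gANI 73.1% with AF 0.333); the function name is hypothetical, treating each cut-off as an inclusive lower bound is an assumption, and the example values are the upper bounds reported in the abstract for comparisons with Streptosporangiales, used here for illustration only.

```python
GENUS_CUTOFFS = {"aai": 65.0, "pocp": 50.0, "gani": 73.1, "af": 0.333}


def genus_level_signals(aai, pocp, gani, af, cutoffs=GENUS_CUTOFFS):
    """Report which metrics reach the genus-level cut-offs.

    gANI only counts when the accompanying AF also reaches its cut-off.
    """
    return {
        "aai": aai >= cutoffs["aai"],
        "pocp": pocp >= cutoffs["pocp"],
        "gani_af": gani >= cutoffs["gani"] and af >= cutoffs["af"],
    }


# Upper bounds reported for 'S. thermoautotrophicus' vs Streptosporangiales
signals = genus_level_signals(aai=58.5, pocp=48.0, gani=75.5, af=0.3)
```

Here the gANI value alone exceeds 73.1%, but the paired AF of 0.3 does not reach 0.333, so none of the three signals supports placement in the same genus.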
The genomic and proteomic metrics together demonstrated the substantial difference between 'S. thermoautotrophicus' and other Actinomycetia members. The values obtained for strains UBT1 T and H1 are clearly below the established cut-off values (gANI-AF: 73.1%-0.333; AAI: 65-72%; POCP: 50%) for defining bacterial genera, strongly suggesting they represent a novel taxon within Actinomycetia.
Taxonomic composition of the 'S. thermoautotrophicus' proteomes
To evaluate the taxonomic composition of the 'S. thermoautotrophicus' proteomes, we used strain UBT1 T and H1 protein sequences as queries at AAI-profiler for homology searches in the UniProt database. As demonstrated in Fig. 5, Streptomycetales proteins were the top hit for only ~36% of the query proteins from strains UBT1 T and H1, while ~19% of them matched proteins from the order Streptosporangiales. The other query proteins are distributed among different orders of the Actinomycetia.
The apparent mosaic nature of the UBT1 T and H1 genomes reflects the underrepresentation of closely related strains in the public sequence databases rather than HGT. Despite the rapid expansion in the number of sequenced bacterial and archaeal genomes in the past decade [27,71,72] along with the number of species names validly published [12], understudied groups are often represented by a single family [73][74][75][76][77], with few or no genomes present in nucleotide databases. This bias is evident for A. cellulolyticus, currently the sole species in Acidothermus, the sole genus within Acidothermaceae, a unique family within the order Acidothermales [25]. The query proteins from A. cellulolyticus, similarly to those of UBT1 T and H1, were distributed among many taxonomic groups and there are no Acidothermales counterparts in the databases.
According to this analysis, the UBT1 T and H1 proteomes are unique among other members of Actinomycetia, corroborating the previous phylogenomic and proteomics/genomics metrics results that indicated a distinctive placement for this taxon.

Phenotypic distinctness of 'S. thermoautotrophicus'
The metabolic distinctiveness of 'S. thermoautotrophicus' UBT1 T and H1 was predicted based on genome comparisons with 71 Streptomyces spp. KO profiles available in the KEGG database. Additional discriminative phenotypic properties were retrieved from the literature for closely related Actinomycetia species.
When compared to UBT1 T and H1, 101 KOs were exclusively present among the Streptomyces spp. profiles (Table S4). On the other hand, 136 KOs were exclusively present in the UBT1 T and H1 profiles (Table S5), including a nitrate/nitrite sensor two-component system (narXP) and multiple genes related to carbon metabolism, such as ribulose-bisphosphate carboxylase (rbcLS), glucose/mannose-6-phosphate isomerase, phosphoenolpyruvate carboxykinase, PFK 6-phosphofructokinase 1, fructose 1,6-bisphosphate aldolase/phosphatase, and fructose-bisphosphate aldolases, classes I and II. The many exclusive KOs and some cases of Non-Homologous Isofunctional Enzymes (NISEs) observed between UBT1 T and H1 and other Streptomyces spp. suggest evolutionary divergences in their metabolisms and distant common ancestors. NISEs are evolutionarily unrelated enzymes that catalyze the same biochemical reactions [78]. For example, the KOs K01754 (UBT1 T and H1) and K01752 (other Streptomyces spp.) are related to the same enzymatic reaction, L-serine ⇌ pyruvate + NH3 (R00220), but each was found exclusively in one group. While UBT1 T and H1 have some exclusive enzymes, including RuBisCO, related to a carbon autotrophic lifestyle, the KEGG profiles of the other Streptomyces spp. showed some exclusive KOs related to a heterotrophic lifestyle, including gluABCD.
The major characteristic that differentiates UBT1 T from Acidothermaceae, Nocardiopsaceae, Streptomycetaceae, Streptosporangiaceae, Thermomonosporaceae, and Treboniaceae is its unique ability to grow chemolithotrophically on CO or CO2 and H2 (Table 1). UBT1 T can also be distinguished from these families based on the discontinuous distribution of chemotaxonomic markers, notably cell wall amino acids, menaquinones, and diagnostic sugars in whole cell hydrolysates, in addition to the presence of spores and colony morphology.
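The comparison of KO profiles described above amounts to set differences, as in the following sketch. The function name and toy profiles are hypothetical, K00001 is used only as a placeholder for a shared KO, and treating a KO as present in a group when it occurs in at least one profile of that group is an assumption; the criterion actually used is not stated in the text.

```python
def exclusive_kos(focal_profiles, other_profiles):
    """Return (KOs found only in the focal group, KOs found only in the other group).

    Each profile is a set of KO identifiers for one genome.
    Assumption: a KO counts as present in a group if any member has it.
    """
    focal = set().union(*focal_profiles)
    other = set().union(*other_profiles)
    return focal - other, other - focal


# Toy profiles; K01754 and K01752 are the NISE example from the text
ubt1_h1 = [{"K01754", "K00001"}, {"K01754"}]
streptomyces = [{"K01752", "K00001"}, {"K00001"}]
only_focal, only_other = exclusive_kos(ubt1_h1, streptomyces)
```

In this toy case K01754 is reported as exclusive to the focal group and K01752 as exclusive to the comparison group, while the shared placeholder KO appears in neither list.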

Conclusions
Based on the genetic and phenotypic distinctness presented above, we conclude that the chemolithotrophic strains 'S. thermoautotrophicus' UBT1 T and H1 represent a novel genus, consistent with previous observations [11,19], for which we propose the name Carbonactinospora thermoautotrophica gen. nov., comb. nov. (Table 2). Our additional phylogenomic analyses indicate that the genus Carbonactinospora should be placed in a novel family, Carbonactinosporaceae fam. nov. In accordance with the current GTDB taxonomy (Fig. S4), the family Carbonactinosporaceae is placed within the order Streptomycetales, but we note that there are ambiguities in the phylogenomic analyses (Figs. 2 and 3) that warrant further studies.
Gram-stain positive, mycelium-forming sporulating bacteria. Carbonactinosporaceae represents a distinct Actinomycetia phylogenetic lineage based on multigene-based phylogenetic analyses. The type genus is Carbonactinospora.

Acknowledgement
We thank Dr Imen Nouioui (Leibniz Institute DSMZ, Germany) for arranging deposit of strain UBT1T (=DSM 100163 T ) in the Korean Collection of Type Cultures. We thank Aharon Oren (Hebrew University of Jerusalem, Israel) for his advice on the formation of the Latin names proposed.

Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.syapm.2021.126223.