Biology of archaea from a novel family Cuniculiplasmataceae (Thermoplasmata) ubiquitous in hyperacidic environments

The order Thermoplasmatales (Euryarchaeota) comprises the most acidophilic organisms known so far, which are poorly amenable to cultivation. Earlier culture-independent studies at Iron Mountain (California) pointed to an abundant archaeal group dubbed ‘G-plasma’. We examined the genomes and physiology of two cultured representatives of the family Cuniculiplasmataceae, recently isolated from acidic (pH 1–1.5) sites in Spain and the UK, that are 16S rRNA gene sequence-identical with ‘G-plasma’. The organisms had the largest genomes among Thermoplasmatales (1.87–1.94 Mbp), shared 98.7–98.8% average nucleotide identity with each other and with ‘G-plasma’, and exhibited high genome conservation even within their genomic islands, despite their remote geographical locations. Facultatively anaerobic heterotrophs, they possess an ancestral form of A-type terminal oxygen reductase from a distinct parental clade. The lack of complete pathways for the biosynthesis of histidine, valine, leucine, isoleucine, lysine and proline predetermines their reliance on external sources of amino acids and hence their lifestyle as scavengers of proteinaceous compounds from surrounding microbial community members. In contrast to earlier metagenomics-based assumptions, the isolates were S-layer-deficient, non-motile, non-methylotrophic and devoid of iron oxidation, despite the abundance of methylotrophy substrates and ferrous iron in situ, which underlines how essential experimental validation of bioinformatic predictions is.


Central carbohydrate metabolic pathways
Genomic mining predicted di- and oligosaccharide metabolism (maltose and maltodextrin utilization and trehalose biosynthesis), fermentation (acetyl-CoA fermentation to butyrate and butanol biosynthesis), monosaccharide metabolism (D-gluconate and ketogluconate metabolism, D-ribose utilization and mannose metabolism), one-carbon metabolism by tetrahydropterines, organic acid metabolism (glycerate, lactate and methylcitrate cycle) and glycogen metabolism to be credible in the S5 and PM4 genomes. However, the only confirmed activity is the fermentation of peptides.
Because the genomic annotation indicates only non-phosphorylating ED and 5P pathways and a non-operative glycolytic EM pathway, it seems that Cuniculiplasma cannot gain ATP from any of these pathways and relies exclusively on the fermentation of peptides or on oxygen respiration.

Resistance to antibiotics and toxic compounds
Life in heavy-metal-rich environments is also reflected in the genomes by a number of genes encoding copper homeostasis proteins and mercuric reductases, and by genes for arsenic and cobalt-zinc-cadmium resistance. Based on the genome annotations, the archaeon S5 appears to be better equipped with heavy-metal-binding proteins: mercuric ion reductase (EC 1.16.1.1)/heavy metal reductase occurs in four copies in S5, in contrast to a single copy in PM4. Another difference between the two organisms in this context is that the genes for the YHS domain copper/silver-binding protein and the copper/silver-transporting P-type ATPase are present in two copies in S5 and in only one copy in PM4.

Oxidative stress
The genomes of both organisms harbour three copies of hemerythrin HHE cation-binding domain-containing proteins (CPM_0233, CPM_0919, CPM_1919 and CSP5_0265, CSP5_0923, CSP5_1987). Hemerythrins are known to be rare in archaea. Their general functions are the binding of oxygen for delivery, sensing or detoxification in prokaryotic organisms, or the binding of iron or other metals in eukaryotes [4]. Any of these functional strategies might be employed by these archaea; the clustering of a hemerythrin gene with the gene encoding a peroxiredoxin family protein (CPM_1918 and CSP5_1986) may point to some detoxification potential. Other stress-response proteins, such as peroxiredoxins, peroxidases, thioredoxins, rubrerythrin and superoxide dismutase, are encoded across the chromosomes of S5 and PM4.

Protein folding
Protein folding is important for all organisms and is of particular importance for extreme acidophiles. Genes for AAA superfamily ATPase domain-containing proteins were identified in both genomes, with four loci in PM4 and five in S5. The PPIases found were an FKBP-type peptidyl-prolyl cis-trans isomerase (homologous to genes from other Thermoplasmatales) and a cyclophilin-type peptidyl-prolyl cis-trans isomerase (homologous to counterparts from methanogens). Similarly to the genome of T. acidophilum, the S5 and PM4 genomes harbour gene clusters coding for DnaJ, DnaK and GrpE, a complete Hsp60 system (thermosome subunits α and β and prefoldins A and B) and Hsp20 family proteins. Both genomes also harbour glutaredoxin-related protein/thiol-disulfide isomerase/thioredoxin.

Distribution patterns of arCOGs in Thermoplasmatales
The comparison of the core genomes of Cuniculiplasma and all four other genera of Thermoplasmatales (apart from Thermogymnomonas) revealed relatively similar distribution patterns of arCOGs, but certain tendencies could be observed. Cell motility (FC="N") was characteristic of Thermoplasma and Cuniculiplasma only, despite the fact that microscopic images and genome mining did not reveal any flagellar apparatus in Cuniculiplasma [5]. Pili-associated hits were included in this category.
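The per-category comparison of core arCOGs described here can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function name `category_fractions` and the toy input are hypothetical, and splitting multi-letter assignments into one count per letter is an assumption made for the example.

```python
from collections import Counter


def category_fractions(arcog_categories):
    """Return the fraction of assignments falling in each functional category.

    `arcog_categories` holds one functional-category code per core arCOG.
    Multi-letter codes such as "NU" are split so that each letter is counted
    once (an assumption of this sketch, not a rule taken from the study).
    """
    counts = Counter()
    for code in arcog_categories:
        for letter in code:
            counts[letter] += 1
    total = sum(counts.values())
    # An empty input yields an empty dict, so no division by zero occurs.
    return {letter: n / total for letter, n in sorted(counts.items())}


# Hypothetical toy input, NOT data from the study.
toy_core = ["J", "J", "L", "N", "NU", "C", "C", "E"]
fractions = category_fractions(toy_core)

for letter, share in fractions.items():
    print(f"{letter}\t{share:.3f}")
```

Computing fractions rather than raw counts makes genomes of different sizes comparable, which matters here because Cuniculiplasma spp. have the largest genomes in the order.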
Another observation is the higher number of arCOGs in the D, J, L, O, P, S, T, U and V categories, which may also be explained by the fact that Cuniculiplasma spp. possess the largest genomes among the Thermoplasmatales characterised so far (Fig. S4).

Figure S1. Proportional Venn diagram of protein orthologs shared by the isolates. Orthology analysis of all high-confidence in silico called proteins with length > 150 AA was performed with the OrthoMCL suite using a blastp e-value of 10⁻⁵ and a grain value (used for orthology graph building) of 2.5 [6]. Numbers indicate shared or unique protein clusters or singletons. The area of the ellipses and of their intersection is proportional to the number of proteins in each group. The diagram was drawn with eulerAPE software [7].

Supplementary Figure S4. Distribution of COG functional categories among Thermoplasmatales representatives. For each genus, the core COGs of the type species were used for calculations.
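The two thresholds named in the Figure S1 legend (protein length > 150 AA and a blastp e-value of 10⁻⁵) can be illustrated with a minimal Python sketch of the pre-filtering step. This is not the OrthoMCL implementation. The function names and toy sequences are hypothetical, the BLAST input is assumed to be in the default 12-column tabular layout (e-value in the 11th column), and treating the e-value cutoff as inclusive is an assumption.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict {sequence_id: sequence}."""
    records, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]  # ID is the first token of the header
            records[current] = ""
        elif current is not None:
            records[current] += line
    return records


def filter_proteins(records, min_length=150):
    """Keep proteins strictly longer than `min_length` residues."""
    return {pid: seq for pid, seq in records.items() if len(seq) > min_length}


def filter_hits(blast_tab, kept_ids, max_evalue=1e-5):
    """Keep tabular blastp hits between retained proteins with e-value <= cutoff."""
    hits = []
    for line in blast_tab.splitlines():
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if query in kept_ids and subject in kept_ids and evalue <= max_evalue:
            hits.append((query, subject, evalue))
    return hits


# Hypothetical toy data, NOT sequences or hits from the study.
toy_fasta = (
    ">protA\n" + "M" * 200 + "\n"
    ">protB\n" + "M" * 151 + "\n"
    ">protC\n" + "M" * 150 + "\n"
)
toy_blast = "\n".join([
    "protA\tprotB\t90.0\t150\t15\t0\t1\t150\t1\t150\t1e-40\t300",
    "protA\tprotC\t95.0\t150\t7\t0\t1\t150\t1\t150\t1e-50\t320",
    "protB\tprotA\t35.0\t60\t39\t0\t1\t60\t1\t60\t0.01\t30",
])

kept = filter_proteins(parse_fasta(toy_fasta))
hits = filter_hits(toy_blast, set(kept))
print(sorted(kept), hits)
```

In the toy data, protC is dropped because it is exactly 150 residues long, and the third hit is dropped because its e-value exceeds the cutoff. Clustering the remaining hits into ortholog groups is the job of OrthoMCL itself and is not reproduced here.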