Novel haloarchaeon Natrinema thermophila having the highest growth temperature among haloarchaea with a large genome size

Environmental temperature is one of the most important factors for the growth and survival of microorganisms. Here we describe a novel extremely halophilic archaeon (haloarchaeon), designated strain CBA1119T, isolated from solar salt. Strain CBA1119T had the highest maximum and optimal growth temperatures (66 °C and 55 °C, respectively) and one of the largest genome sizes (5.1 Mb) among haloarchaea. It also had the largest number of strain-specific pan-genome orthologous groups and unique pathways among members of the genus Natrinema in the class Halobacteria. A dendrogram based on the presence/absence of genes and a phylogenetic tree constructed from OrthoANI values highlighted the particularities of strain CBA1119T compared with other Natrinema species and other haloarchaea. The large genome of strain CBA1119T may provide information on genes that confer tolerance to extreme environmental conditions, which may lead to the discovery of other thermophilic strains with potential applications in industrial biotechnology.

In this study we describe strain CBA1119T, isolated from solar salt, which has the highest growth temperature and one of the largest genome sizes among all haloarchaea. We identified and characterized the thermophilic strain CBA1119T and investigated the relationship between two strain-specific features, namely growth temperature and genome size.

Results and Discussion
Polyphasic taxonomic analysis (see Supplementary Information) revealed that strain CBA1119T is a novel member of the genus Natrinema. Interestingly, strain CBA1119T grew at 20–66 °C, with optimal growth at 50–55 °C. Of the four strains with an optimal growth temperature >50 °C, three belonged to the family Haloferacaceae and one was strain CBA1119T, which belongs to the family Natrialbaceae (Fig. 1a). The maximum growth temperatures of haloarchaea varied within each family (Fig. 1b). Notably, no strain of the family Halococcaceae grew at temperatures >50 °C, and only strains of the family Natrialbaceae, including strain CBA1119T, had a maximum growth temperature >60 °C. The maximum and optimal growth temperatures of strain CBA1119T were the highest recorded to date among haloarchaea. Environmental temperature underlies the evolution of various biological phenomena such as the density of hydrogen bonds in nucleic acids 20.

Figure 1. Comparison of the highest optimal (a) and maximum growth temperatures (b), and genome sizes (c) among haloarchaeal species. Strain CBA1119T has the highest optimal and maximum growth temperatures, and the third largest genome size among type strains belonging to haloarchaea. Red circles indicate strain CBA1119T.

General genomic features of strain CBA1119T are described in the Supplementary Information (Supplementary Tables S2 and S3; Supplementary Fig. S2b). The ways in which the microbial genome is affected by environmental factors can be understood by pan-genome comparisons 21. The numbers of pan-genome orthologous groups (POGs) and strain-specific POGs (singletons) were compared between strain CBA1119T and seven species of the genus Natrinema (Fig. 2). The flower plot showed that strain CBA1119T had the largest number of singletons among Natrinema species.
The number of singletons in strain CBA1119T was 1.4 times that in Nnm. salaciae JCM 17869T (which had the second largest number) and four times that in Nnm. altunense AJ2T (which had the smallest number). The heat map based on gene content also showed that strain CBA1119T had more exclusive POGs than other related species (Fig. 3). Additionally, each genome within the genus Natrinema had a distinct KEGG pathway profile based on POGs (Table 1). Strain CBA1119T had specific enzymes in the following KEGG pathways, each with a P value of zero: propanoate metabolism; geraniol degradation; fatty acid biosynthesis, metabolism, and degradation; and valine, leucine, and isoleucine degradation.
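
The notion of a singleton used above (a POG present in exactly one genome of the comparison set) can be illustrated with a minimal sketch. It is not the pipeline used in this study; the function name `count_singletons` and the toy POG identifiers are hypothetical, and real POG assignments would come from an ortholog-prediction step.

```python
from collections import Counter


def count_singletons(pogs_by_strain):
    """Return {strain: number of POGs found in that strain only}."""
    # Count in how many strains each POG occurs.
    occurrences = Counter(
        pog for pogs in pogs_by_strain.values() for pog in set(pogs)
    )
    # A singleton is a POG that occurs in exactly one strain.
    return {
        strain: sum(1 for pog in set(pogs) if occurrences[pog] == 1)
        for strain, pogs in pogs_by_strain.items()
    }


# Toy data with hypothetical POG identifiers (not taken from the study).
toy = {
    "strain_A": {"pog1", "pog2", "pog3", "pog4"},
    "strain_B": {"pog1", "pog2", "pog5"},
    "strain_C": {"pog1", "pog6"},
}
singletons = count_singletons(toy)
```

In this toy set, `strain_A` carries two singletons and the other two strains carry one each, so `strain_A` would have the largest petal in a flower plot of the kind shown in Fig. 2.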
For genome size and growth temperature comparisons among haloarchaeal type strains, information on the strains was obtained from the GenBank database and previous studies, and is shown in Supplementary Table S4. Genome size comparison at the family level revealed that most haloarchaea (104/128 species) had a genome between 3.0 and 4.5 Mb in size, with the family Natrialbaceae having the largest average genome size (Fig. 1c). Besides a high growth temperature, strain CBA1119T had an unusually large genome. Haloarchaeal species with a genome >5 Mb are uncommon; besides strain CBA1119T, only two such type strains (and three strains in total) are found in the GenBank database. Strain CBA1119T thus had the third largest genome among haloarchaeal type strains (and the fourth largest among all haloarchaeal strains). Genome size has been shown to be related to COG categories and pathways in bacteria; COG categories related to secondary metabolism and energy conversion were more highly represented in larger genomes, as were KEGG categories related to various cellular processes and metabolism, with the exception of nucleotide metabolism 22. Free-living bacteria with a genome size >6 Mb, such as Bacteroides thetaiotaomicron and Streptomyces avermitilis, can grow in various environments and use a wide range of substrates for energy production. Thus, strain CBA1119T with its large genome may be capable of growing under different conditions and may utilize different substrates to produce energy. Genome size increases with the level of environmental instability; that is, large genomes are more resistant to environmental perturbations than smaller ones 23. It remains to be determined whether this applies to strain CBA1119T.
Clarifying the genomic and environmental factors that affect growth temperature and genome size can provide insight into environment-microbe interactions and evolutionary adaptations of various microorganisms, while additional studies on the enzymes of strain CBA1119T may reveal new tools for industrial biotechnology applications.

Table 1. Strain-specific POGs listed on the KEGG pathway (P < 0.05). * Strain CBA1119T is estimated to contain the largest number of singletons.
as the hydrolysis of starch and casein 27 and of Tween 40 and Tween 80 28 were evaluated according to established protocols. Antibiotic susceptibility was tested on DBCM2 medium using antibiotic discs with ampicillin (10 μg per disc), erythromycin (15 μg), gentamicin (10 μg), kanamycin (30 μg), nalidixic acid (30 μg), rifampicin (10 μg), and streptomycin (10 μg). The effectiveness of various substrates as a sole carbon and energy source, and acid production from them, were determined in HMD medium 29. A total of 20 carbon sources were tested: D-fructose, D-galactose, D-mannitol, D-mannose, D-sorbitol, D-xylose, fumarate, glycerol, maltose, pyruvate, starch, succinate, sucrose, L-alanine, L-arginine, L-aspartate, L-glutamate, L-lysine, L-malate, and L-sorbose. Polar lipids from strain CBA1119T were extracted, analyzed, and compared with those of the three reference strains as previously described 30. DNA-DNA hybridization (DDH) 31 was performed to determine the genetic relationship between strain CBA1119T and the three reference strains. To determine the taxonomic identity based on the 16S rRNA gene sequence, chromosomal DNA was extracted using a commercial DNA extraction kit (iNtRON Biotechnology, Sungnam, Korea) and the 16S rRNA gene was amplified using PCR PreMix (iNtRON Biotechnology) with universal primers 0018F and 1518R 32. Amplified 16S rRNA PCR products were sequenced and assembled as previously described 33, and 16S rRNA sequences were compared using EzTaxon-e 34 or NCBI BLAST 35. Phylogenetic trees were constructed from the three 16S rRNA gene sequences of strain CBA1119T obtained from the genome sequencing data (see below) and those of other related species using MEGA6 software 36. Phylogenetic trees were generated with the neighbor-joining (NJ) 37, maximum likelihood (ML) 38, and maximum parsimony (MP) 39 methods with 1,000 bootstrap replications based on the NJ tree.
Library preparation, sequencing, genome assembly, and annotation. To clarify the relationship between physiological characteristics (especially capacity for growth at high temperatures) and genomic features, we performed genome sequencing of strain CBA1119T and Nnm. ejinorense JCM 13890T as previously described 40.

Comparative genomic analysis. For genomic comparisons, Natrinema species genomes were obtained from the NCBI genome database, except those of strains CBA1119T and JCM 13890T, which were sequenced as described above. The OrthoANI algorithm was used to analyze the genomic relatedness between strain CBA1119T and other species. OrthoANI percentages were calculated and a phylogenetic tree was constructed 44. Orthologs in strain CBA1119T and the reference strains were predicted and mapped using the reciprocal best hit method in UBLAST 45. Pan-genome orthologous groups (POGs) were estimated using the EzBioCloud Comparative Genomics Database (http://cg.ezbiocloud.net/) 46, and gene-content similarity based on their presence or absence was calculated using the Jaccard coefficient. The unweighted pair-group method with arithmetic mean (UPGMA) was then used to cluster strain CBA1119T and the reference strains, yielding a dendrogram based on the presence or absence of genes. Haloarchaeal genomes for comparisons were obtained from the NCBI genome database according to the following criteria: genomes with optimal or maximum growth temperature information were selected for comparisons of optimal and maximum growth temperature, respectively; genomes of unclassified strains 47 were excluded; and when multiple genomes were available for a single strain, the more complete assembly with fewer contigs was selected.
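
The gene-content clustering described above can be sketched in a few lines: a Jaccard distance (one minus the Jaccard coefficient) is computed between the POG sets of each pair of strains, and the closest clusters are merged repeatedly by UPGMA. This is an illustrative sketch under stated assumptions, not the EzBioCloud implementation used in the study; the function names `jaccard_distance` and `upgma` and the toy gene sets are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of POG identifiers."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def upgma(gene_sets):
    """Cluster strains by UPGMA; return the dendrogram as nested tuples."""
    names = sorted(gene_sets)
    # cluster id -> (subtree, number of leaves in the subtree)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances, keyed by (smaller id, larger id).
    dist = {
        (i, j): jaccard_distance(gene_sets[names[i]], gene_sets[names[j]])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # The size-weighted average equals the arithmetic mean over leaf pairs.
        for k in clusters:
            d_ik = dist.pop((min(i, k), max(i, k)))
            d_jk = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        del dist[(i, j)]
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Toy presence/absence data with hypothetical gene identifiers.
toy = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g3", "g5"},
    "strain_C": {"g1", "g6", "g7"},
}
tree = upgma(toy)
```

Here `strain_A` and `strain_B` share three of five genes and are joined first; `strain_C`, which shares only one gene with either, attaches last. For real data, an established library routine for average-linkage clustering would be preferable to this hand-written loop.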