Ancestral reconstructions decipher major adaptations of ammonia oxidizing archaea upon radiation into moderate terrestrial and marine environments

Unlike all other archaeal lineages, ammonia oxidizing archaea (AOA) of the phylum Thaumarchaea are widespread and abundant in all moderate and oxic environments on Earth. The evolutionary adaptations that led to such unprecedented ecological success of a microbial clade characterized by highly conserved energy and carbon metabolisms have, however, remained underexplored. Here we reconstructed the genomic content and growth temperature of the ancestor of all AOA as well as the ancestors of the marine and soil lineages based on 39 available complete or nearly complete genomes of AOA. Our evolutionary scenario depicts an extremely thermophilic autotrophic, aerobic ancestor from which three independent lineages of a marine and two terrestrial groups radiated into moderate environments. Their emergence was paralleled by (I) a continuous acquisition of an extensive collection of stress tolerance genes mostly involved in redox maintenance and oxygen detoxification, (II) an expansion of regulatory capacities in transcription and central metabolic functions and (III) an extended repertoire of cell appendages and modifications related to adherence and interactions with the environment. Our analysis provides insights into the evolutionary transitions and key processes that enabled the conquest of the diverse environments in which contemporary AOA are found.


55
Several cultivated species of AOA have greatly contributed to a better understanding of their 56 physiology and have revealed that AOA as opposed to their bacterial counterparts are adapted 57 to far lower ammonia concentrations20 which might explain their ecological success in so many 58 oligotrophic environments. However, a recent meta-analysis based on more than 30,000 59 amoA genes deposited in the public databases revealed that the cultivated strains and 60 enrichments represent only 7 out of 19 total AOA clades, indicating that the (eco-)physiological 61 potential of AOA could be far greater than assumed 14. Nevertheless, genomes of cultivated 62 species from diverse environments and a large number of fully or partially assembled (meta-63 )genomes of AOA are now available and reveal a very well-conserved core of energy and 64 carbon metabolism 21,22, with ammonia oxidation catalysed by an ammonia monooxygenase an OGT around 76°C (73°C-78°C), we conclude that the ability to oxidize ammonia for energy 121 acquisition in Archaea was most probably acquired in thermophilic or even hyperthermophilic 122 organisms, as has been suggested since their discovery, and also through subsequent 123 findings of deeply branching thermophilic lineages within Thaumarchaeota and AOA 11-13,24-

139
We infer that at least 1265 protein families were present in LACAOA, the ancestor of AOA.

140
Along the transition from non-AOA to AOA, 289 families were inferred to be gained, and 267   Table S3). Therefore, most 151 of the differences in genome sizes can be attributed to the massive gains that occurred rather  respectively). This parallel increase in genome sizes of the two 'soil' clades led to distinct 156 genomic repertoires and physiologies (see following sections). Our inferences of ancestral 6 genome sizes differed from those recently obtained in 24, resulting in apparent distinct genome 158 dynamics with larger ancestral genomes and more downstream losses in the latter. The Dollo 159 parsimony used in that study to infer gains and losses with Count was earlier deemed 160 inappropriate for studying prokaryotic genomes where lateral gene transfers are common 34.

161
Therefore, this difference in Count settings is likely to account for the differences observed 162 between the studies. Furthermore, the probabilistic setting we used, where gains and losses 163 rates were estimated for each family separately in this study, is likely to be more realistic.

164
In the following sections, we detail the evolutionary path followed by AOA towards adaptations 165 to moderate environments, starting by describing the ancestor of all AOA. In order to confirm 166 the evolutionary scenarios inferred by the Count program we used comprehensive 167 phylogenetic reconstructions for approximately 80 crucial families discussed in the rest of this 168 manuscript (see Table S5). This allowed us to correct -where needed -the evolutionary 169 scenarios inferred by the program Count, and in some cases to make assumptions on the 170 donating lineage (see Materials and Methods, Data S1 and Table S5).

172
The last common ancestor of AOA was a thermophilic, autotrophic aerobic organism 173 Metabolic acquisitions by LACAOA

174
As expected, LACAOA acquired the three genes for the ammonia monooxygenase complex 175 (amoABC) and the fourth candidate subunit (amoX) 35, the urease gene set and two urea 176 transporters (SSS and UT types)12,24,36 and expanded its set of pre-existing ammonia (Amt) 177 transporters (from one to two, as in extant AOA) ( Fig. 3 and Table S4). Although the 178 evolutionary history of the archaeal AMO is not trivial to resolve, recent phylogenetic analyses 179 indicated that it is more closely related to actinobacterial hydrocarbon monooxygenases than 180 the bacterial AMO14. It should also be noted that none of the non-AOA Thaumarchaeal 181 genomes to date encode any genes related to ammonia oxidation or urea utilization. However, 182 one has to note that the pathway has not been fully elucidated yet. Meanwhile, a number of 183 cupredoxin-domain families were acquired, equipping it with the unique copper-reliant 184 biochemistry encountered in contemporary AOA. The heavy reliance on copper necessitated 185 the acquisition of copper uptake systems, such as the CopC/CopD family proteins 37.

186
Curiously, glutamate synthase (GOGAT), which in most Archaea is participating in the high 187 affinity ammonia assimilation pathway, is inferred to be lost at this stage, leaving the glutamate 188 dehydrogenase (usually exhibiting lower affinity 38) as the primary ammonia assimilation route 189 in contemporary AOA. This could be regarded as an adaptation to an energy and carbon-

202
The key enzyme for production of polyhydroxyalcanoates (PHAs), PHA synthase, was also 203 acquired by LACAOA (Data S1), equipping AOA with the ability to produce carbon storage

208
Genes for aerobic autotrophic growth were already present in LACAOA.

209
The gene families inferred to be present already, and not gained by LACAOA (labeled light

393
Interestingly, two non-AOA Thaumarchaeota genome bins from oceans, likely to be from 394 mesophilic organisms as well, also harbour this mitochondrial-like version of the aconitase, 395 while a replacement has taken place in the moderate thermophile N. gargensis (Fig. 5). To

451
AOA Thaumarchaeota) (Fig. S3, Table S4). In addition, moderate habitats are discontinuous 452 and unstable in terms of nutrient availability, hydration content and radiation exposure, in 453 drastically different ways than a hot spring. This is reflected in regulatory adaptations in non-

482
While we agree that oxygen's exposure was a driving evolutionary pressure, we here depict 483 a more complex scenario for adaptation of AOA to moderate environments than described in 484 a recent study 29. We show that adaptation to higher levels of oxygen occurred in addition to 485 adaptations to various types of stress, and that these gradually occurred involving both 486 innovations, but also pre-existing ancestral sets of genes. In contrast to Ren et al. 29 we infer an already aerobic ancestor for all AOA which is in agreement with sister lineages to AOA 488 being at least facultative aerobes (Bathy-, Aig-and non-AOA Thaumarchaeota). This is also 489 in line with a more recent diversification of AOA than suggested by the use of molecular clock   We gathered complete genomes of AOA, and enriched the dataset with complete or 512 metagenome-/single-cell-assembled genomes of high quality. We used CheckM (version 513 1.0.7) 112 to assess the completeness and contamination of the genomes gathered 514 ("lineage_wf" parameter), and used a >80% completeness, <5% contamination threshold to 515 include MAGs or SAGs. Yet these criteria were sometimes relaxed in order to broaden the 516 phylogenetic diversity included (see Table S1). In total, we selected 76 genomes for the 517 analysis, including 39 AOA, 13 non-AOA Thaumarchaeota, 8 Aigarchaeota, 5 Bathyarchaeota 518 and 11 Crenarchaeota (Table S1). When no annotations were available for the selected 519 genomes, the Prokka suite (v1.12)113 was used to annotate genes and proteins.

536
The filtered alignments were then concatenated, and a maximum-likelihood tree was inferred 537 with IQ-Tree (v1.6.11) using the LG+C60+F model of sequence evolution, and 1000 Ultrafast 538 bootstraps were performed 120.

543
Ribosomal protein S17, PF00380 Ribosomal protein S9/S16, PF00410 Ribosomal protein S8, 544 PF00411 Ribosomal protein S11, PF00416 Ribosomal protein S13/S18, PF00466 Ribosomal   Table S2). The stem position predictions were obtained from the RNAStrand 558 database (version 2.0) for all the sequences in our dataset that were available (20 rRNA 559 molecules). The 16S rRNA sequences that were available (56 out of 76 genomes, see Table   560 S1) were all aligned together with RNAStrand sequences with SSU-ALIGN (v0.1.1) 122, and 561 the consensus of the positions found in stems were conservatively selected to be part of stem 562 regions (878 positions). The GC% of the predicted stems was computed for each sequence.

563
A linear regression between the stem GC% and the OGT was computed. A non-homogeneous  (Table S3).  Table S4.

599
Phylogenetic analyses of gained genes 600 We first gathered a representative dataset of 281 genomes from Bacteria, Archaea and 601 Eukaryotes (see Table S1). This dataset, composed of complete genomes (Refseq database, 602 NCBI), was selected to cover a vast range of organismal diversity, and was enriched to cover 603 the diversity of organisms from the TACK super-phylum by adding 68 highly complete 604 metagenomic bins (Table S1).

605
For protein families of interest, we retrieved all the sequences classed as part of the family in 606 our 76 genomes dataset, and used them to query the representative dataset of genomes using 607 Blast (v2.6.0+) 129. The first 250 sequences corresponding to the best-score hits with an e-608 value lower than 10-10 were selected. The corresponding sequences were extracted,

617
alignments and trees were re-constructed without the corresponding sequences. All resulting 618 trees are provided in Data S1, and a list of the generated trees can be found in Table S5.

625
The genomes analyzed in this study are publicly available on NCBI or IMG/JGI, and all 626 accession numbers are given in Table S1. All data generated in this study are provided in the   Table S1). The Crenarchaeota outgroup (11 genomes) is not displayed here, and  Numbers of families inferred as having a probability higher than 0.5 to be present, gained or lost are displayed in circles at each node symbolizing the corresponding ancestral genomes and their dynamics (see main text). Ancestral OGT are displayed at the corresponding nodes (top number) together with their confidence interval (number range below). As there was no 16S rRNA sequences available for the J079 genomic bin, we could not infer OGT for the ancestor of Nitrosocaldales. The color of the bubbles represents the transition from hot (red) to colder (blue) growth temperature. For each candidate order of AOA, the number of protein families found on average in their respective genomes is indicated, as well as the average genome size (along with standard deviations). LACAOA stands for "Last Common Ancestor of AOA", CAMA for "Common Ancestor of Mesophilic AOA" and CARA for "Common Ancestor of Rod-shaped AOA".  (Table S4   A maximum-likelihood phylogenetic tree was obtained for FAM001301. AOA sequences are shown in green, and eukaryotic ones in blue. Grey triangles correspond to collapsed groups of archaeal and bacterial aconitases. In both panels, branches with UF-Boot support above 95% are indicated by a red circle. Branches with UF-Boot support above 95% are indicated by a red circle. See also Fig. S4 for the precise genomic distribution within AOA of these different pili.