Genome-resolved metagenomics reveals novel archaeal and bacterial genomes from Amazonian forest and pasture soils

Amazonian soil microbial communities are known to be affected by forest-to-pasture conversion, but the identity and metabolic potential of most of their organisms remain poorly characterized. To contribute to the understanding of these communities, here we describe metagenome-assembled genomes (MAGs) recovered from 12 forest and pasture soil metagenomes of the Brazilian Eastern Amazon. We obtained 11 forest and 30 pasture MAGs (≥50 % completeness and ≤10 % contamination), distributed among two archaeal and 11 bacterial phyla. The taxonomic classification results suggest that most MAGs may represent novel microbial taxa. MAGs selected for further evaluation included members of Acidobacteriota, Actinobacteriota, Desulfobacterota_B, Desulfobacterota_F, Dormibacterota, Eremiobacterota, Halobacteriota, Proteobacteria, and Thermoproteota, and their annotations revealed potential roles in carbohydrate degradation and mercury detoxification as well as in the sulphur, nitrogen, and methane cycles. A methane-producing archaeon of the genus Methanosarcina was almost exclusively recovered from pasture soils, which can be linked to the methane sink-to-source shift reported after forest-to-pasture conversion. The novel MAGs constitute an important resource to help us unravel the yet-unknown microbial diversity in Amazonian soils and its functional potential and, consequently, the responses of these microorganisms to land-use change.


INTRODUCTION
Soil microorganisms play crucial roles in below- and above-ground ecosystems, including the formation and stabilization of soil aggregates, carbon storage, organic matter decomposition, nutrient cycling, soil fertility, plant growth and health [1,2], and even the production and consumption of greenhouse gases, such as methane (CH4) and nitrous oxide (N2O) [3,4]. Soil microbes are also considered important components of soil health and have been used as bioindicators in soil-health evaluations [2,5]. However, despite their ecological and economic impacts, assessments on biodiversity have neglected soil macro- and microorganisms [6].

From both diversity and functional perspectives, soils from tropical and subtropical regions are even less studied [6,7], including the Amazon rainforest.
The Amazon is one of the most important reservoirs of biodiversity on Earth [8]. Due to the increase in deforestation in recent years [9] and the international concern regarding the future of this rainforest, several studies have been carried out to examine the effects of forest clearing and conversion on its soil physical, chemical, and biological attributes. These studies have revealed that forest-to-pasture conversion alters the abundance and the taxonomic and functional profiles of soil microbial communities [e.g. 10,11,12], therefore impacting several environmental processes, such as soil CH4 cycling and fluxes [e.g. 13,14,15]. Nevertheless, to date, the identity and metabolic potential of a considerable fraction of the Amazonian soil microbial communities remain unknown, which limits our understanding of land-use impacts on these organisms and the biogeochemical cycles they drive.
In this context, genome-resolved metagenomics can be used to assemble overlapping short reads into longer contiguous sequences (contigs) and group them (binning) into putative metagenome-assembled genomes (MAGs) [16]. MAGs can then be taxonomically classified and functionally annotated using curated databases, helping us identify and understand the potential roles of yet-to-be-cultivated microorganisms in the environment. This approach has expanded the known microbial phylogenetic diversity [e.g. 17,18,19], thus rapidly transforming the field of microbiology. Nevertheless, only a small number of studies have so far used this method to recover archaeal and bacterial MAGs from Amazonian soils [20,21].
In this study, we used genome-resolved metagenomics to assemble and recover archaeal and bacterial MAGs from forest and pasture soils of the Brazilian Eastern Amazon. This approach was carried out using shotgun metagenomic sequencing data from a microcosm experiment in which soil moisture levels were increased. This experiment was previously conducted to evaluate the combined effects of forest-to-pasture conversion and increased moisture on soil CH4-cycling microbial communities [15]. The genomes described here provide an important resource for the characterization of the microbial communities in Amazonian soils and, consequently, for our understanding of their responses to land-use changes and other environmental disturbances.

Site description, soil sampling, and microcosm experiment
The soil sampling was carried out in July 2015 in a pristine forest of the Tapajós National Forest (3°17'44.4"S 54°57'46.7"W) and an active cattle pasture (3°18'46.7"S 54°54'34.8"W), in the state of Pará, in the Brazilian Eastern Amazon. Following the removal of the litter layer, soil samples from 0 to 10 cm depth were collected in three sampling points per site, each separated by 50 m. Samples from each land-use treatment were combined, sieved through a 5 mm mesh sieve, and subjected to a microcosm experiment under increasing soil moisture levels. The microcosms were maintained and monitored for 30 days at 25 °C in a Biochemical Oxygen Demand incubator. Moisture treatments were established in triplicate for each land use using 1.5 litre glass jars filled with 350 g of soil each. These treatments included the original soil gravimetric moisture of each site (22 % for forest and 24 % for pasture) and 100 % of gravimetric moisture at field capacity (50 % for forest and 63 % for pasture). Soil samples from each microcosm were frozen in liquid nitrogen at the end of the experiment and stored at −80 °C.

DNA extraction, quantification, and sequencing
Forest and pasture soil samples under original soil moisture and at 100 % field capacity were DNA-extracted in duplicate using the PowerLyzer PowerSoil DNA Isolation Kit (QIAGEN, Hilden, North Rhine-Westphalia, Germany), totaling 12 DNA samples, following a protocol optimized for Amazon soils [22]. DNA samples were checked using 1 % agarose gel electrophoresis and a Nanodrop 2000c spectrophotometer (Thermo Fisher Scientific, Inc., Waltham, MA, USA) and stored at −20 °C. Paired-end shotgun metagenomic sequencing (2×150 bp) was performed on an Illumina HiSeq platform (Illumina, Inc., San Diego, CA, USA) at Novogene Co., Ltd. (Beijing, China), using the NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs, Inc., Ipswich, MA, USA) for library construction. Detailed information about the sites, their soil physicochemical properties, the design of the microcosm experiment, and DNA extraction and sequencing was previously reported [15,21].

Impact Statement
The soil microbial communities of the Amazon rainforest have been evaluated in the context of deforestation and land-use change, but their diversity remains largely unknown. In this paper, 41 metagenome-assembled genomes (MAGs) (≥50 % of completeness and ≤10 % of contamination) were recovered from forest and pasture soils and characterized. The MAGs were spread over 11 bacterial and two archaeal phyla, 90 % and 29 % of which could not be assigned to any known species and genus, respectively. Gene annotations indicated their potential roles in biogeochemical cycling, mercury detoxification, and the degradation of complex carbohydrates, revealing distinct functional patterns between forest and pasture soil microbial communities.

Recovery and characterization of MAGs
The bioinformatics analysis was performed on the KBase platform [23]. Metagenomic sequences were uploaded to the platform and imported into a narrative as paired-end reads using the KBase apps Upload File to Staging from Web v1.0.12 and Import FASTQ/SRA File as Reads from Staging Area [23], respectively. Paired-end reads were quality-checked with FastQC v0.11.5 [24] and, outside KBase [23], with MultiQC [25]. Based on the results, reads were cleaned from adaptors, trimmed, and filtered using Trimmomatic v0.36 (altered parameters: adapters, TruSeq3-PE-2; sliding window minimum quality, 20; head crop length, 10; minimum read length, 50) [26]. The remaining paired-end reads from each land use (regardless of the soil moisture treatment) were again quality-checked with FastQC [24] and MultiQC [25] and merged into one object using the KBase app Merge Reads Libraries v1.0.1 [23].
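The read-cleaning parameters listed above can be illustrated with a simplified Python sketch. It is not Trimmomatic's actual implementation: the window size of 4, the order of the steps, and the function name `trim_read` are assumptions made for illustration, and adapter clipping is omitted. Only the head crop (10), the window quality threshold (20), and the minimum read length (50) come from the text. With 150 bp input reads, these settings give reads of at most 140 bp and at least 50 bp, which matches the length range reported after quality control.

```python
def trim_read(seq, quals, head_crop=10, window=4, min_avg_q=20, min_len=50):
    """Simplified head crop + sliding-window trim + length filter.

    seq   : read sequence (str)
    quals : per-base Phred scores (list of int), same length as seq
    Returns (seq, quals) after trimming, or None if the read is discarded.
    The window size (4) is an assumed value, not one given in the text.
    """
    # 1. Head crop: drop the first `head_crop` bases.
    seq, quals = seq[head_crop:], quals[head_crop:]

    # 2. Sliding window: scan from the 5' end and cut the read at the start
    #    of the first window whose mean quality falls below the threshold.
    keep = len(seq)
    for start in range(0, max(len(seq) - window + 1, 0)):
        win = quals[start:start + window]
        if sum(win) / window < min_avg_q:
            keep = start
            break
    seq, quals = seq[:keep], quals[:keep]

    # 3. Minimum length filter.
    if len(seq) < min_len:
        return None
    return seq, quals


# Hypothetical reads: uniformly good, good then poor, and mostly poor quality.
good = trim_read("A" * 150, [30] * 150)
partial = trim_read("A" * 150, [30] * 70 + [2] * 80)
poor = trim_read("A" * 150, [30] * 40 + [2] * 110)
```

Here `good` keeps 140 bases, `partial` is cut where the quality drops, and `poor` is discarded because fewer than 50 bases survive.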
Bins meeting the completeness (>90 %) and contamination (<5 %) thresholds for high-quality drafts under the Minimum Information about a Metagenome-Assembled Genome (MIMAG) standards [35] were selected for further analysis and extracted as assemblies using the KBase app Extract Bins as Assemblies from BinnedContigs v1.0.2 [23]. Two pasture MAGs (Bin.006_Pasture of the family Binataceae and Bin.035_Pasture of the genus Methanosarcina) that did not meet these criteria were also selected due to the relevance of both groups in the CH4 cycle [36,37], totalling 12 MAGs selected for additional exploration. The relative abundance of each selected bin in both merged forest and pasture metagenomes (defined as the number of mapped reads divided by the number of reads in the metagenome) was estimated using Bowtie2 v2.3.2 (altered parameter: alignment type preset options, very-sensitive) [38].
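As a minimal sketch of the two definitions used here, the following Python functions apply the completeness and contamination thresholds and compute relative abundance as mapped reads divided by total reads. The function names, tier labels, and all example numbers are hypothetical. The ≥50 % and ≤10 % filter is the general quality filter stated elsewhere in this article, and the >90 % and <5 % thresholds are those given above.

```python
def quality_tier(completeness, contamination):
    """Classify a bin by completeness and contamination (both in percent)."""
    if completeness > 90 and contamination < 5:
        return "near-complete"   # selected for further analysis
    if completeness >= 50 and contamination <= 10:
        return "medium"          # passes the general quality filter
    return "rejected"


def relative_abundance(mapped_reads, total_reads):
    """Fraction of metagenome reads that map to a bin."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads / total_reads


# Hypothetical bins: (completeness %, contamination %)
example_bins = {"bin_a": (95.0, 2.0), "bin_b": (60.0, 8.0), "bin_c": (40.0, 1.0)}
tiers = {name: quality_tier(*scores) for name, scores in example_bins.items()}

# Hypothetical read counts: 5 of 1000 reads map to the bin.
abundance = relative_abundance(5, 1000)
```

Multiplying the returned fraction by 100 gives the abundance as a percentage of the metagenome.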
A set with all the selected bins was created using the KBase app Build AssemblySet v1.0.1 [23]. Bins were annotated by the beta app Annotate and Distill Assemblies with DRAM (Distilled and Refined Annotation of Metabolism) [39]. DRAM provides annotations of MAGs using multiple databases and then summarizes the results to facilitate the exploration of their functional and structural traits, also using functional marker genes to infer metabolic descriptors of MAGs [39]. More detailed information about the functioning of DRAM can be found here: https://github.com/WrightonLabCSU/DRAM/wiki/1.-How-DRAM-Works. The presence of 23S, 16S, and 5S ribosomal RNA (rRNA) genes and transfer RNAs (tRNAs) in each MAG was also checked by DRAM. Figures were generated using the packages ggplot2 3.3.5 [40] and ggalluvial 0.12.3 [41] in R 4.0.2 [42].

Sequencing and co-assembly statistics
The shotgun metagenomic sequencing of the 12 DNA samples from forest and pasture soils resulted in 594.9 million paired-end reads of 150 bp in length, with an average of 49.6 million per sample, ranging from 39.9 to 79.9 million across samples. After our quality control, 529.9 million paired-end reads from 50 to 140 bp in length were kept, with an average of 44.2 million per sample, ranging from 34.8 to 72.9 million. After merging the reads per land use, the forest and pasture libraries had 273.3 and 256.6 million paired-end reads, respectively.
The co-assembly of the forest paired-end reads generated a total of 367 892 contigs (374 273 bp in the largest contig), with N50 of 3 480 bp and L50 of 106 965 contigs, while for pasture, it produced 366 749 contigs (515 562 bp in the largest contig), with N50 of 4 207 bp and L50 of 89 012 contigs. A total of 19 forest and 39 pasture bins resulted from DAS Tool [31], of which 11 and 30, respectively, passed our quality filter (≥50 % of completeness and ≤10 % of contamination) (Fig. 1 and Table S1, available in the online version of this article).
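For readers unfamiliar with these assembly statistics, the sketch below shows how N50 and L50 are conventionally computed from a list of contig lengths. N50 is the length of the contig at which the cumulative sum of lengths, taken from the longest contig downwards, first reaches half of the total assembly size, and L50 is the number of contigs needed to reach that point. The function name and the toy contig lengths are illustrative only and are not taken from the assemblies described here.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths in bp."""
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # `length` is N50 (bp); `count` is L50 (number of contigs).
            return length, count


# Toy assembly: total 300 bp, half is 150 bp, reached after the 2nd contig.
n50, l50 = n50_l50([10, 80, 30, 70, 20, 50, 40])
```

In this toy case the two longest contigs (80 and 70 bp) together reach half of the assembly, so N50 is 70 bp and L50 is 2.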

Taxonomic assignment and relative abundance of MAGs
Regarding the taxonomic classification of the bins, 39 belonged to 11 bacterial phyla: Actinobacteriota, Proteobacteria, Acidobacteriota, Patescibacteria, Chloroflexota, Desulfobacterota_F, Dormibacterota, Eremiobacterota, Planctomycetota, Desulfobacterota_B, and Verrucomicrobiota (Fig. 1 and Table S1, available in the online version of this article). We also recovered archaeal bins from the phyla Halobacteriota and Thermoproteota. Only four bins could be classified at the species level. In fact, 10 bacterial bins could not be assigned at the genus level and two at the family level, thus demonstrating the potential of this approach to reveal the yet-unknown microbial diversity of Amazonian soils. It is important to mention that, using other bioinformatics tools, Lemos et al. [21] previously recovered and characterized two patescibacterial MAGs from our high-moisture pasture soils, and additional tests based on the average nucleotide identity (ANI) indicated that two of our genomes belong to those species (99.9 % for Bin.036_Pasture and WARW01000000 (used as reference and available at https://www.ncbi.nlm.nih.gov/nuccore/WARW00000000.1/), and 99.9 % for Bin.038_Pasture and WARV01000000 (used as reference and available at https://www.ncbi.nlm.nih.gov/nuccore/WARV00000000)).
Despite their near-completeness (>90 % of completeness and <5 % of contamination), our selected bins did not meet all MIMAG standards for high-quality drafts due to the absence of certain rRNA genes or tRNAs [35] (Table 1). However, four MAGs possessed the 16S rRNA gene; nine, the 5S rRNA gene; and nine, tRNAs for at least 18 amino acids. No sequences of the 23S rRNA gene could be found in the MAGs.
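The criteria discussed above can be expressed as a short check. This is a sketch under the assumption that a high-quality draft requires >90 % completeness, <5 % contamination, the 23S, 16S, and 5S rRNA genes, and tRNAs for at least 18 amino acids, as described in this section. The class, the function, and the example MAG are hypothetical and do not correspond to any genome in Table 1.

```python
from dataclasses import dataclass


@dataclass
class MagSummary:
    name: str
    completeness: float      # percent
    contamination: float     # percent
    has_5s: bool
    has_16s: bool
    has_23s: bool
    trna_amino_acids: int    # number of amino acids covered by tRNAs


def missing_for_high_quality(mag):
    """List the high-quality draft criteria that a MAG does not satisfy."""
    missing = []
    if mag.completeness <= 90:
        missing.append("completeness > 90 %")
    if mag.contamination >= 5:
        missing.append("contamination < 5 %")
    if not mag.has_5s:
        missing.append("5S rRNA gene")
    if not mag.has_16s:
        missing.append("16S rRNA gene")
    if not mag.has_23s:
        missing.append("23S rRNA gene")
    if mag.trna_amino_acids < 18:
        missing.append("tRNAs for at least 18 amino acids")
    return missing


# Hypothetical near-complete MAG that lacks only the 23S rRNA gene.
example = MagSummary("example_bin", 96.5, 1.2, True, True, False, 19)
gaps = missing_for_high_quality(example)
```

An empty list would indicate that all criteria considered in this sketch are met. The example MAG fails only on the 23S rRNA gene, mirroring the situation described above in which no 23S sequences were found.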

Functional characterization of MAGs and biogeochemical relevance
Genes related to glycolysis (Embden-Meyerhof pathway), pentose phosphate pathway (pentose phosphate cycle), citrate (TCA or Krebs cycle), glyoxylate, reductive pentose phosphate (Calvin cycle), reductive citrate (Arnon-Buchanan cycle), and dicarboxylate-hydroxybutyrate cycles were found in all selected MAGs (Fig. 3). Some of them featured the full pathways for relevant metabolisms, including the glycolysis, pentose phosphate, and Entner-Doudoroff pathways, as well as the citrate and glyoxylate cycles. The Entner-Doudoroff pathway, an alternative to glycolysis that is most common in Gram-negative bacteria [43], was detected in the proteobacterial MAG from the family Burkholderiaceae (Bin.013_Forest, genus Paraburkholderia). The glyoxylate cycle allows organisms to grow on acetate or fatty acids as sole carbon sources [44]. Several electron transport chain complexes (I-V), associated with aerobic respiration, were also fully covered in the MAGs. No complete carbon fixation pathways were annotated in our genomes.
Table 1 (note). MAGs selected based on their completeness (>90 %) and contamination (<5 %) scores. Bin.006_Pasture and Bin.035_Pasture were also included due to their environmental relevance.
Previous studies have reported a CH4 sink-to-source shift after forest-to-pasture conversion in the Amazon [13,15,45-48], also revealing that pasture soils usually harbour a higher abundance of CH4-producing archaea (methanogens) than forest soils [13-15]. Furthermore, using our microcosm experiment dataset, we demonstrated that increased soil moisture intensified soil CH4 emissions and related microbial responses driven by forest-to-pasture conversion [15]. Consistent with these findings, we recovered from the pasture a MAG of the genus Methanosarcina (Bin.035_Pasture), a group of strictly anaerobic CH4-producing archaea [36]. This MAG was much more abundant in our pasture soils than in the forest (Fig. 2). As expected, the methanogenesis pathway was fully detected in this novel genome.
The degradation of organic molecules by soil microbial communities is a crucial step in the carbon cycle [49]. We further investigated the presence of carbohydrate-active enzyme (CAZyme) genes in the MAGs, revealing genes related to the cleavage of polyphenolics and complex carbohydrates, such as chitin, amorphous cellulose, xylans, mixed-linkage glucans, and starch (Fig. 4). The pasture MAGs from the classes Acidimicrobiia (Bin.002_Pasture), Acidobacteriae (Bin.001_Pasture), Dormibacteria (Bin.020_Pasture), and Eremiobacteria (Bin.005_Pasture) were found to have the potential to degrade the highest number of substrates. In line with this, Silva-Olaya et al. [50] suggested a higher mineralization potential by the soil microbiota in pastures compared to forest soils of the Colombian Amazon. Understanding the different microbial strategies to convert biomass in Amazonian soils is essential to unveil their potential ecosystem services in these environments.
Regarding other fundamental biogeochemical cycles, sulphur is considered one of the most important elements for life, and its related microbial oxidation and reduction processes occur in several ecosystems [51]. Genes associated with the thiosulphate reduction to sulphite (rdlA gene) [52] and thiosulphate oxidation to sulphate through the sulphur oxidation (Sox) enzyme system (soxXYZABCD genes) [53,54] have also been found in some MAGs (Fig. 4). The alphaproteobacterial MAGs of the Hyphomicrobiaceae and Acetobacteraceae families possess five and six Sox genes, respectively (soxA, soxB, soxX, soxY, and soxZ in Bin.027_Pasture; and soxA, soxC, soxD, soxX, soxY, and soxZ in Bin.031_Pasture).
Genes related to nitrogen-transforming processes, such as nitrogen fixation, nitrification, denitrification, and dissimilatory nitrate reduction to ammonium (DNRA), could also be detected in the MAGs, including nifD, nifH, and nifK for nitrogenase in Bin.003_Forest and Bin.035_Pasture; narK/nrtP/nasA for nitrate/nitrite transporter, narG/narZ/nxrA and narH/narY/nxrB for nitrate reductase/nitrite oxidoreductase, and narI/narV for nitrate reductase in Bin.004_Forest, Bin.013_Forest, and Bin.031_Pasture; nasA for assimilatory nitrate reductase in Bin.013_Forest; nrfA and nrfH for cytochrome c nitrite reductase in Bin.003_Forest; nirB and nirD for nitrite reductase (NADH) in Bin.013_Forest; hao for hydroxylamine dehydrogenase (HAO) in Bin.003_Forest and Bin.001_Pasture; and nirK for nitrite reductase (NO-forming) in Bin.013_Forest, Bin.006_Pasture, and Bin.034_Pasture. It is worth mentioning that the nirK gene of Bin.034_Pasture may be related to nitrification, as the enzyme it encodes has been reported to potentially oxidize hydroxylamine to N2O in ammonia-oxidizing archaea, functioning as a bacteria-like HAO [55,56].
The nitric oxide reductase, responsible for the microbial reduction of nitric oxide (NO) to N2O (the main source of this greenhouse gas [57]), was detected in three bacterial MAGs from the families Geobacteraceae (Bin.003_Forest with norB), Burkholderiaceae (Bin.013_Forest with norB and norC), and Acetobacteraceae (Bin.031_Pasture with norB). On the other hand, a nitrous oxide reductase (nosZ) that reduces N2O to dinitrogen [4] is encoded by the Burkholderiaceae MAG. In previous studies in the Amazon region, this important gene for N2O consumption was found in higher abundance in forests in comparison with pasture soils [48,58].
Methane/ammonia monooxygenase genes were detected in our archaeal MAG from the Nitrososphaeraceae family (Bin.034_Pasture with pmoA-amoA and pmoB-amoB). Members of this family are capable of oxidizing ammonia, and a few have been isolated from soil [59-63]. The pasture MAG from the class Binatia (Bin.006_Pasture, order Binatales) also contains pmoA-amoA, pmoB-amoB, and pmoC-amoC. Binatota is a yet-uncultured, poorly characterized candidate phylum, but some of its members have been recently suggested to be involved in CH4 oxidation [37]. This recent study revealed that 11 MAGs (of the orders Bin18 and Binatales) from a total of 108 encode copper membrane monooxygenases (CuMMOs), an enzyme family that includes the particulate methane monooxygenase (pMMO) [37]. Therefore, these microorganisms, not yet considered in studies on the Amazonian soil CH4 cycle but more abundant in our pasture soils, may potentially be related to the consumption of this greenhouse gas.
Amazonian soils are naturally rich in mercury [66] and, along with numerous other relevant functions observed in the genomes (Fig. 4), the forest MAG from the Geobacteraceae family of Desulfobacterota_F (Bin.003_Forest) also encodes a mercuric reductase, related to mercury detoxification, an important feature for the bioremediation of contaminated environments [67]. Furthermore, genes associated with the metabolism of acetate, a short-chain fatty acid used as an energy and carbon source by several microorganisms [68], including certain Methanosarcina species [36], were also present in the MAGs (most notably, acs for acetyl-CoA synthetase, ackA for acetate kinase, and pta for phosphate acetyltransferase in Bin.003_Forest, Bin.013_Forest, and Bin.031_Pasture, and ACH1 for acetyl-CoA hydrolase in Bin.003_Forest).

CONCLUSION
In conclusion, genome-resolved metagenomics revealed potentially novel genomes from forest and pasture soils of the Eastern Amazon. This approach can expand our knowledge about the microbial communities from Amazonian soils and reveal the functional potential of novel or underrepresented microbes, thus helping us to understand their ecological roles in this environment. Considering the relationship of these genomes with critical and closely linked biogeochemical cycles, our results also constitute an important resource for further studies on the functional responses of Amazonian soil microbial communities in light of land-use and climate change.

Conflicts of interest
The authors declare that there are no conflicts of interest.