Proteogenomic analyses indicate bacterial methylotrophy and archaeal heterotrophy are prevalent below the grass root zone

Annually, half of all plant-derived carbon is added to soil where it is microbially respired to CO2. However, understanding of the microbiology of this process is limited because most culture-independent methods cannot link metabolic processes to the organisms present, and this link to causative agents is necessary to predict the results of perturbations on the system. We collected soil samples at two sub-root depths (10–20 cm and 30–40 cm) before and after a rainfall-driven nutrient perturbation event in a Northern California grassland that experiences a Mediterranean climate. From ten samples, we reconstructed 198 metagenome-assembled genomes that represent all major phylotypes. We also quantified 6,835 proteins and 175 metabolites and showed that after the rain event the concentrations of many sugars and amino acids approach zero at the base of the soil profile. Unexpectedly, the genomes of novel members of the Gemmatimonadetes and Candidate Phylum Rokubacteria phyla encode pathways for methylotrophy. We infer that these abundant organisms contribute substantially to carbon turnover in the soil, given that methylotrophy proteins were among the most abundant proteins in the proteome. Previously undescribed Bathyarchaeota and Thermoplasmatales archaea are abundant in deeper soil horizons and are inferred to contribute appreciably to aromatic amino acid degradation. Many of the other bacteria appear to breakdown other components of plant biomass, as evidenced by the prevalence of various sugar and amino acid transporters and corresponding hydrolyzing machinery in the proteome. Overall, our work provides organism-resolved insight into the spatial distribution of bacteria and archaea whose activities combine to degrade plant-derived organics, limiting the transport of methanol, amino acids and sugars into underlying weathered rock. The new insights into the soil carbon cycle during an intense period of carbon turnover, including biogeochemical roles to previously little known soil microbes, were made possible via the combination of metagenomics, proteomics, and metabolomics.

266 phylogeny was inferred using RAxML (Stamatakis 2014) run using the GTRCAT model of 267 evolution for the 16S rRNA and PROTGAMMLG for the rpS3. The RAxML inference included 268 calculation of 300 bootstrap iterations for the 16S rRNA tree and 100 for the rpS3 tree (MRE-269 based Bootstopping criterion), with 100 randomly sampled to determine support values.
270 Ordination analyses of microbial community structure. Ribosomal protein S3 genes were 271 retrieved using HMMs build from the dataset published in (Hug et al. 2015) and used to search 272 against predicted protein sequences in all samples. Only sequences that spanned at least 60% of 273 the alignment were included and clustered at 99 % similarity (equivalent to species level (Sharon 274 et al. 2015) using Usearch (clusterfast, (Edgar 2010)). Abundances of each cluster in each 275 sample were determined from scaffold coverage (see above) and normalized to percent 276 abundances in each sample. Principle coordinate analysis (PCoA) based on Bray-Curtis distance 277 measure was computed using the R programming environment (R-core team, 2015) in 278 conjunction with the vegan package (Dixon 2003 Between 14 and 25 Gb of DNA sequence data were obtained per soil sample for the ten   290 soil samples, two of which were time and depth replicates. The 250 bp reads were assembled as   291 detailed in the Methods section, resulting in 2,982,775 contigs > 1000 bp in length (42,770 292 contigs > 10,000 bp, the longest was 538,000 bp). In total, these contigs encode 8,773,880 genes 293 (Table S1). For each sample, the reads were then mapped back to the assembled contigs of the 294 sample to generate coverage statistics. 295 We compared the microbial community structure across the sample series using coverage 296 of scaffolds assembled from the sequence datasets. Out of the 1420 microorganisms detected 297 based on marker gene (rpS3) sequences, 652 occurred in 2 to 18 times (average: 4.2 ± 2.86) over 298 the ten samples and had relative abundances of between 0.06 and 3.2 % of the community ( Fig.   299 1). The same information was used in an ordination analysis and showed that samples from the 300 same depth were more similar to each other than to those from the other depth with the exception 301 of the first post-rain 10 -20 cm sample (Fig. 1). The results indicate substantial overlap in the 302 organisms present, especially among samples from the same soil depth. The same organisms 303 were also observed in soil samples collected at the same depth and time from sites separated by 304~10 m. In fact, many of these organisms are highly abundant, ranking in the top ten most 305 abundant organisms and amounting for > 1 % relative abundance in each sample (Fig. 1). The 306 availability of sequence information for multiple independently assembled samples with 307 substantial overlap in community membership enabled the addition of series abundance 308 parameterization to nucleotide frequency-based genome binning. We generated 198 bins, 46 of 309 which were classified as metagenome-assembled draft genomes. Most sequence fragments that 310 could not be assigned to specific genomes were grouped at the phylum-level and assigned to  Table S2b). In addition, we generated four draft genomes from the "Miscellaneous 324 Crenarchaeota Group" (MCG), two from the "Soil Crenarchaeota Group" (SCG) and three from 325 the "South African Gold Mine Miscellaneous Group" (SAGMCG). The SCG and SAGMCG are 326 likely within the Thaumarchaeotes whereas the MCG are novel Bathyarchaeota (Figs. S2a,b). 327 We also sampled Euryarchaeotes from the Thermoplasmatales lineage, but the genome bins were 328 not well resolved.

329
Phylogenetic trees constructed using marker genes for both bacteria and the archaea from 330 samples collected from different depths and times display structures similar to the seed puff of a 331 dandelion, with many closely related strains at the termini of most branches ( Fig. 2 and Fig. S2). Manuscript to be reviewed 334 ribosomal protein S3 phylogenetic tree (Fig. 2) exhibits fine scale diversity that is comparable to 335 the observed level of diversity detected in every phylum. Many organisms of the same type 336 (separated by zero branch length in Fig. 2) occurred in samples collected at different times and 337 from both depths.

338
Before the rain, the most abundant organisms in the shallow depth interval (10 -20 cm) 339 are Gemmatimonadetes and Actinobacteria species. Gemmatimonadetes species are also 340 abundant after the rain event. Thermoplasmatales are highly abundant in samples collected after 341 the rain event (Fig. 1).

342
In general, the microbial community composition of 30 -40 cm depth samples differed 343 from that of the 10 -20 cm depth samples and samples collected before and after the rain event 344 were less different than those collected from the shallower soil. Notably, the community sampled 345 from deeper soil prior to the rain event was dominated by a Rokubacterium and several 346 Thermoplasmatales (Euryarchaeota) were abundant. In fact, around 20% of the microorganisms 347 sampled in the deeper soil interval were archaea, an interesting finding because archaea are 348 generally believed to be relatively rare compared to bacteria in soil (Fierer et al. 2012a). In 349 samples collected after the rain, three Thermoplasmatales archaea are highly abundant and rank 350 in the top 10 most abundant organisms in the deeper soil (Fig. 1).

351
We obtained proteomic information to provide insight into the active pathways that 352 mediated carbon and nitrogen compound transformations. Between 2,881 and 4,716 353 proteins/protein groups were identified per soil sample (Table S3a) 387 Furthermore, they encode the PQQ biosynthesis machinery. The THMPT biosynthesis 388 machinery is encoded immediately upstream of the PQQ MDH gene (Fig. S3) (Scott & Rasche 389 2002). The results strongly support the capacity for methylotrophy in these soil-associated 390 Gemmatimonadetes and functioning of this pathway ( Fig. 4 and Fig. S4).
400 Highly represented were carbohydrate-active enzymes such as glycoside hydrolases, 401 polysaccharide lyases, and many sugar and amino acid transport proteins (Table S3a)