Competitive exclusion and metabolic dependency among microorganisms structure the cellulose economy of agricultural soil

Background Microorganisms that degrade cellulose utilize extracellular processes that yield free intermediates which promote interactions with non-cellulolytic organisms. We hypothesized that these interactions determine the ecological and physiological traits governing the fate of cellulosic carbon (C) in soil. We employed metagenomic-SIP and metaproteomics to characterize the attributes of cellulolytic and non-cellulolytic microbes accessing 13C from cellulose. We hypothesized that cellulolytic taxa would exhibit competitive traits to limit access, while non-cellulolytic taxa would display metabolic dependency, such as signatures of adaptive gene loss. We tested this hypothesis by evaluating genomic traits indicative of competitive exclusion or metabolic dependency, such as antibiotic production, growth rate, surface attachment, biomass degrading potential and auxotrophy. Results The most 13C-enriched taxa were cellulolytic Cellvibrio (Gammaproteobacteria) and Chaetomium (Ascomycota), which exhibited a strategy of self-suciency (prototrophy), rapid growth, and competitive exclusion via antibiotic production. These ruderal taxa were common indicators of soil disturbance in agroecosystems, such as tillage and fertilization. Auxotrophy was more prevalent in cellulolytic Actinobacteria than in cellulolytic Proteobacteria, demonstrating differences in dependency among cellulose degraders. Non-cellulolytic taxa that accessed 13C from cellulose (Planctomycetales, Verrucomicrobia and Vampirovibrionales) were highly dependent, as indicated by patterns of auxotrophy and 13C-labeling (i.e. partial labelling or labeling at later-stages). Major 13C-labeled cellulolytic microbes (e.g. Sorangium, Actinomycetales, Rhizobiales and Caulobacteraceae) possessed adaptations for surface colonization (e.g. gliding motility, hyphae, attachment structures) signifying the importance of surface ecology in decomposition. Conclusions Our results demonstrate that access to cellulose was accompanied by ecological trade-offs characterized by differing degrees of metabolic dependency and competitive exclusion. These trade-offs in�uence microbial growth dynamics on particulate organic carbon and reveal that the fate of carbon is governed by a complex economy within the microbial community. We propose three ecological groups for microbes participating in this economy: (i) independent primary degraders, (ii) integrated primary degraders and (iii) mutualists, opportunists and parasites


Background
The major structural component of plant biomass, cellulose, is degraded in soil by a diverse and interacting community of microorganisms [1].Since cellulose is insoluble and highly crystalline, it cannot be transported across cell membranes.Hence, microorganisms rely on extracellular reactions to digest cellulose bers into oligodextrins, cellobiose, and glucose for transport into the cell.Due to the structural complexity of lignocellulose, cellulose degradation is facilitated by synergistic interactions between diverse enzyme systems that differ in speci c activity [2,3].Various physiological traits also in uence the colonization and deconstruction of cellulose bers, such as hyphal growth by members of fungi and Actinobacteria [4,5], gliding motility in Bacteroidetes [6], and the formation of cellulosomes by many anaerobes [7,8].These circumstances suggest that cellulose degradation in soil is predisposed to metabolic and ecological interactions that govern access to the soluble products of cellulose degradation.However, most approaches to studying cellulose decomposition have overlooked the ecological interactions occurring within microbial consortia, since most cellulolytic microorganisms have been characterized in isolation or in co-culture [9][10][11][12][13][14]. Phylogenetic and functional gene-based metagenomics coupled with DNA stable isotope probing (SIP) now provides the capability to study the ecophysiological traits of the diverse microorganisms participating in the cellulose economy as it occurs in soil.
The extracellular nature of cellulose degradation creates conditions where the tness of individuals is contingent on both competition and facilitation.Competition for cellulose and its degradation products impose tness costs on cellulolytic organisms, promoting antagonistic interactions [15][16][17].However, facilitation by commensal and mutualistic partners enhance degradation rates to the bene t of cellulolytic organisms [18].Many cellulolytic microbes have close relatives lacking in endoglucanases, suggesting adaptive bene ts from the gain or loss of these genes [19].The bene ciaries of community metabolism should be expected to shed energetically costly traits, resulting in adaptive gene loss and evolution of metabolic dependency [20].For example, non-cellulolytic bacteria can complement the metabolic functions of cellulolytic bacteria in vitro, through catabolic reactions [21,22], vitamin biosynthesis [23], amino acid biosynthesis [24], or biosynthesis of other essential metabolites [25].Such metabolic dependency can occur through speci c, tightly coupled interactions, such as those of syntrophic partners, but it can also be loosely coupled and non-speci c, such as when cells obtain metabolites or other nutrients from extracellular pools replenished by mortality of community members [26,27].Hence, we expect that the anabolic and catabolic byproducts of cellulolytic microbes in uence overall community structure and function and thereby govern the fate of cellulosic carbon in soil.
Shotgun metagenomics and the recovery of metagenome-assembled genomes (MAGs) provides a cultivation-independent means of studying the phylogenetic and functional characteristics of microbial communities.This approach has been used to identify ecophysiological traits [28,29] and study metabolic dependency in environmental populations [30].With DNA-SIP, one can identify MAGs from organisms that assimilate 13 C, either directly or indirectly, from 13 C-labeled substrates by separating and sequencing the 13 C-enriched DNA (Figure S1).Metagenomic-SIP proved effective in resolving traits of cellulolytic and lignolytic populations in forest soil, in part, by improving MAG recovery [14].DNA-SIP can be used to estimate the degree of 13 C-labeling of individual MAGs by measuring the change in buoyant density across the CsCl gradient [31,32].This approach offers the capacity to distinguish between highly 13 C-enriched DNA corresponding to taxa with primary access to cellulose-C, less 13 C-enriched DNA corresponding to microbes with peripheral access to cellulose-C, and unenriched DNA derived from the broader soil community.The genomic information from these groups can then be used to study differences in ecological traits, such as metabolic dependency or antibiotic production, encoded by members of the cellulose economy.
We performed gradient-resolved metagenomic-SIP on 13 C-labelled DNA from an agricultural soil following a 30-day incubation with 13 C-labeled cellulose.This experiment was designed to delineate membership in a cellulose degrading soil consortium.We hypothesized that 13 C-enriched, cellulolytic taxa would display diverse ecological strategies.We expected early cellulose colonists to depend less on the products of community metabolism than later colonists, which will be evident in varying degrees of auxotrophy.We expected that cellulolytic microbes would be enriched in secondary metabolite gene clusters (SMs), such as those that synthesize antibiotics, to control access to community resources.We further hypothesized that 13 C-enriched, non-cellulolytic microbes would exhibit signatures of metabolic dependency based on the degree of auxotrophy and/or capacity for degrading microbial necromass (i.e.numbers of genes encoding nucleases, peptidases, chitinases, and other hydrolytic enzymes).Additionally, we identi ed traits related to surface colonization that we anticipated to be important features of community interactions on insoluble cellulosic bres.We validated our results with reference genomes and performed metaproteomics to con rm gene expression by target groups.

Results
Overview of Metagenome Assembly and Designation of 13 C-Enrichment Shotgun metagenomic sequencing of 13 C-labeled DNA recovered a total of 1.1 billion reads after quality ltering, estimated by nonpareil [74] to cover 80% of genomic diversity in the DNA pool.Metagenome assembly produced a total of 356,131 contigs greater than 2.5 Kb, amounting to a total length of 1.8 Gb (~ 230 genomes of 8 Mb).The degree of 13 C-enrichment was estimated for each contig by comparing the CsCl gradient pro le to simulated natural abundance pro les (Figure S2).More than half of all contigs were designated either strongly or weakly 13 C-enriched (total length = 921 Mb), while the remainder were from genomes of abundant soil taxa that lacked evidence of 13 C-labelling (766 Mb).Contigs clustered into coherent sets according to pentanucleotide frequency which grouped by patterns of 13 C-labelling and taxonomy (Figure 1ab), and not average G+C content (Figure 1c).The Random Forest model used to predict enrichment status had an overall accuracy of 89.1% with high sensitivity and speci city for both strongly enriched (98.3% and 86.4%, respectively) and unenriched contigs (100% and 93%) (details in Supplementary methods; Table S4).

Phylobin Characteristics
A total of 47, 2 and 46 phylobins greater than 1 Mb were produced from strongly, weakly and unenriched contig sets, respectively (Table S5).Of the 95 total phylobins, 38 were deemed high quality (>75% completeness) and were divided into four categories based on enrichment status and cellulolytic potential (inferred from the presence of endoglucanases): strongly 13 C-enriched and cellulolytic (n strong = 12), strongly 13 C-enriched and non-cellulolytic (n strong = 8), weakly 13 C-enriched and non-cellulolytic (n weak = 2), and unenriched (n = 16).Each phylobin represented a genomic-ecological unit, rather than an individual genome, encompassing genomes from, at most, four to seven genera for enriched and unenriched phylobins, respectively, based on the diversity of assembled full-length 16S rRNA genes (Figure 2).Both phylobins and representative reference genomes from 13 C-enriched cellulolytic taxa were larger (µ PhyBin = 24.7 Mb and µ Rep = 7.1 Mb) than those from 13 C-enriched non-cellulolytic taxa (µ PhyBin = 12.5 Mb and µ Rep = 5.2 Mb; Wilcoxon test, p ≤ 0.05) and unenriched taxa (µ PhyBin = 11.4Mb and µ Rep = 5.0 Mb; p < 0.01).Analysis of single-copy genes and single nucleotide polymorphisms per single-copy gene indicated that the large size of phylobins resulted from natural pangenomic diversity (intra-species) and inclusion of genome fragments from closely related taxa (inter-species diversity; details in Supplementary Methods).
The Structure of a Cellulose Economy Taxa designated as strongly 13 C-enriched and cellulolytic (i.e.encoding endoglucanases) represented the greatest proportion of unassembled SSU gene fragments in metagenomes.The most abundant were classi ed to well-known genera of cellulolytic soil organisms, including Cellvibrio, Herpetosiphon (Chloro exi), and members of the fungal order Sordariales (predominantly Chaetomium), as well as lesser-known cellulolytic genera, such as Devosia and Sphingomonas (Figure 2).Peptides from these ve taxa were also abundant within the total metaproteome (n total = 90,557 peptides; 33,765 unique proteins) occupying in the following percentages of total peptides: Rhizobiales (6.9%), Cellvibrionales (2.6%), Sphingomonadales (2.6%), Herpetosiphon (1.0%), and Sordariales (0.5%).Endoglucanases were detected in contigs classi ed to 22 of the 30 genera designated as strongly 13 C-enriched, consistent with their presence in reference genomes (Figure 2, Table S6).The 22 genera designated strongly 13 C-enriched and cellulolytic comprised 29% of the total SSU rRNA gene fragments.In contrast, the seven genera designated as strongly 13 C-enriched and non-cellulolytic (see Figure 2) comprised 2.3% of recovered SSU rRNA reads.These putative non-cellulolytic taxa included only one reference genome encoding an endoglucanase.
A diverse set of endoglucanases were recovered from phylobins, revealing a snapshot of the functional diversity of cellulolytic populations.A total of 426 unique endoglucanase homologs were identi ed (at a > 80% identity threshold) belonging to 39 different CAZy families/sub-families.Eighty-two of these endoglucanases were present within gene clusters that contained a carbohydrate-binding module.A total of 54 peptides in the metaproteome matched endoglucanases; the most abundant was a GH9 from Cellvibrio (Figure 3).The second and third most abundant endoglucanases in the metaproteome were encoded by fungi (GH131 and GH7).Overall, most endoglucanases in the metaproteome were encoded by fungi (57%), which was disproportionate to the total relative abundance of fungal peptides (1.2%) in the whole metaproteome.

Evidence for Metabolic Dependency and Competitive Exclusion
To evaluate potential interactions among 13 C-labeled taxa, we assessed the degree of auxotrophy (as an indicator of metabolic dependency) and presence of SM-encoding genes (antibiotic-based competition) in phylobins and their representative genomes.No phylobin or genome was fully prototrophic or auxotrophic for all biosynthetic pathways evaluated (n = 32), with the average phylobin being auxotrophic for 6 amino acids, 1 cofactor and 2 vitamins and the average representative genome auxotrophic for 5 amino acids, 1 cofactor and 1 vitamin.The extent of auxotrophy did not differ signi cantly between 13 C-enriched cellulolytic and 13 C-enriched non-cellulolytic phylobins or representative genomes (Figure 4) but did vary among dominant taxa within each group.
The orders Planctomycetales and Sphingomonadales were represented by independent phylobins that were weakly 13 C-enriched, alongside those that were strongly 13 C-enriched and unenriched (Figure 2).

Comparison of Cellulolytic and Hydrolytic Potential
The functional gene content of representative genomes explained substantial variation in enrichment status (Figure 5).The trend was driven primarily by the relative abundance of glycosyl hydrolases (GH), which were 1.5-to 3-fold higher (after normalization for genome size) in 13 C-enriched cellulolytic phylobins and corresponding reference genomes, respectively.This trend was evident in all gene families associated with lignocellulose degradation (GH, CBMs, AA and PL), which collectively explained 63% of variation in community functional composition along NMDS1 (Figure 5a; Table S8b).The genomes also separated along the secondary axis (NMDS2) de ned by genome size, and peptidase and motility gene content, which explained 16.3, 16%, and 19% of variation, respectively (Table S8b).Degree of auxotrophy did not correlate with variation in functional gene content in either representative genomes or phylobins (Table S8a).In addition, the relative abundance of biomass-degrading enzymes (e.g.peptidases and nucleases) did not differ with respect to degree of 13 C enrichment or cellulolytic capacity, either in phylobins or representative genomes.In contrast, broad differences in functional gene categories were observed between Actinobacteria and Proteobacteria (Figure 5a).

Temporal Dynamics in Cellulose Economy
Early and late-stage colonizers of cellulose were identi ed according to genome-based predictions of growth rate (Table S7).Taxa designated as 13 C-enriched and cellulolytic were predicted to have faster generation times based on phylobins (3.0 hr) and representative genomes (3.1 hr) compared to 13 Cenriched non-cellulolytic taxa (5.7 hr and 5.6 hr, respectively; Figure 6a), though these differences were not signi cant (Kruskal-Wallis, p phybin = 0.33 and p rep = 0.52).However, genomes from 13 C-enriched noncellulolytic taxa exhibited a bimodal distribution (Figure 6a) and the set of genomes with slower generation times (generation time > 5 hr) were signi cantly more auxotrophic (µ slow = 15.8/32prototrophies) than taxa with faster predicted generation times (< 3 hr; µ fast = 23.8/32;Wilcoxon test, p = 0.05).The same trend was not apparent in phylobins, though predictions were unobtainable for two of the most auxotrophic bins (Vampirovibrionales and Chthoniobacterales).
The genome-based characterizations of early and late-stage colonizers were consistent with temporal patterns of taxa observed in time-course amplicon sequencing data.The highly prototrophic taxa Cellvibrio and Devosia, increased in relative abundance earliest, peaking in 13 C-enrichment at days 7 to 14 and declining by day 30 (Figure 6b; Figure S6a).Chaetomium were also early colonizers, showing 13 Cenrichment by day 7 (Figure S7).In contrast, the relative abundance of Actinobacteria was less dynamic, and these organisms tended to become labeled on, or after, day 14.Taxa predicted to be slowest growing, and identi ed as 13 C-enriched, non-cellulolytic, and highly auxotrophic (Planctomyces, Sphingomonas and members of Verrucomicrobia), began to increase in relative abundance only after day 14 and were maximally 13 C-enriched on day 30 (Figure 6c; Figure S6c).

Surface Adhesion and Surface Motility
Phylobins from cellulolytic taxa were more likely to encode the capacity for surface adhesion and/or surface motility than other groups, including twitch motility, pili systems and mbriae (Figure S8a).Surface attachment proteins were abundant in reference genomes from 13 C-enriched taxa (both cellulolytic and non-cellulolytic), and in phylobins classi ed as Rhizobiales and Caulobacterales (Figure S8b).Adhesion proteins used in gliding motility (aglZ and sprB) were present in reference genomes of cellulolytic taxa but absent from phylobins.

Discussion
We delineated members of a cellulolytic soil community using metagenomic-SIP to evaluate the ecological traits of microorganisms participating in the cellulose economy.Taxa identi ed as 13 Cenriched and cellulolytic had larger genomes and a greater number of genes encoding carbohydrateactive enzymes, secondary metabolites, surface motility or surface attachment, and tended to have faster generation times, when compared to 13 C-enriched non-cellulolytic and unenriched taxa.This evidence supports our hypothesis that the fate of cellulose carbon is mediated by ecological interdependencies among cellulolytic and non-cellulolytic taxa.Furthermore, 13 C-enriched cellulolytic taxa encoded diverse endoglucanases, representing 39 different subfamilies, but no single taxon encoded more than a third of these enzymes, suggesting the potential for synergistic decomposition.Auxotrophy was common among both 13 C-labeled cellulolytic and non-cellulolytic taxa, indicating that most taxa acquire essential metabolites from other community members.The average phylobin was auxotrophic for 9 of 32 pathways evaluated, though the highest levels of auxotrophy occurred among non-cellulolytic 13 C-labeled taxa.
The two most abundant cellulolytic taxa in the consortium, Cellvibrio and Chaetomium, were fast-growing and self-su cient (i.e.prototrophic), both qualities of ruderal organisms.Cellvibrio dominated access to cellulosic C in two other SIP studies of agricultural soils [75] and, in one of the studies, were speci c to agricultural soil [12].The most abundant endoglucanase in our metaproteome, a GH9 from Cellvibrio, predominated in worm castings from agricultural soil (ACY24809) [76].Both Cellvibrio and Chaetomium are commonly more abundant in tilled versus untilled elds (Figure S9) [77][78][79][80] and, the latter in disturbed forest soils [81].The occurrence of Chaetomium in agroecosystems may be linked to nitrogen fertilization, given their enrichment in N-fertilized elds and wetlands [82][83][84].The predominance of these ruderal cellulolytic taxa is indicative of the frequent soil disturbances in agroecosystems.Thus, it remains to be seen whether the cellulose economy of infrequently disturbed soils exhibits differing trends in the competitive exclusion or metabolic dependency reported here.

The ecological classes within the cellulose economy
Our results demonstrated that access to cellulosic carbon is mediated by trade-offs related to the capacity to produce carbohydrate-active enzymes, biosynthetic capacity, growth rate, and adaptation to colonize surfaces.Taken together our results suggest that, at least, three broad ecological groups of microorganism access 13 C from cellulose: (i) fast growing, biosynthetically competent cellulolytic taxa (e.g.Cellvibrio and Devosia), (ii) slow growing, metabolically-dependent (more auxotrophic), cellulolytic taxa (e.g.Actinobacteria), and (iii) slow growing, metabolically-dependent (highly auxotrophic), noncellulolytic taxa (e.g.Planctomycetales, Verrucomicrobia and Vampirovibrionales). Certainly, a wide range of adaptive traits will affect access to cellulose carbon during decomposition, but these three categories provide a framework we can use to dissect community interactions that affect carbon cycling.

Independent primary degraders
Bacteria in the rst category are rst to colonize cellulosic materials based on their cellulolytic competency, self-su ciency and rapid growth.On average, the phylobins and representative genomes of 13 C-enriched cellulolytic taxa were more prototrophic and had lower minimum generation times than their 13 C-enriched non-cellulolytic counterparts, though these results were statistically insigni cant due to phylogenetic and ecological diversity within groups.Cellvibrio and Devosia were among the most enriched taxon in the 13 C-DNA pool (1 st and 4 th , respectively) and were the two most prototrophic of any genome or phylobin examined.Cellvibrio and Devosia populations peaked earlier than any other 13 Cenriched taxa and were in decline as dependent taxa increased in relative abundance.The yeast Chaetomium exhibited similar trends of early 13 C-enrichment, occupying upwards of 20% of the 13 C-DNA pool by day 7 in a sibling study at the same eld site [34].Chaetomium are also prototrophic, being capable of growth on cellulose in minimal media without the addition of amino acids or cofactors [85], though our methods (designed for prokaryotes) failed to accurately annotate eukaryotic genomes.The rapid growth and self-su ciency of Cellvibrio and Chaetomium were coupled with a strategy of competitive exclusion via the production of antibiotics such as bacteriocin, likely a cellvibriocin [86], and fungicides [87-89].We expect the competitive nature of these early colonizers and their metabolic byproducts to in uence the ability of non-cellulolytic taxa to access cellulosic C.

Integrated primary degraders
Bacteria in the second category, primarily Actinobacteria but also Herpetosiphon (Chloro exi), were cellulolytic but exhibited higher levels of auxotrophy and SM production than early colonists.Populations of Actinobacteria lagged in comparison to Cellvibrio, with the rst signs of 13 C-labelling appearing at day 14, and populations did not increase consistently over time.These trends suggest a greater integration with other population that exert top down (mortality driven) or bottom up (nutrient limitation as a result of competition for nutrients) control.Actinobacteria encoded and produced the greatest number of SMs and SM peptides, including an abundance of terpenoids which can function in interspeci c signaling in soil, potentially facilitating mutualistic interactions [90].The potential bene t of metabolic dependency for cellulolytic Actinobacteria was apparent in their consistent auxotrophy for four of the costliest nonaromatic amino acids to synthesize, namely: isoleucine (ranked 1 st ), leucine (2 nd ), methionine (3 rd ) and lysine (4 th ) [91,92].We hypothesize, based on their cellulolytic capacity; SM production, and high degree of auxotrophy, that the tness of integrated primary degraders depends on community interactions.

Mutualists, opportunists and parasites
The third ecological group we observed, the 'MOP,' were metabolically dependent, late-stage colonizers of cellulose, characterized by the inability to degrade cellulose and high levels of auxotrophy.The MOP were comprised of Planctomycetales, Vampirovibrionales, and Verrucomicrobia (Luteolibacter, Candidatus Xiphinematobacter and 01D2Z36), which reached maximal relative abundance after Cellvibrio, Devosia and Chaetomium, and remained abundant even after their decline.This pattern suggests dependence on products of community metabolism either through co-metabolism, the consumption of metabolic byproducts or the consumption of macromolecules released during the turnover of microbial biomass.Indeed, these taxa all have traits that indicate lifestyles characterized by dependency on other microorganisms.
Planctomyces are commonly found to colonize the surfaces of marine algae, and to metabolize forms of algal polysaccharides, but not cellulose [93][94][95].They purportedly assimilate oligosaccharides into their cells, indicating the ability to scavenge higher molecular weight degradation by-products [13,[96][97][98].The capacity of Planctomyces to attach to surfaces with holdfast, and their distinct tolerance to a range of antibiotics, would advantage an opportunistic lifestyle, particularly amongst antibiotic-producing primary degraders [95,99,100].Cultured representatives for two other highly auxotrophic 13 C-enriched noncellulolytic phylobins are obligate symbionts, namely Vampirovibrio and Candidatus Xiphinematobacter.The former are algal parasites that encode a range of GHs [101] but lack endoglucanases, and the latter are endobionts of nematodes, and are commonly observed in forest litter, cellulose-degrading consortia or in associated with Basidiomycota [102][103][104][105].
One set of phylobins provided evidence for what could be considered opportunistic 'cheating' [20].
Phylobins from Sphingomonadales differed in terms of weak and strong 13 C-enrichment yet shared the same pattern of auxotrophy.The strongly enriched phylobins encoded several endoglucanases and bacteriocins, while the equally sized weakly-enriched phylobins lacked these capabilities.These data suggest that the strongly labeled cellulolytic strain is degrading 13 C-cellulose extracellularly and the weakly 13 C-enriched strain can access degradation products as well as other sources of unlabeled carbon present in soil.The capacity of Sphingomonas species to degrade cellulose through the activity of extracellular enzymes is known [106,107].

The role of surface ecology in decomposition
Several major populations of microbes that accessed 13 C from cellulose were capable of surfaceadherence and/or surface-motility.Genes encoding surface attachment were present in phylobins, or have been previously reported, in Rhizobiaceae (Ensifer/Sinorhizobium, Rhizobium and Agrobacterium), Hyphomicrobiaceae (Devosia), Sphingomonadaceae (Sphingomonas) and Caulobacteraceae (Asticcacaulis, Brevundimonas and Caulobacter), as well as in Pseudoxanthomonas and Planctomycetaceae (Planctomyces and Rhodopirellula) [108][109][110].Each of these genera, except for those in Planctomyceteceae, are represented by isolates capable of degrading cellulose [111][112][113][114][115][116][117][118].For these organisms, attachment would provide preferential access to the by-products of cellulose degradation.This phenomenon is exempli ed by the abundance of sugar transporters located on the stalk used by Caulobacter to adhere to surfaces [119,120].Attachment may also facilitate cooperation to crowd out competitors from accessing resources, as observed in the social behavior of Caulobacter during xylan degradation (D'Souza et al., bioRxiv pre-print available soon) or in the coordination of extracellular degradative processes by surface-gliding bacteria Herpetosiphon and Sorangium [121,122].Social interactions and cell aggregation density were critical determinants of the rate and e ciency of decomposition of particulate carbon [18].The dynamics of surface attachment have rami cations for ecology and evolution as well as biogeochemical cycling, which have yet been studied outside of the rumen [123,124].
Diversity at the sub-genus level in the cellulose economy Shotgun metagenomics provided a comprehensive view of the cellulolytic consortium but was ineffective at resolving the genomes of closely related species.Phylobins were comprised of large pangenomes which limited our ability to test for adaptive gene loss among closely related species, known to be important in the evolution of metabolic dependencies [20,125].The recovery of large single-genus phylobins for Myxococcales (Sorangium), Cellvibrionales (Cellvibrio), Planctomycetales (Planctomyces) and Micrococcales (Microbacterium), provided evidence of sizeable pangenomic genetic diversity which could re ect niche partitioning among close relatives.However, the degree of 13 C-enrichment within these single-genus phylobins did not differ, except for Planctomycetales and Sphingomonadales (i.e.'weak' versus 'strong' phylobins).We conclude that few differences in the capacity to access cellulosic carbon had occurred among closely related populations.

Conclusions
We used metagenomic-SIP and metaproteomics to evaluate the traits of microorganisms accessing C from cellulose in an agricultural soil, which we could group into three major classes.These classes included self-su cient cellulolytic bacteria and fungi (e.g.Cellvibrio and Chaetomium) that sought to restrict access via competitive exclusion, and other more integrated cellulolytic bacteria (e.g.Actinobacteria and Herpetosiphon) whose tness depended on the metabolic byproducts of the community.A third class of non-cellulolytic taxa that accessed cellulosic C (e.g.Planctomycetes, Vampirovibrio, Verrucomicrobia) were characterized by dependency on community resources as well as mutualistic, opportunistic and parasitic (MOP) interactions, which have yet been fully described due to challenges in cultivability [101].Our framework facilitates bottom-up measurement of the quantity and quality of each classes' contributions to carbon cycling.For example, the activity of independent degraders likely follows a more idealized pattern of growth and decomposition and be simpler to model.In contrast, the effects of interdependent degraders or MOP will require targeted experiments and more specialized modeling.A better understanding of the relative effect of each class, and conditions where their contributions are greatest, is now possible through the targeted study of the taxa we have identi ed.Our ndings emphasize that physiological traits and ecological interactions with non-cellulolytic taxa affect the degradation and fate of cellulosic carbon in soil and highlight the range of evolutionary adaptations that constitute the cellulose economy.

Methods
Sample Description and Recovery of 13 C-enriched DNA DNA-SIP was performed using an agricultural soil incubated with 13 C-labelled cellulose for 30 days [11].In brief, microcosms were prepared with soil from a tilled agricultural eld under organic management in Penn Yan, New York, as previously described [33].Samples were sieved (2 mm) and homogenized and pre-incubated for two weeks prior to initiation of the experiment.After soil respiration normalized, an amendment designed to approximate the composition of plant biomass was added.By weight, the mixture was comprised of 38% 13 C-labeled bacterial cellulose (99 atom % 13 C), 23% lignin alkali, 20% xylose, 3% arabinose, 1% galactose, 1% glucose, 0.5% mannose, 10.6% amino acids, and 2.9% Murashige Skoog basal salt mixture [11].The amendment was added to soil at 2.9 mg C g −1 soil dry weight.After incubation, extracted DNA was subjected to CsCl density gradient centrifugation and fractionated into thirty-ve 100-uL aliquots.Shotgun metagenomes were prepared from eight gradient fractions, starting at a buoyant density (BD) of 1.749 g ml -1 (F 6 ) and continuing to a BD of 1.717 g ml -1 (F 13 ).A schematic overview of the methods used in this study is presented in Figure S1.The 16S rRNA gene and ITS1 region amplicon data from this DNA-SIP experiment [11] and a sibling study [34] are available at the NCBI under BioProjects PRJNA317227 and PRJNA589050, respectively.

DNA and Peptide Sequencing
Shotgun metagenomes were generated by multiplexing DNA from each gradient fraction using the Nextera XT library preparation kit, then sequenced using three lanes of Illumina HiSeq 2500 (150-bp, paired-end).A subsequent round of sequencing was performed on each gradient fraction using a single lane of MiSeq (250-bp, paired-end) using a library prepared with the Illumina Nextera XT DNA Library Prep Kit (Product number: FC-131-1024, Illumina).The raw sequencing data is archived in the European Nucleotide Archive (BioProject: PRJEB23737).A full description of protein extraction, puri cation, digestion, mass spectroscopy, and peptide annotation are available in the Supplementary Methods.In brief, protein was extracted from soil with the NoviPure Soil Protein Kit (QIAGEN), initially separated and massed using a Waters nano-Acquity M-Class dual pumping UPLC system (Milford, MA) and a Q-Exactive HF mass spectrometer (Thermo Scienti c, San Jose, CA).Twenty-four fractions were subsequently submitted for LC-MS/MS analysis using an LTQ Orbitrap Velos mass spectrometer (ThermoFisher, Waltham MA).Peptides were identi ed from LC-MS/MS data using predicted protein sequences from the metagenome and ltered with a false discovery rate cut-off of 1%.
Assembly and Classi cation of SSU RNA genes Partial 16S and 18S rRNA gene fragments were identi ed in unassembled reads to estimate relative abundances.Fragments were identi ed using infernal [35] (v.1.1.2) and assigned taxonomy using the mothur implementation of the RDP Classi er [36,37] with the Silva database (silva.nr_v128)as reference [38].Full-length 16S or 18S rRNA genes were assembled using MATAM [39] also using the Silva database.We manually recovered a full-length 16S rRNA gene for Vampirovibrio, which were prevalent in SSU fragments, but not assembled by MATAM (details in Supplementary Methods).

Shotgun Metagenome Assembly
Metagenomes for each gradient fraction were composited and assembled using an iterative process to maximize assembly quality (Figure S1; details in Supplementary Methods).In brief, an initial de novo assembly was performed using megahit (v1.1.1-2-g02102e1)[40].Contigs shorter than 2,500 bp were discarded (~7% of total).Contigs were then classi ed by the Lowest Common Ancestor (LCA) algorithm implemented by MEGAN (v. 6) [41] based on DIAMOND BLASTX searches [42] against the NCBI 'nr' database (downloaded Feb. 3 rd , 2017).To improve assembly, two additional assemblies were performed on read sets with reduced sequence diversity.This reduction was achieved by segregating unassembled, quality-processed reads by mapping to (i) the LCA taxonomy of the initial assembly, at rank Order, and ii) to publicly available genomes represented in the full-length 16S rRNA gene library (Table S1).All assemblies were then merged using MeGAMerge [43] with the latest version of MUMMer [44] (v.4beta) designed for large datasets.Merging improved assembly statistics as determined by QUAST [45], increasing N 50 from 4,407 to 5,419 (Table S2).
Designating 13 C-enrichment of Contigs with Gradient-resolved SIP The relative abundance of every contig across the density gradient (a 'gradient pro le') was determined by calculating average read depth using 'jgi summarize bam contig depths' from MetaBAT [46] (v.2.12.1).
The gradient pro le of each contig was also simulated with natural abundance of 13 C (~1.1 atom % C) to control for variation in GC content using methods outlined in [32,47].A Random Forest regression model was used to assign a categorical degree of 13 C-enrichment for each contig, namely 'strongly' and 'weakly' enriched, 'unenriched' and 'bimodal' (i.e.local maxima in both heavy and light portions of the gradient), and 'undetermined' (examples in Figure S2).The following features were used to build the model: the number of local maxima and minima (and the fraction in which they occurred) and the average read depth in each fraction for observed and simulated gradient pro les.Data from 600 manually curated contigs were used to train the model which was implemented in the R package 'caret' [48].Model validation was performed on 20% of the training set (see R code in the Supplementary Data).

Genome Binning
Common tools for reconstructing MAGs, based on kmer frequency and covariance (in our case across the CsCl gradient), were prone to cross-contamination (see Supplementary Methods).In addition, MAGs constructed using standard practices failed to recover genomes from taxa known to be abundant in the metagenome and 13 C-labeled, including Chaetomium, Vampirovibrio and members of Verrucomicrobia and Chloro exales.Given these limitations, we opted to de ne a genomic unit based on 13 C-enrichment and LCA classi cation of contigs, which we term a 'phylobin.'Phylobins consisted of contig sets divided by 13 C-enrichment status (i.e.'strong', 'weak' and 'unenriched') and by the taxonomic rank at the level of Order (e.g.'strongly enriched Cellvibrionales').We justify this approach accordingly: (i) DNA-SIP selectively enriched for a relatively narrow subset of taxa within a given Order; (ii) phylogenetically related organisms with similar enrichment status are likely to share similar genomic and ecological traits.There is no universally appropriate taxonomic rank or phylogenic depth for grouping organisms, since functional traits are conserved at various phylogenetic depths [49].We chose the rank of Order as cutoff because LCA often fails to accurately classify to the species level taxa that are poorly represented in the NCBI 'nr' database.Hence, aggregating at the rank of Order decreases the risk of losing genomic information.Prior research has shown that aggregating microbiome data by taxonomic Order produced the greatest discriminating power of relevant soil microbial processes [50].The loss of resolution of individual genomes was compensated for by performing all analyses in parallel on reference genomes chosen based on the similarity of full-length SSU rRNA genes recovered in our study (µ similarity = 98%, n = 89) or, in some cases, by the only available representative genome for that genus or clade (n = 38; Table S1).

Functional Gene Annotation
Functional genes were annotated using curated databases relating to genes for motility, adhesion, secondary metabolite biosynthetic gene clusters (SMs), and catabolic enzymes for biomass and cellulose.SMs were annotated using the defaults settings of AntiSMASH [51] (v.4.1.0).Genes involved in cellulolytic activity, namely glycosyl hydrolases (GH), endoglucanases (speci c GH families), carbohydrate-binding modules (CBM), polysaccharide lyases (PL), and auxiliary activity enzymes (AA), were annotated using DIAMOND BLASTX searches against a local version of the CAZy database [52] (downloaded Dec. 20 th , 2017).A complete list of GH families deemed to be endoglucanases can be found in Supplementary Methods.Chitinases were represented by CAZy families GH18 and GH19.Genes encoding nucleases, adhesion (curli and holdfast proteins) and motility were annotated using DIAMOND BLASTX searches against a local version of the NCBI COG database [53] (downloaded May 1 st , 2018), and, in the case of motility, mapped to KEGG biosynthetic pathways for synthesizing complete motility apparatus (Table S3).Genes encoding peptidases were annotated using DIAMOND BLASTX searches against a local version of the MEROPS database [54] (downloaded July 1 st , 2018).The capacity for gliding motility was assessed using canonical genes from three model organisms: the focal adhesion protein in Myxococcus xanthus [55] (AglZ); the SprB and RemA adhesins in Flavobacterium johnsonia [56,57] and Gli349 and Gli521 in Mycoplasma mobile [58,59].Additional adhesion gene families were annotated using compilations of well-characterized proteins, including unipolar polysaccharide synthesis proteins (upp) [60] and tight adherence proteins (tad) [61].All annotations were based on a sequence identity cutoff of ≥ 60% across 90% of the full-length gene.
Auxotrophies were determined for each representative genome and phylobin based on 'genome-enabled metabolic models' (GEM) in KBase [62] according to [63].Brie y, ux balance analysis was performed on GEMs under two growth conditions: on a rich media containing all potential biomass precursors and on a minimal media containing only C and essential nutrients.The number of critical enzyme-catalyzed reactions were calculated for each GEM according to the following criteria: (i) the reaction was not involved in central C metabolism, (ii) was essential and carried ux only under minimal (i.e.not under rich) media conditions, and (iii) whose ux was coupled to the production of an essential compound.A genome was considered auxotrophic for a compound if the number of its critical reactions for its biosynthesis was below a compound-speci c threshold or if the number of gap-lled critical reactions exceeded a compound-speci c threshold.Thresholds were set based on auxotrophy pro les from a dozen well-characterized bacteria in the Bacteroidetes, Firmicutes, Alphaproteobacteria, and Gammaproteobacteria.

Statistical Analyses
Statistics were performed in R v. 3.4.2 [64] with the following packages: reshape2, ggplot2, plyr [65-67], Hmisc [68] and phyloseq [69].Non-parametric multidimensional scaling (NMDS) was performed using 'metaMDS' from the R package 'vegan' [70].The relative amount of variation in the primary and secondary NMDS axes explained by functional traits was calculated using the R package 'relaimpo' [71].Pairwise multiple comparisons based on the Kruskal-Wallis test ('kruskalmc') were performed using the R package 'pgirmess' [72].Minimum generation times were predicted for all phylobins and representative genomes using growthpred [73] (v.1.07) based on codon-usage bias using ribosomal genes (identi ed by COG ID) as the set of highly expressed genes.All analyses can be reproduced using R scripts and data available in the Supplementary Data package.Members of the cellulose-degrading consortium were de ned by their taxonomy and functional capabilities encoded in metagenome-assembled phylobins ('PBin') and representative genomes ('Rep').

List Of Abbreviations
Phylobins were categorized by their 13C-enrichment and cellulolytic capacity and ranked along the y-axis by the relative abundance SSU rRNA gene fragments recovered (indicated by barplots).Representative genomes were identi ed according similarity of full-length 16S rRNA gene (column 1) and were grouped with their respective phylobin.Representative genomes with less than 97% similarity to a phylobin 16S rRNA gene were shaded in grey.Several phylobins were comprised of genomes from multiple genera and the size of each (in megabases) and the percentage of peptides assigned to each phylobin are provided.
The remaining columns show the presence/absence of genes for endoglucanases or those involved in surface attachment, surface motility (M), and secondary metabolite (SM) production.Boxes were shaded grey if a member of that genus reportedly possesses the ability for attachment or surface motility.Secondary metabolite production was designated if peptides corresponding to antimicrobial gene clusters were detected in the metaproteome.Only the most abundant 'unenriched' phylobins are shown (full overview in Figure S7).Diverse glycosyl hydrolase (GH) genes were identi ed in cellulolytic phylobins, many of which matched peptides detected in the metaproteome (the number and type of each peptide is indicated).Taxonomic classi and GH family are provided for the endoglucanase gene fragments found within each phylobin (lowest LCA classi ed to the order (o_), family (f_), or genus (g_) level).Taxon speci c endoglucanase families are indicated in bold font.Five peptides matched to endoglucanase genes not belonging to any phylobins.
In (a), a comparison of the number of complete biosynthetic pathways (prototrophy) revealed that 13Cenriched non-cellulolytic (blue) phylobins and representative genomes were slightly less prototrophic than those which were either 13C-enriched cellulolytic (white) or unenriched (red).On average, these differences were not signi cant (Kruskal-Wallis; phylobins, p = 0.6; Representative genomes, p = 0.2), though major populations within each group (Planctomyces versus Cellvibrio) exhibited consistent differences in accordance with hypotheses.In (b), prototrophy was signi cantly correlated with genomes size for phylobins (r=0.39;p = 0.01), but not representative genomes (r = 0.14; p = 0.28).A ranking of prototrophy in all phylobins and representative genomes is available in Table S7.
Figure 5 The functional gene content of representative genomes was compared by NMDS using the Bray-Curtis dissimilarity of gene abundances normalized to genome size.Most variation among representative genomes was attributable to carbohydrate active enzymes content (63% of NMDS 1; Table S8).Four panels showing the same ordination were colored according to (a) the taxonomic classi cation at the phylum level; (b) rrn operon copy number; (c) abundance of glycosyl hydrolases, and (d) peptidase abundance.Genomes formed clusters according to cellulolytic potential (ANOSIM R = 0.498, p < 0.001) with the centroid of each group displayed in (c).Genomes loosely clustered by phylogenetic differences between Actinobacteria and Proteobacteria, though lacking statistical support (ANOSIM R = 0.1, p = 0.2) with the centroid (star symbol) for each shown in (a).In (b), functional gene data were tted to the ordination with arrow length proportional to the correlation between each variable and ordination axes.