One-Cell Metabolic Phenotyping and Sequencing of Soil Microbiome by Raman-Activated Gravity-Driven Encapsulation (RAGE)

ABSTRACT Soil harbors arguably the most metabolically and genetically heterogeneous microbiomes on Earth, yet establishing the link between metabolic functions and genome at the precisely one-cell level has been difficult. Here, for mock microbial communities and then for soil microbiota, we established a Raman-activated gravity-driven single-cell encapsulation and sequencing (RAGE-Seq) platform, which identifies, sorts, and sequences precisely one bacterial cell via its anabolic (incorporating D from heavy water) and physiological (carotenoid-containing) functions. We showed that (i) metabolically active cells from numerically rare soil taxa, such as Corynebacterium spp., Clostridium spp., Moraxella spp., Pantoea spp., and Pseudomonas spp., can be readily identified and sorted based on D2O uptake, and their one-cell genome coverage can reach ∼93% to allow high-quality genome-wide metabolic reconstruction; (ii) similarly, carotenoid-containing cells such as Pantoea spp., Legionella spp., Massilia spp., Pseudomonas spp., and Pedobacter spp. were identified and one-cell genomes were generated for tracing the carotenoid-synthetic pathways; and (iii) carotenoid-producing cells can be either metabolically active or inert, suggesting culture-based approaches can miss many such cells. As a Raman-activated cell sorter (RACS) family member that can establish a metabolism-genome link at exactly one-cell resolution from soil, RAGE-Seq can help to precisely pinpoint “who is doing what” in complex ecosystems. IMPORTANCE Soil is home to an enormous and complex microbiome that features arguably the highest genomic diversity and metabolic heterogeneity of cells on Earth. Their in situ metabolic activities drive many natural processes of pivotal ecological significance or underlie industrial production of numerous valuable bioactivities. However, pinpointing “who is doing what” in a soil microbiome, which consists of mainly yet-to-be-cultured species, has remained a major challenge. Here, for soil microbiota, we established a Raman-activated gravity-driven single-cell encapsulation and sequencing (RAGE-Seq) method, which identifies, sorts, and sequences at the resolution of precisely one microbial cell via its catabolic and anabolic functions. As a Raman-activated cell sorter (RACS) family member that can establish a metabolism-genome link at one-cell resolution from soil, RAGE-Seq can help to precisely pinpoint “who is doing what” in complex ecosystems.

, which couples aqueous-phase SCRS acquisition and sorting with microdroplet-based single-cell lysis, MDA, and then whole-genome sequencing (WGS). We showed that the workflow can identify, sort, and sequence at precisely one-cell resolution, based on the cell's D 2 O-assimilating and carotenoid-producing activities. Specifically, genome coverage of ;90% can be achieved from just one soil bacterial cell that was functionally profiled for metabolic activities via its SCRS. Such an ability to profile and correlate bacterial metabolic phenome and highquality genome sequences at precisely one-cell resolution is pivotal to unambiguously establishing the phenotype-genotype links in soil microbiomes, which thus should accelerate mechanistic dissection and bioresource mining from the soil and other complex ecosystems.

RESULTS
Benchmarking one-cell RAGE-Seq via mock microbial communities. RAGE-Seq addresses the challenges of small cell size and weight and biased MDA of microbial single-cell DNA by screening individual cells via acquiring SCRS in an aquatic, vitality-preserving environment and then precisely packaging the target cell (i.e., with characteristic SCRS) into a picoliter droplet that can then be readily exported in a precisely indexed, "one-cell-one-tube" manner (33). However, our past demonstration of RAGE-Seq was based only on Escherichia coli (both pure-cultured lab strains and individual cells directly from clinical urine samples) (33). To develop a precisely one-cell RAGE-Seq procedure for environmental microbiota, which are typically much more diverse and complex, we employed soil as a model. We are interested in sorting from soil (i) metabolically active cells, which can be distinguished based on a broad Raman band that appears in the SCRS between 2,040 and 2,300 cm 21 (peaked at ;2,157 cm 21 ; i.e., the C-D stretching vibrations shifted from the C-H stretching vibrations at 2,800 to 3,200 cm 21 [18] after deuterium incorporation from D 2 O into biomass via NADPH-mediated H/D exchange reactions), and (ii) carotenoid-producing cells, which can be recognized by characteristic SCRS bands of carotenoids in 1,500 to 1,550, 1,150 to 1,170, and 1,000 to 1,020 cm 21 (i.e., in-phase C=C [v3], C-C stretching [v2] vibrations of the polyene chain and in-plane rocking mode of CH 3 groups attached to the polyene chain [v1] [15,22,34,35]).
To benchmark the performance of RAGE-Seq for such metabolic phenotypes from microbiota, we first constructed a series of mock microbial communities (see Fig. S1A and B in the supplemental material), consisting of Escherichia coli K-12 DH5a (Ec), Helicobacter pylori ATCC 26695 (Hp), Synechococcus elongatus PCC7942 (Se), and one fungus, Saccharomyces cerevisiae BY4742 (Sc), in a 1:1:1:1 ratio. Then, three series of experiments were designed to test the specificity of RAGE-Seq, via sorting the mock communities based on cell morphology (experiment A, Fig. 1A), the C-D peak of SCRS (experiment B, Fig. 1B), or the carotenoid peak of SCRS (experiment C, Fig. 1C), respectively. For each of the criteria, 20 cells, 11 cells, and 11 cells, respectively, per experiment (in triplicates) were sorted and sequenced using a RAGE chip ( Fig. S1C; Materials and Methods). Thus, for the synthetic four-species mock microbiota, nine RAGE-Seq experiments in three biological replicates were performed, which sorted and sequenced 126 individual cells in total ( Fig. 2A to C; Table 1; Materials and Methods).
Specifically, experiment A (in biological replicates of A-1, A-2, and A-3) aims to test the specificity of RAGE-Seq in sorting primarily via cellular morphology. In each RAGE-Seq run, 5 cells from each of the four species (a total of 20 cells) were sorted from the mock microbiota. One-cell WGS of the corresponding MDA products suggested that 61.22% to 99.79% of the shotgun reads were mapped to the respective isolate genomes (except the 21.48% for an Sc cell, likely due to primer-dimer formation in its MDA reaction since 78.34% of its reads found no match in NR), with the average mapping rate for each species ranging from 52.79% to 98.83%. Among A-1, A-2, and A-3, the success rate for a given species ranged from 20% to 80%, with an average of ;43.3% (26 MDA-positive cells in 60 sorted cells). Moreover, genome completeness of the single-cell amplified bacterial genome (SAG) eventually produced ranged between 30.14% for an Sc cell and 99.66% for an Se cell, with the average for each species ranging from 43.36% (Sc) to 96.78% (Se) ( Table 1; Fig. 2A).
Experiment B (B-1, B-2, and B-3) aims to test the specificity of sorting and sequencing metabolically active cells (i.e., those carrying C-D peak in SCRS). In each RAGE-Seq run, based solely on SCRS, 11 Ec cells were sorted via the presence of C-D peaks, respectively. One-cell WGS results suggested that 66.35% to 94.41% of the shotgun reads were mapped to Ec (except the 19.13% for a cell, likely due to formation of primer dimers in MDA as 79.75% of its reads found no match in NR), and the average mapping rate was 65.00%, 77.15%, and 84.09%, respectively. For B-1, B-2, and B3, the success rate was 36.36%, 27.27%, and 18.18%, respectively (average of 27.27%; 9 MDApositive cells in 33 sorted cells). Moreover, genome completeness of the SAG ranged FIG 1 Benchmarking RAGE-Seq via a series of mock microbiota that include S. elongatus PCC7942 (Se), H. pylori ATCC 26695 (Hp), E. coli K-12 DH5a (Ec), and S. cerevisiae BY4742 (Sc). (A) Experiment A (for testing the specificity of RAGE-Seq via shape). Cells were sorted from the mock community based on different cellular morphology or full spectrum of SCRS and then sequenced. (B) Experiment B (for testing the specificity of sorting D 2 O-band-containg cells). Prior to the mixing, Ec was fed the D 2 O (to probe the metabolic activities of Ec). Cells that exhibited C-D peaks in SCRS were sorted and sequenced. (C) Experiment C (for testing the specificity of sorting carotenoid-band-containing cells). Cells were sorted from the mock community based on the presence of the carotenoid peaks and sequenced. In all three experiments above, the four types of cells were mixed in an equal ratio prior to performing SCRS acquisition and RAGE-Seq. All the RAGE-Seq reactions are one cell per tube. between 19.13% and 94.41%, and the average at each run was 36.30%, 74.10%, and 90.30%, respectively (Table 1; Fig. 2B).
Experiment C (C-1, C-2, and C-3) was designed to test the specificity of sorting and sequencing carotenoid-producing cells. In this four-species mock community, only Se contains carotenoids. Thus, in each RAGE-Seq run, based solely on SCRS, 11 Se cells were sorted based on the presence of the carotenoid-specific peaks, respectively. Onecell WGS results suggested that 69.78% to 98.27% of the shotgun reads were mapped to Se (except the 17.49% for a cell, likely due to primer-dimer formation in MDA since 82.12% of its reads found no match in NR), and the average mapping rate was 74.58%, 84.16%, and 88.10%, respectively. The success rate of C-1, C-2, and C-3 was 36.36%, 27.27%, and 27.27%, with an overall success rate of 30.30% (10 MDA-positive cells in 33 sorted cells). Moreover, genome completeness of the SAG ranged between 53.80% and 99.73%, and the average at each run was 88.24%, 97.23%, and 93.30%, respectively (Table 1; Fig. 2C).
Notably, in these experiments, the empty droplets derived from the aqueous phase around the target cells, which served as the negative controls, were all free of contaminating reads from members of the mock microbiota (for them, percentage of WGS reads in the "%Hit_no_genomes" category reaches 99.53% for experiment A, 99.95% for experiment B, and 98.23% for experiment C, consistent with nonspecific amplification in MDA; Table 1). This supports the stringency of the workflow and the aqueous sorting microenvironment of RAGE-Seq, i.e., being largely free of contaminating DNA from air, surface, or reagents (which are often encountered in single-cell isolating and   Experiment C (in triplicates of C-1, C-2, and C-3) tested the specificity of sorting and sequencing carotenoid-producing cells (Fig. 1C). NC, the empty droplets derived from the aqueous phase around the target cells, which served as the negative controls. %Hit_no_genomes, no hits in the NR database. Boldface indicates recovered WGS reads that match the originally targeted genome.
Single-Cell Phenotyping and Sequencing in Soil sequencing) (36,37). On the other hand, in the target-cell samples, slight contamination originating from members of the mock microbiota was observed (mostly ,0.5% and maximally 1.33% among all reactions in the three series of experiments; Table 1). These results suggest that the presence of DNA fragments in the liquid environment or sticking to the cell surface (which were then sorted together with target cell into the picoliter-level microdroplet by RAGE) may underlie the slight contamination observed in these one-cell RAGE-Seq reactions.
The experimental procedure of one-cell RAGE-Seq for soil microbiota. Next, we developed a one-cell RAGE-Seq workflow for soil microbiota, which are much more diverse and complex (Fig. 3). Soil samples were collected from grassland at a depth of ,3 cm on the campus of Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, China (36°991999N, 120°2895099E). The samples were homogenized and passed through a 0.6-mm sieve to remove small stones, grass roots, and other debris. Cells in the soil were extracted using Nycodenz density gradient centrifugation (Materials and Methods). Then, in a RAGE chip (Fig. S1C), the cells were trapped and analyzed individually in a RAGE chip with a 532-nm laser, which generates SCRS with a high signal-to-noise ratio. Then, the target cell with characteristic SCRS that correspond to specific metabolic phenotypes (e.g., metabolically active cells or carotenoid-containing cells; details below) was trapped and moved with a 1,064-nm laser to form a one-cell-encapsulated droplet (Movie S1). The one-cell droplet can be readily transferred to a tube for MDA and sequencing. Notably, as the RAGE-derived one-cell droplet already carries an oil phase (mineral oil), an emulsion reaction for MDA would be formed simply via vortex in the tube (after the aqueous phase of the lysis buffer is introduced). The one-cell MDA products, after quality assessment, are then shotgun sequenced separately via HiSeq, followed by de novo assembly and in silico metabolic reconstruction for the cell, therefore linking the genome to its SCRS-derived metabolic phenotype (Fig. 3).
One-cell RAGE-Seq of metabolically active bacteria in soil cell extracts via D 2 Oprobed SCRS. (i) On-demand sorting and retrieval. To gauge the ability of RAGE-Seq to tackle substrate-utilizing phenotypes of soil microbes, extracted cells from soil were incubated with 50% D 2 O. To determine the sampling time point for RAGE-Seq, time course D 2 O-probed experiments using the soil extracts were performed (with three biological replicates; Fig. S2). At 6 h, 12 h, 18 h, and 24 h after starting the D 2 O incubation, aliquots were taken, respectively, and SCRS from 100 randomly selected cells from each of aliquots were recorded. The results revealed the gradual increase of CDR (C-D ratio) at the consortium level over time, which plateaued before 24 h (Fig. S2). Therefore, 24 h of D 2 O incubation was chosen for RAGE-Seq of the soil cell extracts. A broad Raman band appeared in the region between 2,040 and 2,300 cm 21 , peaked at 2,157 cm 21 (Fig. 4A), which is the C-D stretching vibrations shifted from the C-H stretching vibrations at 2,800 to 3,200 cm 21 (18). This shift was attributed to the incorporation of deuterium from D 2 O to bacterial biomass via NADPH-mediated H/D exchange reactions in the metabolically active bacteria. Cells with the specific C-D bands in SCRS were then processed via RAGE-Seq, one cell per tube ( Fig. 4B; one cellfree sample was taken as a negative control in each batch of experiments). To validate successful MDA for each RAGE-derived cell, the 16S rRNA gene was amplified by PCR using the MDA product as the template (Fig. S3). Nine one-cell MDA products each with clear MDA bands and positive 16S rRNA PCR results were chosen for subsequent 16S and whole-genome sequencing ( Fig. S3A and B), generating ;3 Gb of raw sequencing data for each of seven cells (SR5, SR6, SR9, BSR2, BSR3, BSR5, and BSR11; the other two failed to yield sequencing library due to severe degradation; Table 2).
(ii) Recovery of high-coverage one-cell draft genomes. After quality control, clean reads from each cell proceeded to de novo genome assembly (Table S1). For each cell, GC% of the assembled contigs (.200 bp; after decontamination; Materials and Methods) exhibits a normal distribution (Fig. S4A). Moreover, t-SNE (t-distributed Stochastic Neighbor Embedding) projection of contigs from each cell (.1,500 bp) via their 4-mer signatures reveals distinct clustering patterns that are characteristic to the FIG 3 The pipeline of Raman-activated gravity-driven cell encapsulation and sequencing (RAGE-Seq) that links genotype to phenotype (metabolic activity and carotenoid production) in the soil. ‹ Nycodenz iohexol was added to the soil slurries. › Microbial cells from the soil were extracted by Nycodenz density gradient separation (NDGS). fi Carotenoid-producing cells were identified based on the characteristic Raman peak of carotenoids in SCRS and then sorted out in the RAGE chip, as a one-cell-encapsulated droplet, in a one-cell-one-tube manner. fl For sorting metabolically active soil cells, the soil extract was incubated in 50% D 2 O for 24 h.°Metabolically active cells were identified based on the C-D band in SCRS and then sorted out as a one-cellencapsulated droplet via RAGE. -The RAGE-sorted cells were lysed. † The genomic DNA was amplified by MDA and then processed for 16S rRNA sequencing and whole-genome shotgun sequencing. ‡ Assembly and annotation of the single-cell shotgun sequencing reads. · Metabolic pathways were reconstructed so as to establish the link between genotype (sequencing based) and the metabolic phenotype (SCRS based) at precisely one-cell resolution.
Single-Cell Phenotyping and Sequencing in Soil respective taxa of the cells (Fig. 4C). These results support the accuracy of draft genome reconstruction from one-cell assemblies from RAGE-Seq.
The identity of the target cell was determined based on that of the top contig bin (which consists of contigs binned to the same taxonomical unit). Based on DNA sequence similarity of contigs, the seven C-D-peak-present cells were pinpointed as being from Corynebacterium spp., Clostridium spp., Moraxella spp., Pantoea spp., or Pseudomonas spp. (Table 2). Interestingly, comparison of the results with 16S rRNA gene sequencing of the soil sample revealed that all of the seven cells were from lowabundance operational taxonomic units (OTUs) of the soil (all ,1% at the OTU level; Table 2; Data Set S1).
To evaluate the completeness of reconstructed one-cell genomes, CheckM (38), which is based on lineage-specific marker genes, was employed. It revealed that 89.05%, 22.25%, 46.90%, 11.68%, and 24.43% genome fractions were recovered for SR5, SR6, BSR2, BSR3, and BSR5, respectively ( Table 2). The contigs in SR9 and BSR11, both assigned to Moraxella spp., represent 92.62% and 90.23% completeness of their genomes, respectively ( Fig. 4C; Table 2). This supports the feasibility of RAGE-Seq to produce high-coverage genomes from just one post-RACS soil bacterial cell.   Fig. S3) derived from the aqueous phase, which served as the negative controls, did not contain any specific species information and were a nonspecific amplification product (accession numbers: SRR12829273 for NCD1 and SRR12829272 for NCD2). This supports the stringency of the workflow and the aqueous sorting microenvironment of RAGE-Seq.
Moreover, no significant difference in either completeness or GC content is apparent between the seven post-RAGE one-cell genomes and the one-cell genomes from the 17 randomly chosen, droplet-packaged bacterial cells from single-droplet MDA (sd-MDA [39]; Wilcoxon test; P . 0.05; Fig. S5A and B). Thus, in RAGE-Seq, the SCRS acquisition process (i.e., for metabolic phenotyping) does not seem to hamper subsequent one-cell MDA and sequencing, which is in sharp contrast to RACE-Seq, when the laser exposure during SCRS acquisition is the most important factor that negatively impacts the quality of post-RACE MDA and sequencing (29).
One-cell RAGE-Seq of carotenoid-producing cells in soil cell extracts via characteristic Raman peaks of carotenoids. To gauge the ability of RAGE-Seq to tackle product synthesis-related phenotypes, which are of interest for bioresource mining from soil, we focused on carotenoids, one of the most diverse classes of pigments (with over 700 kinds found [40]) and found in nearly all photosynthetic cells to protect them from exposure to UVA radiation (41). They also give characteristic bands in SCRS in 1,500 to 1,550, 1,150 to 1,170, and 1,000 to 1,020 cm 21 (as caused by in-phase C=C [v3], C-C stretching [v2] vibrations of the polyene chain, and in-plane rocking mode of CH 3 groups attached to the polyene chain [v1] [15,22,34,35]), which can serve as distinct criteria for RACS (22,23). In our soil samples, the carotenoid-producing cells showed the characteristic Raman peak of v1, v2, and v3, as expected (Fig. 5A).
Here, similar to sorting metabolically active cells, individual carotenoid-producing cells were isolated based on the characteristic peaks via RAGE from the same soil samples and then sequenced ( Fig. 5B and Fig. S3C). One-cell genome assemblies of the seven such cells (numbered CRG1, CRG2, CRG4, CRG5, CRG6, CRG7, and CRG11) pinpoint them as from Pantoea spp., Legionella spp., Massilia spp., Pseudomonas spp., and Pedobacter spp., respectively ( Table 2 and Table S1). GC% of the assembled contigs (.200 bp; after decontamination; Materials and Methods) exhibits normal distribution (Fig. S4B). In addition, the contigs of each cell show distinct clustering patterns characteristic of the respective taxa of the cells, based on t-SNE projection via their 4-mer signatures (Fig. 5C). These results support the integrity of the one-cell genome reconstructions. Notably, of the seven cells, three were of low-abundance bacterial genera, as suggested by relative abundance of 16S rRNA gene (,1%; Table 2). Moreover, members of the five phyla were known to synthesize carotenoids (Table 2), which is consistent with RAGE sorting criteria.
(i) Recovery of high coverage one-cell draft genomes. The overall success rate of one-cell RAGE-Seq (i.e., number of successful experiments divided by total number of attempts, with "success" defined as the ability to produce from the MDA product the target 16S-rRNA via PCR and then verify by sequencing), at 63.64%, is much higher than RACE-Seq (no success for one-cell reactions) (29) (Fig. 6A). To evaluate the completeness of reconstructed one-cell genomes, CheckM (38), which is based on lineage- Single-Cell Phenotyping and Sequencing in Soil specific marker genes, was employed, revealing estimated genome completeness of 12.23% to 58.66% (average of 29.65%, Table 2). Even though each of them is from just one carotenoid-producing bacterial soil cell, genome completeness values of post-RAGE-Seq cells are equivalent to those of RACE-Seq (each as a pool of 2 to 5 soil bacterial carotenoid-producing bacterial cells; 17.03%; P = 0.17; Wilcoxon test; Fig. 6B). In particular, one-cell genome completeness via RAGE-Seq can reach 58.66% (for Pantoea spp.), which is much higher than the best case for the optimized RACE-Seq, where completeness from 2 to 5 carotenoid-producing soil cells was just 26.42% (one-cell MDA reactions from RACE-Seq all failed for the soil samples) (29) ( Table 2). As a result, many more functional genes can be unraveled through RAGE-Seq than RACE-Seq, as evidenced by average number of KO (KEGG Orthology, 625 versus 281; P = 0.083; onesided t test; Fig. 6C) and unique KO (484 versus 239; P = 0.087; one-sided t test; Fig. 6D).
The biosynthetic pathways for carotenoids are widely found in plants and bacteria (43). Like all isoprenoids, carotenoids are synthesized from the production of isopentenyl pyrophosphate (IPP), which primarily occurs through the methyl D-erythritol 4phosphate (MEP) module that begins with glyceraldehyde-3-phosphate (Fig. 7A). Then, from four IPP molecules geranylgeranyl diphosphate (GGPP), which contains 20 carbon atoms, is synthesized. By condensing two GGPPs, the first carotene precursor, phytoene, is produced by phytoene synthase. Phytoene is converted to lycopene in four desaturation steps, and lycopene is then cyclized on both ends to form b-carotene (Fig. 7B). Then, identical hydroxylation of both b-carotene rings yields zeaxanthin, which can be epoxidated to form antheraxanthin or violaxanthin. Neoxanthin and other pigments are derived from violaxanthin, zeaxanthin, or b-carotene from which, e.g., astaxanthin is derived. In addition to the reconstructed MEP and b-carotene modules, our data reveal that the synthetic pathways of astaxanthin, a reddish keto- carotenoid classified as a xanthophyll found in various microbes (44), can also be reconstructed (Fig. 7C).
Interestingly, in CRG1, which was assigned to Pantoea spp. (a member of the Enterobacteriaceae family), the contigs from CRG1 were mapped to 0.19-Mb sequences (33.16%) and 145 genes of a plasmid from Pantoea (accession number NZ_CP038854.1; of 0.56 Mb and harboring 516 genes in total) (Fig. 7D); this suggests the ability to (at least partially) reconstruct mobile elements from one-cell RAGE-Seq assemblies. In addition, many genes in the b-carotene and astaxanthin modules were found from the sequences aligned to NZ_CP038854.1, suggesting their plasmid origin in CRG1. Notably, Pantoea spp., due to their rich production of carotenoids, have served as a model organism to provide pigment synthesis components (45-47): e.g., five biosynthetic genes in the natural carotenoid cluster from the cultured Pantoea ananatis were cloned and used to optimize zeaxanthin production in E. coli (48). Therefore, from just one yet-to-be-cultured bacterial cell recognized as carotenoid producing (via its SCRS), RAGE-Seq can recover the underlying biosynthetic genes, representing a new route to mine functional genes and pathways in a function-based yet culture-free manner.
The one-cell RAGE-Seq assemblies support more complete and more in-depth discovery of carotenoid-synthetic genes and pathways than 2-to 5-cell-pooled RACE-Seq assemblies. (i) In the MEP module, all the enzymes were identified in the data set merged from the seven RAGE-Seq-derived samples. Notably, each of CRG1 (Pantoea spp.), CRG2 (Legionella spp.), and CRG4 (Legionella spp.), the ones with top estimated genome completeness (58.66%, 34.99%, and 48.39%, respectively), was able to reconstruct by itself the complete MEP module (Fig. 7A), with the exception of the ispA gene in CRG4. In contrast, although each enzyme was found by at least one sample (except dxr), none of the seven RACE-Seq samples alone can reconstruct the complete MEP module (Fig. 7A).
(ii) In the b-carotene module, the first step unique to carotenoid synthesis is the synthesis of geranylgeranyl diphosphate (GGPP) via the enzyme GGPP synthase. Then it forms the phytoene, which is the precursor of all carotenoids. This reaction, catalyzed by the phytoene synthase (crtB), is considered the main bottleneck in the carotenoid pathway (49). However, only CRG2 from RAGE-Seq reports this enzyme (Fig. 7B). The CRG1 sample from RAGE-Seq recovered crtI and lcyB enzymes (but no crtB enzyme).

Single-Cell Phenotyping and Sequencing in Soil
Genes encoding all steps for carotenoid formation from GGPP were identified in collective one-cell assemblies from RAGE-Seq (Fig. 7B), and the most complete pathway reconstructed was also in CRG1 (Fig. 7B). In contrast, RACE-Seq recovered crtB in only three of the samples, and all the other enzymes in this module are missing in each sample (Fig. 7B).
(iii) In the astaxanthin module, crtW encodes b-carotene ketolase, which is one of the key enzymes in astaxanthin biosynthesis that catalyzes the formation of canthaxanthin from b-carotene via echinenone, while the crtZ gene encodes b-carotene hydroxylase. For RAGE-Seq, CRG1 reports two astaxanthin biosynthetic genes of crtW and crtZ, and both CRG2 and CRG4 report the crtZ gene (Fig. 7C). However, RACE-Seq missed all the genes in this module (Fig. 7C).
Notably, cultured relatives of the individual cells characterized via RAGE-Seq were potentially able to synthesize carotenoids ( Table 2). Together with the SCRS-based cell sorting criterion, these items of evidence support the ability of RAGE-Seq to link the carotenoid-synthetic phenotype with the underlying genomes at precisely one-cell resolution from soil.
Links between the D 2 O intake and carotenoid-producing phenotypes among soil cells. Interestingly, in the soil cell extracts, carotenoid-producing cells as recognized by their pigment spectra can either have C-D peaks (metabolically active, Fig. 8A) or not (metabolically inert, Fig. 8B). Thus, it is likely that many soil microbes that produce carotenoids might be metabolically dormant. This also suggests that cultivationbased efforts that depend on metabolically active cells can miss many potential microbial cell factories of carotenoids.
On the other hand, for those not showing carotenoid peaks in SCRS, both cells with C-D peaks (Fig. 8C, suggesting active metabolism) and cells without C-D peaks (Fig. 8D, inactive metabolism or dead cells) were found. Among the over 1,200 soil cells analyzed for SCRS in the replicated time course experiments, no correlation at the singlecell level is apparent between carotenoid-synthetic phenotype and the phenotype of general metabolic activity (i.e., D 2 O incorporation rate). This suggests the importance of the single-cell phenome, i.e., simultaneous profiling of multiple phenotypes for a given cell, in fully reconstructing "who is doing what" in soil.

DISCUSSION
For microbiomes that are functionally and genetically as diverse and heterogeneous as soil, the phenotype-dedicated one-cell genome sequencing strategy of RAGE-Seq is of particular importance to mechanistic dissection and functional-gene mining, where both numerically abundant and rare taxa can play important roles in microbiome activity. In contrast, in metagenomic sequencing of soil, reconstructing the genomes of functionally important members can be difficult, where sequence reads from numerically abundant species would dominate and, moreover, the difficulty in read binning hinders reliable genome assembly. Moreover, the ability to link genome sequence to an anabolic or catabolic activity (e.g., D 2 O assimilation or carotenoid synthesis) at precisely one-cell resolution is a pivotal advantage, since it provides one direct answer to the ultimate question of "Who is doing what? Why?" for the microbiome.
Like RAGE-Seq, RACE-Seq (which ejects a cell from a solid surface) can also phenotype, sort, and sequence cells in an "index" manner (30), yet applications of RACE-Seq to microbiomes from seawater (22,23), soil (29), and humans (20) are limited by the inability to produce high-quality genome sequence at the resolution of one cell (see Table S2 in the supplemental material), likely due to the reduction or loss of cell vitality when cells are air dried at a solid surface prior to Raman exposure, the direct exposure of the cell to the Raman exciting laser, and the effect of the pulsed laser for cell ejection on the cell (29). Specifically, in RACE-Seq, for DH5a cells, the highest genome coverage for five-cell pooled MDA reactions (after our optimization) was only 60%, and for soil microbiome, the genome completeness was no more than 30% for 2-to 5-cell pooled MDA reactions. As for precise one-cell reactions, the success rate of RACE-Seq has been very low even for pure cultured E. coli (,10%), while for soil microbiome all one-cell reactions via RACE-Seq have failed (29). In contrast, our results on soil microbiota here showed that precisely one-cell RAGE-Seq of diverse soil microbes can produce genome coverage as high as ;93%. The ability of RAGE-Seq to support phenotyping-sorting-sequencing at the resolution of one bacterial cell is due to single-cell indexing via precise single-cell capture and on-demand microdroplet encapsulation and the full preservation of cell vitality via aqueous-phase Raman detection and sorting, which ensures robust, high-quality sequencing and cultivation of the post-RAGE cell by avoiding laser-induced damage (33). Therefore, RAGE-Seq is particularly suitable for reliable metabolic phenotyping and sequencing of bacteria at precisely one-cell resolution from environmental samples (Table S2).
The RACS family includes additional tools that can potentially be coupled to downstream genome sequencing (14,27). However, although the throughput of flow-mode RACS such as RAMS (Raman-activated microfluidic sorting, ;60 cells/min for carotenoid peaks in yeast [50]), RADS (Raman-activated droplet sorting, ;260 cells/min for astaxanthin peaks in Haematococcus pluvialis [51]), or those based on optical tweezers (3.3 to 3.8 cells/min for C-D peaks for mouse gut bacteria [21]) are higher than RAGE, they have not yet demonstrated the ability to link the sequencing-based genotype to FIG 8 Links between the D 2 O intake and carotenoid-producing phenotypes among individual soil cells. (A) A carotenoid-producing cell whose SCRS harbors the C-D peak after the carotenoid peaks were quenched. (B) A carotenoid-producing cell whose SCRS does not harbor the C-D peak after the carotenoid peaks were quenched. (C) A cell whose SCRS harbors the C-D peak but does not show carotenoid peaks (i.e., not producing carotenoids). (D) A cell whose SCRS shows neither the C-D peak nor the carotenoid peaks.
Single-Cell Phenotyping and Sequencing in Soil the SCRS-derived phenotype at the one-cell level, which can be a challenge as the "index" of sorted cells is usually lost after the high-throughput flow sorting.
On the other hand, the current implementation of RAGE-Seq has several limitations. First, the overall success rate of one-cell reactions, ;36% for the mock microbiota (45 MDA-positive cells in 126 sorted cells) and ;41% for the soil microbiota (14 MDA-positive cells in 34 sorted cells) here, should be improved. In fact, all the published RACS-Seq studies so far have suffered from this problem (29,30,33). This is likely due to the radiation of Raman laser during SCRS acquisition (29). In one-cell MDA, which is another challenge (although not a problem of RACS itself), bias against high-GC-content DNA (52), hindering of reaction success by low-abundance templates (53), and negative effect due to highly fragmented contaminant DNA (54) have been reported. Thus, the overall success rate of RAGE-Seq can perhaps be elevated by reducing incident laser power and measurement time (29,55) or by reducing the volume of MDA reactions (56,57). Second, among the WGS reads from one-cell RAGE-Seq reactions a certain proportion of contaminating reads was present. This can be due to the environmental DNA fragments (either sticking to the cell surface or floating in the surrounding liquid setting) that were sorted together with target cell into the picoliter-level microdroplet by RAGE. Bias of MDA can magnify the proportion of such contaminating DNA, which may exacerbate the situation. Therefore, further improvement of RAGE-Seq experimental workflow will be important, such as optimizing cell incubation and washing process prior to SCRS acquisition and sorting (to reduce contaminating DNA fragments which can readily attach to cell surface) (58) and further decrease of microdroplet volume (which reduces the amount of floating-environment-DNA templates in the MDA reaction). Third, the relatively low throughput of RAGE-Seq in sorting (at 2 cells/min) and MDA (one-cell-per-tube) at present has hindered time-and cost-efficient analysis of hundreds or thousands of cells from a genetically and functionally diverse microbiome such as soil. This shortcoming, which led to the low microbiome sampling depth here, has limited the scope and nature of ecological insights that can be derived (i.e., findings from the sampling by RAGE-Seq of only a very small portion of cells in consortium should be interpreted with caution). The sorting throughput can be improved by adopting techniques such as surface-enhanced Raman scattering (SERS) or stimulated Raman scattering (SRS) which offer much higher Raman signals and thus reduce SCRS acquisition time, so that a much higher number of single cells can be efficiently profiled for their metabolic phenotypes (and genomes once sorted) from each microbiome. Flow-mode RACS methods can also potentially tackle this challenge (21,59,60) (Table S2), although solutions that couple the sequential SCRS-based sorting to paralleled, indexed preparation of microbial one-cell sequencing libraries remain to be developed. Finally, interpretation of findings from microbiome RAGE-Seq is also limited by, and dependent on, sample pretreatment. For example, the extraction of cells from their native microenvironment in soil prior to D 2 O feeding and Raman microspectroscopy can alter both the in situ metabolic activity and relative abundance of soil cells. Therefore, in addition to improving the throughput of RAGE-Seq, future efforts should include development of microbiome-feeding and cell extraction methods that maximally preserve the in situ population structure and metabolic activity of soil inhabitants.
Despite these challenges, the soil RAGE-Seq workflow introduced here is widely applicable, as the scope of metabolic phenotypes that can be derived from nonresonance (e.g., the C-D band) or resonance peaks (e.g., carotenoid-related bands) in SCRS is rapidly expanding (14). For example, SCRS can detect not just D 2 O intake (and phenotypes that are correlated with D 2 O intake such as antibiotic resistance) (17) but assimilatory activity of carbon or nitrogen sources labeled by stable isotopes of C or N (19,61). Similarly, the ability to mine both the cellular carotenoids (in both abundance and structure) and the underlying genotypes encoding biosynthetic pathways at the individual cell level should allow mining for not just carotenoids but lipids, polysaccharides, protein, and even antibiotics (24, 25, 62, 63). Therefore, it is possible that RAGE-Seq would become a universal and highly versatile tool for precisely probing "who is doing what" and for mining cells or metabolites of interest, from soil and other complex natural ecosystems.

MATERIALS AND METHODS
Bacterial species, media, and growth conditions. The series of mock microbiota all include the three bacteria E. coli K-12 DH5a, H. pylori ATCC 26695, and S. elongatus PCC7942, and the one fungus S. cerevisiae BY4742. Each of the strains was grown in pure culture. H. pylori ATCC 26695 (Qingdao Municipal Hospital, China) was cultured in brain heart infusion (BHI) agar (Oxoid, Basingstoke, England) supplemented with 7% defibrinated horse blood and incubated under microaerophilic conditions (85% N 2 , 10% CO 2 , 5% O 2 ) at 37°C. H. pylori ATCC 26695 was obtained by centrifugation at 8,000 Â g for 2 min and washed twice with BHI, and resuspended cells were diluted to an OD 600 of ;0.5 and inoculated at a ratio of 1:10 into 10 ml of BHI broth with 10% fetal bovine serum (FBS). E. coli K-12 DH5a was cultured in Luria-Bertani (LB) medium (tryptone, yeast extract, NaCl, pH 7.0) and incubated at 37°C. E. coli K-12 DH5a was diluted to an OD 600 of ;0.5 and inoculated at a ratio of 1:10 into 4 ml of LB medium. S. cerevisiae BY4742 was cultured in YPD medium (yeast extract, peptone, glucose, pH 6.5 to 6.8) and incubated at 30°C. S. cerevisiae BY4742 was diluted to an OD 600 of ;0.5 and inoculated at a ratio of 1:10 into 4 ml of YPD medium. S. elongatus PCC7942 was cultured in BG11 medium and incubated under lighting at 28°C. In these dilution experiments, we assumed a constant relationship of OD and cell concentration for all the cell types; while this might not be correct for each of them, this assumption would not change our experimental findings and interpretations. For deuterium isotope labeling, 50% (vol/vol) D 2 O (99.9 atom % D; Sigma-Aldrich, Canada) was used in all the above media. To prepare the media for deuterium isotope labeling, 2Â medium was prepared with water and autoclaved and then diluted to 1Â medium with filtered pure D 2 O, so that the eventual level of D 2 O was 50%. Each of the microorganisms was incubated in the respective medium containing 50% D 2 O until reaching the logarithmic phase, washed using distilled water, and mixed to form the synthetic consortia with defined structure. The synthetic consortia were then subjected to single-cell Raman spectroscopy and SCRS-based sorting, respectively.
Benchmarking performance of the one-cell RAGE-Seq method via mock microbiota. To semiquantitatively assess performance of the one-cell RAGE-Seq method, we performed replicated RAGE-Seq experiments on the synthetic four-species mock microbiota. Three different sorting criteria were respectively employed: criteria A (via cell morphology), B (D 2 O-peak-containing cells), and C (carotenoid-peak-containing cells), which sorted and sequenced 20 cells, 11 cells, and 11 cells, respectively, per experiment. Within each of the sorting criteria, three biological replicates were performed. Thus, on the synthetic four-species mock microbiota, nine RAGE-Seq experiments in three biological replicates were performed, which sorted and sequenced 126 individual cells in total. Method performance was then assessed via mapping rate (i.e., the number of mapped reads/total sequencing reads), success rate (i.e., the number of 16S-sequencing-validated SAGs/total sorted cells), and genome completeness (i.e., the percentage of aligned bases from assembled contigs in the reference genome; a base in the reference genome is aligned if there is at least one contig with at least one alignment to this base) of the one-cell sorting and sequencing results.
Extracting bacteria from soil and deuterium labeling of microbial cells. To extract the cells from soil, the soil slurries, generated by adding 1 g soil into 5 ml 1Â PBS buffer supplemented with 25 ml Tween 20, were vortexed for 30 min for freeing the particle-associated cells. In a new 15-ml centrifuge tube, 5 ml Nycodenz iohexol (1.42 g/ml; Aladdin, China) was added, and then the aforementioned supernatant from soil slurries was slowly added to the top of Nycodenz. The tubes were centrifuged at 14,000 Â g for 90 min at 4°C with slow acceleration and deceleration. At the middle layer, which is between the clear PBS layer and the debris layer, a faint whitish band containing bacterial cells would emerge (19,64). This band was recovered and transferred into a new 1.5-ml Eppendorf tube with a pipette. Then, 1 ml double-distilled water (ddH 2 O) was added to resuspend the cells, and the cells were pelleted by centrifugation at 10,000 Â g for 10 min at 4°C, 3 times. Finally, the cell pellets were resuspended in 0.2 ml ddH 2 O, which represented the "soil cell extracts." These soil cell extracts were then used for mining either carotenoid-producing cells or D 2 O-intake cells (i.e., metabolically active cells). For SCRS acquisition or RAGE-Seq of carotenoid-producing cells, the soil cell extracts were directly used. As for D 2 O-probed SCRS acquisition and RAGE-Seq experiments for metabolically active cells, the soil cell extracts were then incubated in PBS with a final D 2 O level of 50% at room temperature for a certain duration. To determine the sampling time point for RAGE-Seq of metabolically active cells, D 2 O-probed SCRS acquisition experiments were performed that temporally monitored D 2 O intake by the soil cell extracts, where aliquots were taken at 6 h, 12 h, 18 h, and 24 h, respectively, for Raman microspectroscopy. Based on the results, we chose 24 h for the RAGE-Seq experiments that target soil cells that actively assimilate D 2 O.
Acquisition of single-cell Raman spectra. The RAGE-Seq procedure was performed in a RACS-Seq instrument (Qingdao Single-cell Biotechnology, Qingdao, Shandong, China). Before the Raman test, the bacterial samples were washed to remove residual medium and then resuspended by adding deionized water to dilute them for performing the following Raman detection and sorting. The prepared bacterial solution (;1 ml) was hung up on the sample holder and then loaded into the RAGE chip. Raman spectra were acquired with a modified confocal Raman microscope. A 50Â dry objective (numerical aperture [NA] = 0.65; Leica, Germany) was used for sample signal acquisition and optical tweezers, while a 10Â dry objective was used for observation of droplet generation and transportation. All Raman spectra were preprocessed by background noise subtraction, baseline correction, and normalization to C-H band via LabSpec5 software. Cells with a C-D band in SCRS were isolated using RAGE-chip as described above. The selection of post-SCRS-acquisition cells to be sorted was based on a computer algorithm.

Single-Cell Phenotyping and Sequencing in Soil
Specifically, the sorting was based on C-D/(C-D 1 C-H); this ratio was calculated via dividing C-D peak area from 2,040 to 2,300 cm 21 by the sum of C-D area and C-H peak area from 2,800 to 3,100 cm 21 . The whole pipeline used here has been made available on GitHub (https://github.com/gongyh/RamanD2O). The tube which contained the target cells was then moved into a laminar hood, and buffer (Qiagen, Germantown, MD, USA) was added into the tube for cell lysis.
Chip fabrication for Raman-activated gravity-driven cell encapsulation (RAGE). The RAGE chip consists of two slides bound together with a semiopen design, as shown in Fig. S1 in the supplemental material, one sample inlet, and one open well for oil storage and droplet generation. For the top entered laser (laser entering from top slide), the microchannel was etched on the bottom layer slide. The channel was sealed by a smooth slide with two holes, one hole (;0.5 mm in diameter) for the inlet and one (5 mm in diameter) for the open well. With this design, the focus point and trapping force of lasers were not affected after penetrating the top layer of the chip. The scale of the thin channel between detection window and open well was adjusted for different sizes of cells. We used the 30-mm-width and 10-mmdepth microchannel for the sorting of microbial cells in soil.
Isolation of target cells from the soil sample by RAGE. The cells in suspension were injected into the chip with the height-adjustable sample holder. The well was filled with mine oil (2% wt EM90) when the cell phase reached the open well. The height of the sample holder was adjusted to obtain a balance between the water phase and oil phase. The cells located statically in the detection window were trapped and analyzed with the 532-nm laser to identify target cells. The cell sorting can also be based on visible phenotypes such as morphology or autofluorescence without the 532-nm laser. Then, a second laser (1064 nm) was employed to trap and move the target cell to the tip of the aqueous phase. The sample holder was elevated to generate only one droplet that encapsulates the cell and then lowered to the original height. The single target cell was thus isolated and encapsulated in the droplet. As the density of the oil used here is lower than water, the droplet stays statically at the bottom of the open well. At the end, the droplet with the target cell can be easily taken out to a tube for downstream analysis, with a pipette tip.
Notably, the bacterial cell adsorption onto the channel wall can be neglected here in our quartz chip. Although the in-chip environment is a static mode, the majority of cells still show irregular movements during sorting due to pedesis or flagellum effect. In addition, new cells can be injected into the chip continuously by elevating the sample holder. The volume on the sample holder which is connected into the sorting chip is usually 1 to 2 ml, yet the volume of the chip is only ,5 ml; thus, all the cells in the chip can be replaced hundreds of times when necessary, without the need for new loading. In our current setting, the sorting throughput can be maintained for at least 3 h, and cell settling is not our major concern.
Multiple displacement amplification. The REPLI-g single-cell kit (Qiagen, Hilden, Germany) was used for the DNA amplification. Cell lysis was carried out at 65°C for 15 min with 2 ml lysis buffer for each sample, followed by addition of 1 ml stop solution to neutralize the lysis buffer. REPLI-g single-cell (sc) reaction buffer and REPLI-g sc DNA polymerase were added, and the mixture was incubated at 30°C for 8 h with a 70°C hot-lid temperature for MDA reactions. Blank control (without any cells) was also included to detect and quantify potential contamination. After that, the MDA products were processed for 16S rRNA gene (with primers 27F [forward, 59-AGAGTTTGATCCTGGCTCAG-39] and 1492R [reverse, 59-TACGGYTACCTTGTTACGACTT-39]) and/or 18S rRNA gene (with primers NL1 [forward, 59-GCATATCAAT AAGCGGAGGAAAAG-39] and NL4 [forward, 59-GGTCCGTGTTTCAAGACGG-39]) PCR analysis and then high-throughput sequencing.
Library construction and next-generation sequencing (NGS). (i) 16S rRNA gene sequencing. Two grams of soil was frozen at 280°C prior to DNA extraction with three replicates. Total genome DNA from the soil sample was extracted using the Magen Hipure soil DNA kit (lot no. HE280200) according to the manufacturer's protocols. DNA was quantified using the Qubit 3.0 fluorometer (Invitrogen, Carlsbad, CA, USA). For each sample, 20 to 30 ng DNA was used to generate amplicons using a MetaVx library preparation kit (Genewiz, South Plainfield, NJ, USA). V3 and V4 hypervariable regions of prokaryotic 16S rRNA genes were selected for generating amplicons and subsequent taxonomy analysis. The V3 and V4 regions were amplified using forward primers containing the sequence CCTACGGRRBGCASCAGKVRVGAAT and reverse primers containing the sequence GGACTACNVGGGTWTCTAATCC. At the same time, indexed adapters were added to the ends of the 16S rRNA gene amplicons to generate indexed libraries ready for downstream NGS on Illumina MiSeq. PCRs were performed in a triplicate 25-ml mixture containing 2.5ml of TransStart buffer, 2ml of deoxynucleoside triphosphates (dNTPs), 1ml of each primer, and 20 ng of template DNA. DNA library concentrations were validated by a Qubit 3.0 fluorometer. Sequencing was performed using a PE250 paired end on an Illumina MiSeq instrument (Illumina, San Diego, CA, USA) by Genewiz.
(ii) One-cell genome sequencing. The target MDA products were treated with S1 nuclease (Thermo Fisher Scientific, Waltham, MA, USA) to degrade the single-stranded nucleic acids and then purified by Agencourt AMPure XP beads (Beckman Coulter, Brea, CA, USA). Next-generation sequencing library preparations were constructed following the manufacturer's protocol (NEBNext Ultra DNA library prep kit for Illumina). For each sample, 1mg genomic DNA was randomly fragmented to ,500 bp by sonication (Covaris S220). The fragments were treated with End Prep enzyme mix for end repairing, 59 phosphorylation, and dA-tailing in one reaction, followed by a T-A ligation to add adapters to both ends. Size selection of adapter-ligated DNA was then performed using AxyPrep Mag PCR cleanup (Axygen), and fragments of ;410 bp (with the approximate insert size of 350 bp) were recovered. Each sample was then amplified by PCR for 8 cycles using P5 and P7 primers, with both primers carrying sequences which can anneal with a flow cell to perform bridge PCR and the P7 primer carrying a six-base index allowing for multiplexing. The PCR products were cleaned up using AxyPrep Mag PCR cleanup (Axygen), validated using an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA), and quantified by a Qubit 2.0 fluorometer (Invitrogen, Carlsbad, CA, USA).
Then, libraries with different indexes were multiplexed and loaded on an Illumina HiSeq instrument (Illumina, San Diego, CA, USA). Sequencing was carried out using a 2 Â 150 paired-end (PE) configuration; image analysis and base calling were conducted by the HiSeq control software (HCS)1 OLB1GAPipeline-1.6 (Illumina). Samples were quantified using a Qubit 2.0 fluorometer (Invitrogen, Carlsbad, CA, USA).
Sequencing data analysis. (i) 16S rRNA gene sequencing. The software package QIIME (version 1.9.1) (65) was used for 16S rRNA gene data analysis. The forward and reverse read pairs with minimum overlapping bases of 20 bp were joined and assigned to samples based on barcode and truncated by cutting off the barcode, primer sequence, and low-quality bases (59 end; quality score ,20). Quality filtering on joined sequences was performed, and sequences which did not fulfill the following criteria were discarded: sequence length .200 bp, no ambiguous bases. Then, the chimeric sequences were removed using the UCHIME algorithm (66). Putative contaminants were removed from data sets, as were singletons. Subsequently, the remaining high-quality reads were grouped into operational taxonomic units (OTUs) using the Vsearch algorithm (67) and aligned using default parameters against the Silva_132 database (68). Representative sequences for the shared OTUs, as defined by 97% similarity, were obtained. Relative abundances of the bacterial taxa at the phylum, class, order, family, genus, and species levels were calculated and compared, respectively.
(ii) One-cell genome analysis. A dedicated computational pipeline for the single-microbial-cell genome was developed (https://github.com/gongyh/nf-core-scgs) to efficiently analyze single-cell amplified bacterial genome (SAG) data sets by integrating various tools with Nextflow (69). Briefly, reads that passed Illumina's chastity filter were first quality checked using FastQC (https://www.bioinformatics.babraham.ac.uk/ projects/fastqc/) and then quality trimmed using Trim Galore (https://www.bioinformatics.babraham.ac.uk/ projects/trim_galore/) in paired end mode for each sample. To detect contaminated DNA fragments, clean reads were phylogenetically classified using Kraken (70). Clean reads were then assembled into contigs using SPAdes (71) in single-cell mode. Taxonomic composition of assembled contigs (longer than 200 bp) was visualized using BlobTools (72). Assembled genomes were annotated using Prokka (73), KofamKOALA (74), and eggNOG-mapper (75). Considering the possibility of DNA contamination for environmental samples, assembled contigs were further split into bins by taxonomic annotations (in the genus level) for each SAG, followed by estimation of genome completeness using CheckM (38). Since a fraction of contigs (especially fragments from plasmids) cannot be assigned to proper taxa, the whole SAG assembly was used to recover metabolic pathways. For the SAG of CRG1, Quast v5.0.2 (76) was used to align contigs to the reference plasmid (with parameters "-m 200 -min-identity 80"), and the alignments were visualized by pyGenomeTracks (77).
Data availability. The sequence data reported in this study have been deposited to the NCBI SRA database (PRJNA646329, PRJNA640996, PRJNA640983, and PRJNA669567).

SUPPLEMENTAL MATERIAL
Supplemental material is available online only.