Modern microbiology: Embracing complexity through integration across scales

Microbes were the only form of life on Earth for most of its history, and they still account for the vast majority of life’s diversity. They convert rocks to soil, produce much of the oxygen we breathe, remediate our sewage, and sustain agriculture. Microbes are vital to planetary health, as they maintain biogeochemical cycles that produce and consume major greenhouse gases and support large food webs. Modern microbiologists analyze nucleic acids, proteins, and metabolites; leverage sophisticated genetic tools, software, and bioinformatic algorithms; and process and integrate complex and heterogeneous datasets so that microbial systems may be harnessed to address contemporary challenges in health, the environment, and basic science. Here, we consider an inevitably incomplete list of emergent themes in our discipline and highlight those that we recognize as archetypes of its modern era, which aim to address the most pressing problems of the 21st century.


INTRODUCTION
Our planet faces formidable challenges that span from climate change, environmental pollution, depletion of fertile soils, and loss of biodiversity to feeding and caring for an ever-growing human population in a sustainable fashion. Microbes, whose evolution has been intimately interconnected with changes in Earth's physicochemical environments over biological time, remain an integral part of these interlinked crises. Given its aim to understand microbial life and the potential to harness microbes to address these enormous problems, microbiology is emerging as arguably the most important science of the mid-21st century.
Microbiology has made remarkable progress throughout its relatively brief, few-hundred-year history. The advances seem far from reaching a plateau, as we have likely only begun to scratch the surface of the major biotechnological and medical advances enabled by studies of microbial life. Today, microbiologists aim to elucidate and ultimately address a wide range of processes, from greenhouse gas emissions and soil carbon sinks to wastewater management, protection of clean water resources, detoxification of heavy-metal and organic pollutants, maintenance of soil fertility and rhizosphere nutrition, the emergence and mitigation of plant and animal pathogens, antibiotic resistance, and pandemics. Investigations that address such diverse questions often rely on a broad range of data-producing and data-processing technologies, which demand increasingly higher levels of integration and synthesis for unified insights.
Microbiology draws its power from relatively well-established technologies such as molecular sequencers, mass spectrometers, high-throughput cultivation, genetic manipulation, and a large arsenal of 'omics strategies, statistics, and modeling, as well as from rapidly emerging technologies such as artificial intelligence, machine learning, microbiome-based genome editing, and more. As contemporary questions in microbiology demand the orchestration of diverse data generation and analysis strategies that yield increasingly large datasets, integrative approaches that span scales of complexity are emerging as one of the hallmarks of modern microbiology. Here, we look at the past and contemplate the future of our rapidly transforming discipline, starting with a discussion of some of the major themes that underpin the new data-intensive epoch of microbiology, and we recognize new motivations, opportunities, and challenges that influence today's microbiologists.
If we could explain how we study microbes today and what we know about them to Antonie van Leeuwenhoek, who first saw microbes in the 1670s, to Ignaz Semmelweis, who realized in the 1840s that handwashing reduces patient mortality in hospitals, or to Fanny Hesse, who introduced agar for growing microbes in the laboratory in the 1880s, what would they find to be the most fascinating advance microbiology has achieved since its early days? We would indeed have a myriad of surprises to show them proudly, as we can now isolate microbes in nanodroplets, 1 image them in situ with colorful probes 2 and in aqua in three dimensions, 3 survey the metabolites they exchange, 4 visualize and model their metabolic activities at the level of individual elements 5 or at planetary scales, 6 pull them around by their surface proteins, 7 stick them into little wells, 8 and even modify, 9 transform, 10 or synthesize 11 their genetic repertoire, or transfer their genes into the genomes of plants 12 and animals. 13 But apart from the permanently central role of technology in microbiology, there is one conceptual fracture in our otherwise continuous progress toward a better understanding of microbial life that can serve as a dividing line between the classical roots of microbiology, which are still alive and well, and its aspiring modern era: our expanding focus from individual organisms in cultures to complex communities in natural habitats, with the admission that "a pure culture is not enough." 14
The shift from individuals to communities in microbiology was enabled by a series of technological and conceptual advances: it was foreshadowed by the theory of evolution and guided by many breakthroughs, such as the discovery of the structure of DNA, 15 the recognition that the molecular information encoded in genomes can document evolutionary history, 16 and the arrival of sequencing. 17 One of the latest and most defining conceptual advances, which brought this thread of progress together into a single evolutionary framework connecting all cellular life, was the work led by Carl Woese, which offered a first glimpse of the tree of life through the comparison of ribosomal RNA (rRNA) gene sequences in 1977. 18 The translation of this transformative idea into practical applications in microbiology began with the pioneering work of Norman Pace and colleagues, who described rapid sequencing of rRNA gene fragments 19 and cloning methods that enabled gene recovery from environmental samples, 20 and it gained further traction with simpler molecular strategies that require less input biomass. 21 In the following years, PCR amplification of 16S and 18S rRNA gene fragments became an increasingly popular method to survey microbial diversity in naturally occurring habitats. 22,23 Coinciding with the increasing availability and affordability of short-read sequencing in the early 2000s, a plethora of amplicon studies revealed the richness and dynamism of microbial communities, from marine 24 and terrestrial systems 25 to host-associated habitats, 26 and led to broad recognition and awareness of Earth's flourishing microbiomes and their connection to the rest of life. 27 The transition from individuals to whole microbiomes motivated a societal-level adjustment of our perspective: in the public mind, microbes were no longer a band of bad actors that typically caused diseases, but the architects of life, the wardens of planetary health, and the fundamental forces that make our ecosystems tick.
Surveys of microbial diversity through rRNA gene amplicons spread like wildfire through microbiology, resulting in over 85,000 studies in the past 20 years. Yet taxonomic insights offer limited utility for understanding the functional drivers of biological systems, a central goal that brings together many corners of microbiology. Despite the inspiration amplicon surveys provided to better understand microbial systems, their integrative power remained severely constrained. Understanding microbes in their environmental context required genomes, which first became available for isolates obtained through indispensable yet labor-intensive isolation efforts. 28 However, isolation is inherently biased toward microbes that can grow alone and for which suitable growth conditions and substrates are known, and it has not delivered even a single representative for the majority of currently known lineages. 29
The realization that direct access to the genetic makeup of environmental populations was needed intensified during the late 1990s. In pursuit of natural products potentially available from the untapped diversity of environmental microbes, Handelsman and colleagues introduced a cloning-based approach to access genes and pathways of interest from the soil "metagenome." 30 The early approaches to metagenomics, which promoted the idea of cultivation-independent surveys of environmental populations, relied upon "isolating DNA from an environmental sample, cloning the DNA into a suitable vector, transforming the clones into a host bacterium, and screening the resulting transformants." 14 These arduous beginnings of metagenomics offered unique insights into environmental microbes 31,32 and demonstrated the feasibility of culture-independent surveys of natural populations. An alternative approach to metagenomics relied upon the sequencing of randomly chosen short DNA fragments. 33 Combined with advances in sequencing chemistry and molecular approaches, this "shotgun sequencing" approach replaced cloning-based strategies and ultimately enabled the reconstruction of genomes directly from natural ecosystems.
The core idea behind reconstructing genomes from environmental metagenomes, which was laid out in the first example of its kind, rested on a relatively simple principle built on a few imperfect but computationally tractable steps: 34 assembling metagenomic short reads into longer contiguous DNA segments (contigs), estimating the abundance of contigs in metagenomes, and partitioning contigs into separate genome bins by exploiting differences in the GC content and abundance of each discrete population. The power of genome-resolved metagenomics came from the fact that no aspect of it required any prior knowledge about the organisms in a given environment. Its simplicity allowed this strategy to be encapsulated in automated workflows, which yielded a plethora of genomes, not only from taxa microbiology already knew about but also from those we had not recognized before, and not only from the primary domains of life but also from viruses, plasmids, and other exciting biological entities. Over time, the size of sequencing datasets increased along with the power of bioinformatics tools, enabling the transition of genome-resolved surveys from relatively low-complexity systems to the microbiologically most complex ecosystems on Earth, such as oceans and soils. Metagenome-assembled genomes (MAGs), along with single-amplified genomes (SAGs) and genomes from improved isolation efforts, opened the floodgates of genomic data pouring in from all environments. Today, the rapidly maturing evolutionary framework built around genomes offers a playground for all corners of our discipline to integrate heterogeneous data types that slice through microbial systems in complementary ways, using analytical, computational, and molecular tools to address fundamental questions of modern microbiology (Figure 1).
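The binning step described above can be illustrated with a toy example. The sketch below assumes that contigs have already been assembled and that a mean read coverage per contig has already been computed (for example, by mapping reads back to the assembly). It computes GC content, log-scales coverage, and groups contigs with plain k-means. The `Contig` class, the farthest-point initialization, and the use of k-means are illustrative assumptions, not the algorithm of the cited study or of any particular binning tool.

```python
import math
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    sequence: str
    coverage: float  # mean read depth; assumed precomputed by read mapping


def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (N and other codes are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def features(contig: Contig) -> tuple:
    # Log-scale coverage so that a twofold difference counts the same at any depth.
    return (gc_content(contig.sequence), math.log10(contig.coverage + 1.0))


def kmeans(points: list, k: int, iterations: int = 50) -> list:
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each contig to its nearest bin center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Move each center to the mean of its members (keep it if the bin is empty).
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def bin_contigs(contigs: list, k: int) -> list:
    """Return lists of contig names, one list per putative genome bin."""
    labels = kmeans([features(c) for c in contigs], k)
    bins = {}
    for contig, label in zip(contigs, labels):
        bins.setdefault(label, []).append(contig.name)
    return list(bins.values())


contigs = [
    Contig("c1", "GCGCGCGCAT", 50.0),
    Contig("c2", "GGCCGCGCTA", 55.0),
    Contig("c3", "ATATATATGC", 5.0),
    Contig("c4", "AATTATATCG", 4.0),
]
print(bin_contigs(contigs, k=2))  # [['c1', 'c2'], ['c3', 'c4']]
```

Because GC fraction and log coverage sit on different numeric scales, practical binners normalize their features and typically add signals such as tetranucleotide frequencies and coverage across many samples. The toy version only shows why two populations that differ in both GC content and abundance separate cleanly.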

From communities back to individual microbes
Cultivation, the most fundamental methodological advance that shaped classical microbiology by enabling molecular insights with broad applications through the manipulation of representative microbial organisms, continues to be the bedrock of contemporary studies of microbial physiology in the post-omics era. 35 The ability to reconstruct microbial genomes directly from the environment is an effective means to improve cultivation efforts, as it can provide clues regarding the metabolic requirements of little-known organisms and guide their directed isolation. 36 The first demonstration of this idea was the isolation of Leptospirillum ferrodiazotrophum 37 from a relatively simple subsurface acid mine drainage biofilm, following the observation that it was the only organism in its community capable of nitrogen fixation. The first successful isolation of Coxiella burnetii, 38 the causative agent of human Q fever, 39 represents another example of how partial insights into the genomic features of target organisms can help define complex media that support the axenic growth of even obligate intracellular organisms. Peering into the metabolic makeup of environmental communities through MAGs continues to be a powerful tool to support cultivation efforts, even from complex mixtures of microbes. For instance, Pope et al. were able to isolate a representative of the family Succinivibrionaceae, the so-called WG-1, which dampens methane emissions from the digestive activities of macropods, using a composite genome reconstructed directly from a metagenome. Metabolic insights gained from this genome supported the development of a medium with specific carbohydrate and nitrogen sources as well as antibiotics to achieve an axenic culture. 40 Others applied similar principles to grow microbes from other environments, including the human gut 41 and even extreme subsurface environments. 42 Prediction of the episymbiotic lifestyles of candidate phyla radiation (CPR) bacteria from environmentally derived genomes 43 led to the co-isolation of these tiny bacteria with their hosts 44 and opened the way for microscopic, 45 metabolic, 46 and transcriptomic 47 characterizations of the impact of these symbioses and of the interactions between these bacteria and their hosts. 48 An exciting workflow that integrated environmental genomes, animal immunity, and fluorescent labeling for the targeted cultivation of microbes was demonstrated by Cross et al., who cultivated multiple CPR bacteria, including the elusive Absconditabacteria (SR1). 7 Cross et al. first surveyed SAGs and MAGs of target bacteria to identify membrane proteins that could serve as immunogens, then predicted extracellular domains in these candidate genes with homology to previously recognized immunogens, synthesized matching peptides, immunized rabbits with them, recovered the antibodies generated by the rabbit immune system, used these antibodies to "label" bacterial cells in a human saliva sample, and sorted the labeled cells into cultures. 7
In addition to culture-independent insights that improve cultivation efforts and new approaches that aim to preserve the natural growth conditions of environmental microbes 49,50 or their interactions, 51 modern approaches to cultivation include high-throughput strategies that exploit robotics and microfluidics. Recent studies demonstrate the utility of microfluidics to generate thousands of nanoliter droplets with chemical gradients on petri dishes, 52 and the application of robotics and machine learning to process thousands of colonies per hour. 53
One critical application of microfluidics is the targeted cultivation of microorganisms based on any genetic feature learned from metagenomes. Using this strategy, Ma et al. identified a microcolony that carried their target gene sequence among 500 microcolonies grown on a multiplexed microfluidic device and recovered the first isolate of a previously unidentified genus of the Ruminococcaceae family 54 that had been among the most abundant and prevalent taxa in the human gut, with no cultured representatives until then. 55 Another exciting application of microfluidics is the untargeted, high-throughput cultivation of microorganisms in droplets. Isolating individual bacterial cells in droplets reduces the loss of microbial colonies in cultivation efforts due to resource competition, differences in growth rates, or dilution of quorum-sensing molecules. 56 In an application of this approach, Watterson et al. populated millions of picoliter droplets created from distinct culture media with individual microbial cells from a single sample. 1 The droplets they sorted based on colony density included many rare organisms that conventional cultivation strategies missed in the same sample, and yielded antibiotic-resistant strains that conventional plate-based assessments of antibiotic resistance failed to detect. 1

Dramatic expansion of the tree of life
Our newly found power to sequence genes from environmental samples had far-reaching implications for our understanding of the diversity and evolution of life through phylogenomics and resulted in remarkable changes to the structure of the tree of life. The sequencing of marker genes such as rRNAs had revealed many, but not all, branches of life, yet inconsistencies in predicted physiology in major branches and unexpected gene inventories only became evident upon the recovery of genomes. Notable features of the revised, genome-informed tree of life include the Asgard archaea with their inventories of eukaryote-associated genes, 57 making them of great interest from the perspective of the origin of eukaryotes. 58 Other major groups of particular interest include the possibly non-monophyletic DPANN archaea and the seemingly monophyletic CPR, which display consistently small genomes and limited biosynthetic capacities. This led to the prediction that both the bacterial and archaeal domains of life feature large groups of microbes that generally depend on host microorganisms for basic building blocks such as lipids and nucleotides, 59 and recent work has clarified the partitioning of host-derived lipids into some episymbionts. 60 The inclusion of sequences from metagenomes as well as isolates brought into focus the dramatic prevalence of branches of the tree that remain without any isolated representatives. 29 Further refinements have begun to clarify relationships between deep branches, for example, the Chloroflexi and CPR, and the distribution of traits. 61 The revised views of the bacterial and archaeal domains, as well as of the whole tree of life, contrast dramatically with even contemporary views of the diversity of life that are highly eukaryote-centric and, in some cases, do not even include non-eukaryotic organisms. These renderings promote a new appreciation of our place in biological evolution and biodiversity.
Figure 1. The measurement tools, data types, and analysis strategies commonly used in modern microbiology to study naturally occurring microbial organisms in a wide range of complex habitats through rapidly emerging genomic blueprints of the diversity of life.
The value of complete and finished genomes is hard to overemphasize. In addition to providing a quality-assured inventory of genes and metabolic potential without contamination by sequences from other sources, finished genomes ensure full genomic context for any genetic feature and define the overall genome architecture, including exact genome size and replichore structure. However, one of the hallmarks of contemporary MAG catalogs is low completeness and occasional contamination, owing to the highly fragmented nature of metagenomic assemblies. 62 While some of these concerns can be addressed by manual 62,63 or automated genome curation and bin refinement approaches, 64 long-read sequencing technologies are already enabling the reconstruction of complete genomes without binning, and this will soon be possible at scale. Long-read sequencing data can be error-prone, and efforts to automate error correction and to improve the accuracy of long reads are underway (e.g., Sacristán-Horcajada et al., 65 Wick and Holt, 66 and Hackl et al. 67). [...] [68][69][70] However, assembled long reads are not without issues: the resulting sequences can suffer from false circularization due to artifactual repeated sequences and from misassemblies that are not supported by the original reads. High-throughput methods for the detection and correction of assembly errors will ultimately bring microbiology closer to the ideal of routinely rendering accurate and complete genomes and increase the utility of genomes from environments for downstream applications.
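To make the false-circularization problem more concrete, here is a minimal sketch that (1) detects an exact overlap between the end and the start of an assembled contig, the usual signature of a circular molecule, (2) trims the duplicated overlap, and (3) checks whether any read runs continuously across the resulting junction. The function names, the exact-match criterion, and the forward-strand-only read check are simplifying assumptions made for illustration; real assemblers and polishers tolerate sequencing errors and inspect read alignments on both strands.

```python
def end_overlap(seq: str, min_overlap: int = 20) -> int:
    """Length of the longest exact suffix/prefix overlap, or 0 if shorter than min_overlap."""
    seq = seq.upper()
    # Search from the longest plausible overlap (half the contig) downward.
    for k in range(len(seq) // 2, min_overlap - 1, -1):
        if k > 0 and seq[-k:] == seq[:k]:
            return k
    return 0


def trim_circular(seq: str, min_overlap: int = 20) -> tuple:
    """Remove the duplicated end overlap from a putatively circular contig."""
    k = end_overlap(seq, min_overlap)
    return (seq[:-k], True) if k else (seq, False)


def junction_supported(circular_seq: str, reads: list, flank: int = 50) -> bool:
    """True if at least one read contains the end-to-start junction of the trimmed contig.

    A circularization produced by an artifactual repeat would typically lack
    reads spanning this junction (only the forward strand is checked here).
    """
    junction = circular_seq[-flank:] + circular_seq[:flank]
    return any(junction in read.upper() for read in reads)


contig = "ACGTTGCA" + "GGGCCCAAATTT" + "ACGTTGCA"
trimmed, is_circular = trim_circular(contig, min_overlap=5)
print(is_circular, trimmed)                                         # True ACGTTGCAGGGCCCAAATTT
print(junction_supported(trimmed, ["CCAAATTTACGTTG"], flank=5))     # True
```

An end overlap alone is weak evidence, since a repeat shared by the two contig ends produces the same signature; requiring reads that span the junction is one simple way to separate genuine circles from artifacts.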

Expanding and exploring the protein universe
Metagenomic sequencing can uncover new protein families, and whole-proteome comparisons (including proteins with unknown functions) can establish relatedness among large groups of organisms, 71 including those that lack universal genes, such as extrachromosomal elements (ECEs). 72 Even in the absence of recognizable functional domains, the sequences themselves can provide information about the physicochemical properties of a protein (e.g., hydrophobic domains or positively charged regions that may be involved in nucleic acid binding) and uncover metal- and ligand-binding sites. A new generation of computational methods such as AlphaFold, 73 RoseTTAFold, 74 and ESMFold 75 enables prediction of the three-dimensional structures of proteins and thus can provide functional clues that are not apparent from the sequence alone. Often, predictions leverage templates in the form of the structures of biochemically validated proteins accessed via large protein databases such as the Protein Data Bank (PDB). 76 However, it is possible to improve these predictions by supplementing the underlying multiple sequence alignments with the ever-expanding collection of amino acid sequences derived from metagenomic studies. 77 In fact, using sequences without templates, it is possible to discover new protein folds and vastly increase access to three-dimensional structures for proteins with no related solved structures or sequence similarity to known proteins. 78 These approaches, in combination with ecological and other information, have the potential to advance our understanding of the protein inventories of enigmatic entities such as viruses 79 and to aid in the development of biotechnology tools. 80 Exciting new bilingual models that translate information between structural and sequence space, 81 as well as computational tools that increase the accessibility of protein structure calculations 82 and the searchability of existing structures 83,84 in large databases with millions of predictions, 75 help locate variants of proteins of interest with potentially different sizes or kinetics. It may be possible to revolutionize drug development given methods that predict molecular interactions and discover small molecules or antibodies that bind to proteins identified as drug targets. 85 Indeed, computational approaches continue to improve at a rapid pace in predicting a broad range of biomolecular interactions 73 and the stoichiometry of large protein complexes. 86 Altogether, the ongoing revolution in protein structure space and new opportunities to integrate structural insights with established 'omics data types and analysis strategies will undoubtedly influence every corner of microbiology.
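The point made earlier in this section, that raw sequences carry physicochemical information even without recognizable domains, can be sketched with sliding-window averages. The example below uses the Kyte-Doolittle hydropathy scale (a standard published scale chosen here for illustration; the text does not name one) to flag candidate hydrophobic segments, and a crude per-residue charge (+1 for K and R, -1 for D and E, histidine treated as uncharged) to flag positively charged stretches of the kind that might bind nucleic acids. Window sizes and thresholds are illustrative assumptions; a 19-residue window with a cutoff near 1.6 is a common choice for transmembrane segments with this scale.

```python
# Kyte-Doolittle hydropathy values.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Crude net charge near neutral pH; histidine is treated as uncharged (an assumption).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def segments(protein: str, scale: dict, window: int, threshold: float) -> list:
    """Merge windows whose mean score is >= threshold into (start, end) intervals.

    Coordinates are 0-based and end-exclusive; residues missing from the scale
    contribute 0. Because whole windows are reported, intervals come out
    somewhat wider than the underlying feature.
    """
    protein = protein.upper()
    merged = []
    for i in range(len(protein) - window + 1):
        mean = sum(scale.get(aa, 0.0) for aa in protein[i:i + window]) / window
        if mean >= threshold:
            start, end = i, i + window
            if merged and start <= merged[-1][1]:
                merged[-1] = (merged[-1][0], end)  # extend the current segment
            else:
                merged.append((start, end))
    return merged


# Invented toy protein: a leucine stretch followed later by a lysine stretch.
toy = "S" * 10 + "L" * 20 + "S" * 10 + "K" * 8 + "S" * 10
print(segments(toy, KYTE_DOOLITTLE, window=19, threshold=1.6))  # [(1, 39)]
print(segments(toy, CHARGE, window=8, threshold=0.5))           # [(36, 52)]
```

Such profiles are only first-pass hints; the structure-prediction tools discussed above go much further, but simple sequence-derived properties remain a useful sanity check for proteins of unknown function.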

Going beyond the tree of life: (Giant) viruses, (giant) plasmids, and other extrachromosomal entities
With their astronomical numbers and complexity, viruses represent a force of nature that is implicated in all aspects of life, starting from its very origins. 87,88 The immense impact of viruses on life continues through the metabolic reprogramming of their hosts, 89,90 the biogeochemical and ecological processes they influence in complex habitats, 91 and even the modification of the reproductive behavior of animals. 92
Contemporary virus research spans from the characterization of viruses and their organization into cohesive units to linking them to particular hosts or ecosystems and investigating their biotechnological applications. The emergence of computational tools to recognize viral sequences in metagenomic assemblies [93][94][95] resulted in the recovery of tens of thousands of new viral genomes from the human gut, 96 oceans, 97 and soils. 98 The increasing access to metatranscriptomics further supports the recovery of RNA viruses 99 and enables spatiotemporal tracking of eukaryotic viruses of public health significance and their variants. 100 Microbiologists were surprised when "giant eukaryotic viruses," whose genome sizes surpass those of some bacteria, 101 were discovered. 102 Genome-resolved metagenomics also had a remarkable impact on our understanding of the diversity and biogeography of giant viruses, 103 their complex metabolic capabilities, 104 host reprogramming, 105 and impact on host genome evolution. 106 Despite progress toward a global taxonomic framework for viruses, 107,108 the luxury of being represented in a universal tree of life continues to be limited to cellular organisms. However, at least at the level of individual studies, the integrated use of metagenomic assemblies or genome-resolved metagenomics, phylogenomics, and protein structure prediction offers promising insights, not only to achieve a more complete representation of viral clades but also to fill the gaps between disconnected realms of viruses. For instance, in a recent study, Gaïa et al. used phylogeny-guided genome-resolved metagenomics based on DNA-dependent RNA polymerase subunits to describe a large clade of viruses, and by combining functional features and predicted protein structures, they revealed a first association between giant viruses that infect unicellular marine eukaryotes and herpesviruses that infect mammals. 109 [...] [113][114] They also force us to question the classical definitions of what constitutes a bona fide living entity, 115 although opposing views exist, 116 while continually clarifying the conceptual frameworks that can offer formal organizations of the realms of biology. 117
Plasmids, which were first defined by Joshua Lederberg in 1952 118 and have served as a workhorse of molecular biology and genetics ever since, 119 represent another exciting corner of microbiology. The significance of plasmids parallels that of viruses: plasmids also impact the ecology and evolution of their hosts, 120 often find themselves in the spotlight of public health concerns such as global antibiotic resistance 121 largely due to their contribution to the proliferation of resistance genes, 122 encode genes associated with a wide range of metabolic activities, 123 and suffer from a similar exclusion from trees of life. It is conceivable that these similarities may simply be an echo of deeper evolutionary connections that are no longer present in contemporary sequence space. Indeed, recent characterizations of "virus-plasmids" that can act both as viruses and as plasmids, 124 observations of plasmids that propagate through virus-like particles, 125 and phylogenetic analyses that attribute the emergence of DNA viruses to recombination events between RNA viruses and plasmids 126 reinforce the likely presence of deeper evolutionary links and hint that there is much more to discover at this junction. Also similar to viruses, the diversity of naturally occurring plasmids has been grossly underestimated. While metagenomic sequencing and assembly give access to naturally occurring plasmids, recognizing plasmids in sequence collections remains a challenge. Some plasmids, even in some of the most rigorously studied microbial taxa such as Wolbachia, remained undetected for decades. 127 But our ability to recognize plasmids is improving thanks to computational strategies that exploit k-mer profiles learned from reference plasmids, 128 conserved plasmid functions, 129 the circularity of sequences in metagenomes, 130 or deep learning approaches that use combinations of these features. 131 Today, we know that even for relatively well-studied ecosystems, plasmids found exclusively in metagenomes outnumber previously known plasmids several-fold. 132 An interesting insight comes from a recent study that used 68,000 non-redundant plasmids identified from human gut metagenomes and showed that plasmid distribution patterns are not correlated with bacterial taxonomy, 123 which reinforces the critical importance of treating plasmids as semi-independent entities that represent an adaptive force in microbial systems, 133 similar to viruses. Yet organizing plasmids into evolutionarily cohesive units is perhaps an even bigger challenge than identifying them. Unlike viruses, where relatively stable genes within distinct viral realms can yield taxonomic frameworks that shed some light on the evolutionary routes of individual viral genomes, 134 attempts to find cohesive units in diverse sets of plasmids have been largely limited to partitioning pairwise sequence similarity networks. 135,136 A recent attempt to identify evolutionarily cohesive "plasmid systems" in such networks through a containment-aware partitioning approach demonstrated a means to identify plasmid backbones and cargo genes and to partition plasmid genes de novo into housekeeping genes and those that respond to particular environmental conditions. 123
Plasmids are a pillar of biopharmaceutical advances, 119 and naturally occurring plasmids represent a new asset. But the world of plasmids is chaotic: some are so essential and so integrated into host cellular physiology that researchers have called them "chromids," 137 while others look small and irrelevant even though they can be among the most numerous genetic elements in natural habitats. 138 With their unruly nature, plasmids remain an intimidating and exciting dimension of microbiology.
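Of the plasmid-recognition strategies mentioned above, the k-mer idea can be sketched as a nearest-centroid comparison: compute a normalized k-mer frequency profile for each sequence, average the profiles of reference plasmids and of reference chromosomes, and assign a query to whichever centroid it resembles more (cosine similarity). This is a deliberately simplified stand-in, not the method of any cited tool, which typically train machine-learning models on large reference sets. The toy reference sequences below are invented, and their AT- versus GC-richness exists only to make the example separable.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq: str, k: int = 4) -> list:
    """Relative frequencies of all 4**k k-mers, in a fixed lexicographic order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[km] for km in kmers)  # k-mers containing N etc. are ignored
    return [counts[km] / total if total else 0.0 for km in kmers]


def centroid(profiles: list) -> list:
    return [sum(column) / len(profiles) for column in zip(*profiles)]


def cosine(u: list, v: list) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify(query: str, plasmid_refs: list, chromosome_refs: list, k: int = 4) -> str:
    """Nearest-centroid call: 'plasmid' or 'chromosome' (hypothetical toy classifier)."""
    q = kmer_profile(query, k)
    plasmid_sim = cosine(q, centroid([kmer_profile(s, k) for s in plasmid_refs]))
    chrom_sim = cosine(q, centroid([kmer_profile(s, k) for s in chromosome_refs]))
    return "plasmid" if plasmid_sim > chrom_sim else "chromosome"


# Invented toy references, for illustration only.
plasmids = ["ATATATTATAAT", "TTATAATATA"]
chromosomes = ["GCGCGGCCGC", "CGGCGCGCCG"]
print(classify("ATTATATAAT", plasmids, chromosomes, k=2))  # plasmid
print(classify("GCCGCGGCGC", plasmids, chromosomes, k=2))  # chromosome
```

Real sequences rarely separate this cleanly, which is one reason published approaches combine k-mer signals with conserved plasmid functions, circularity, and learned models.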
A challenge common to most extrachromosomal elements that are not typically found integrated into host genomes is the difficulty of linking them to their host organisms, which can be a significant burden in complex environments. By preserving the physical association of genetic elements within individual cells, which is typically lost in bulk sequencing, single-cell genomics approaches offer a relatively straightforward means to capture virus-host 139 and plasmid-host 140 associations with high confidence. In some cases, it is possible to leverage the fact that bacteria and archaea incorporate snippets of the genomes of extrachromosomal elements into their CRISPR loci (spacers) so that invasive elements can be recognized by this protein-RNA-based surveillance system. Extrachromosomal elements whose genomes match CRISPR spacer sequences, especially from the same or related samples, are likely targets of that microbial immune system, thus indicating ECE-host relationships. 141 Because CRISPR loci often evolve very fast, the effectiveness of this method can be improved when spacer sequences are recovered directly from short metagenomic reads rather than from the assembled consensus sequence. 141 Another solution to identify hosts for extrachromosomal entities in complex samples comes from Hi-C, 142 a molecular strategy that physically links genetic entities that are adjacent to one another prior to sequencing and can reveal associations between host genomes and extrachromosomal entities, including plasmids 143 and viruses. 144,145 Finally, heterogeneity in population genomic data can identify fast evolutionary modes and uncover the exact integration sites of prophages and proviruses, simultaneously establishing host association for these elements. 146,147
Overall, our insights into extrachromosomal elements are improving at a rapid pace and extend beyond historically well-recognized ones such as viruses and plasmids. The recent descriptions of "Borgs," which can have linear genomes of over 1 Mbp, contain large inventories of metabolically relevant genes, and are not simply classifiable as plasmids or viruses; 148 "obelisks," which are 1 kb viroid-like extrachromosomal circular RNAs that are prevalent in gut metagenomes and metatranscriptomes; 149 the recent expansion of "viroid-like circular RNAs"; 150 the "phage-inducible chromosomal island-like elements" that demonstrate the extent of parasitism among mobile genetic elements; 151 and the overall prevalence of poorly characterized phage satellites across bacterial genomes 152 indicate that there is more to be learned about these mysterious entities.
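The CRISPR spacer-matching approach to linking ECEs with hosts, described above, can be sketched as follows: each host's spacers are compared against both strands of each ECE sequence with an ungapped scan that tolerates a small number of mismatches, and every host with at least one matching spacer is reported as a candidate. The mismatch tolerance, data layout, and function names are assumptions for illustration; practical pipelines rely on dedicated alignment tools and additional filters and, as noted above, may recover spacers directly from reads.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def matches(spacer: str, target: str, max_mismatches: int) -> bool:
    """True if the spacer aligns ungapped to target with <= max_mismatches substitutions."""
    n = len(spacer)
    for start in range(len(target) - n + 1):
        mismatches = 0
        for a, b in zip(spacer, target[start:start + n]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:  # inner loop finished without exceeding the mismatch budget
            return True
    return False


def link_eces_to_hosts(eces: dict, host_spacers: dict, max_mismatches: int = 1) -> dict:
    """Map each ECE name to the sorted list of hosts with at least one matching spacer."""
    links = {}
    for ece_name, ece_seq in eces.items():
        strands = (ece_seq.upper(), reverse_complement(ece_seq))
        hosts = set()
        for host, spacers in host_spacers.items():
            if any(matches(sp.upper(), s, max_mismatches) for sp in spacers for s in strands):
                hosts.add(host)
        links[ece_name] = sorted(hosts)
    return links


# Invented toy data for illustration.
phage = "AAAACCCCGGGGTTTTACGTACGTAAAA"
spacers = {
    "hostA": ["CCCCGGGGTTTT"],  # exact match
    "hostB": ["TTTTTTTTTTTT"],  # no match
    "hostC": ["CCCCGGCGTTTT"],  # one mismatch
}
print(link_eces_to_hosts({"phage1": phage}, spacers, max_mismatches=1))
# {'phage1': ['hostA', 'hostC']}
```

Allowing a mismatch or two reflects the fast evolution of both spacers and their targets, but each extra mismatch also raises the chance of spurious links, so the tolerance is a trade-off rather than a fixed rule.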

Discovery and validation of new genetic codes
Metagenomic sequencing has brought to light unexpected phenomena that open new dimensions of modern microbiology. One of these relates to the genetic code and its variability. The genetic code was long considered immutable, and, indeed, it is widely conserved in bacteria and archaea. However, partial sequencing of a ribosomal gene cluster in the Mycoplasma capricolum genome revealed that, as in some mitochondrial genomes, TGA (normally a stop codon) is read as tryptophan (genetic code 4). 153 Subsequently, metagenomics-based analyses indicated that Gracilibacteria and Absconditabacteria, sibling lineages within the CPR, have reassigned the TGA codon. 43 Its translation as glycine, 154 confirmed using metaproteomics, 155 ushered in a third bacterial genetic code (code 25). These code variations appear to have arisen in single, rare, and ancient evolutionary events. To date, no analogous genome-wide codon reassignments have been reported in archaea, although this domain of life remains under-explored relative to bacteria and eukaryotes. Recently, a new computational approach, "Codetta," was developed to identify new instances of codon reassignment in the now vast genomic sequence databases. 156 Its analyses revealed the first evidence of sense codon reassignments (changes in the amino acid encoded by a sense codon) in bacteria. In combination, these studies broaden and deepen our understanding of the modes and frequency of genetic code evolution in prokaryotes.
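To illustrate what a reassigned stop codon means in practice, the sketch below translates the same invented open reading frame under the standard bacterial code and under the two alternative codes mentioned above, using the NCBI translation-table numbering (table 11 shares its amino acid assignments with table 1; table 4 reads TGA as tryptophan; table 25 reads TGA as glycine). Counting internal stop codons under each candidate code is only a crude hint of reassignment and is not how Codetta works; alternative start codons are ignored here.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for all 64 codons in TCAG order (NCBI table 1 / 11 assignments).
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Only amino acid assignments are modeled; start-codon differences are ignored.
CODES = {
    11: STANDARD,                  # bacterial standard code
    4: {**STANDARD, "TGA": "W"},   # e.g., Mycoplasma: TGA -> tryptophan
    25: {**STANDARD, "TGA": "G"},  # Gracilibacteria/Absconditabacteria: TGA -> glycine
}


def translate(orf: str, code: int = 11) -> str:
    """Translate codon by codon; unknown codons (e.g., containing N) become X."""
    table = CODES[code]
    orf = orf.upper()
    return "".join(table.get(orf[i:i + 3], "X") for i in range(0, len(orf) - 2, 3))


def internal_stops(protein: str) -> int:
    """Stop symbols before the terminal one; many of these suggest the wrong code."""
    return protein.rstrip("*").count("*")


orf = "ATGGCTTGAAAATGAGGCTAA"  # invented example
for code in CODES:
    protein = translate(orf, code)
    print(code, protein, internal_stops(protein))
# 11 MA*K*G* 2
# 4 MAWKWG* 0
# 25 MAGKGG* 0
```

Under the standard code this reading frame looks truncated by premature stops, whereas under either reassigned code it reads through; deciding between tryptophan and glycine, however, requires additional evidence such as alignments to conserved proteins or, as for code 25, metaproteomics.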
The origin of the relatively stable (albeit with exceptions) modern genetic code has been investigated recently by Zagrović and colleagues, who found that, although mRNAs and their cognate proteins are different types of polymers, mRNAs and the proteins they encode display complementarity. 157 This is particularly evident if the molecules are unstructured (disordered), as would have been the case during evolution of the primordial genetic code. In other work potentially relevant to understanding the origin of the genetic code, an analysis of large sequence databases from across the tree of life confirmed that frame-shifted proteins have physicochemical properties (e.g., hydrophobicity, nucleobase affinity, and intrinsic disorder) similar to those of the original, non-frame-shifted proteins, despite essentially no primary sequence similarity. 158 It has long been known that similar codons encode amino acids with similar physicochemical properties and that the second codon position is the most important in determining the properties of the encoded amino acid. 159 Thus, frame-shifted codons, which share two of three nucleotides with the original codons, tend to incorporate related amino acids. Further insights into the evolution of the genetic code come from integrated analyses of genomes and environmental selective pressures quantified through metagenomes. One such investigation, by Shenhav and Zeevi, shows that codon assignments in the genetic code minimize the extent to which DNA mutations dramatically change the nutrient requirements of protein synthesis, revealing an overarching principle across the tree of life that makes exploration of the mutational landscape relatively less risky. 160
Findings related to the genetic codes of bacteriophages (phages) challenge both the expectation that phages are unlikely to use a genetic code that differs from that of their hosts and the perception that genetic code changes are rare evolutionary events. In 2008, Shackelton and Holmes wrote "... it is extremely unlikely that viruses of hosts utilizing the universal genetic code would emerge, via cross-species transmission, in hosts utilizing alternative codes, and vice versa." 161 Consistent with this prediction, these authors confirmed that alternatively coded phages replicate in hosts that also use an alternative genetic code. However, Ivanova et al. reported that some phage genomes do in fact use a genetic code in which one of the three normal stop codons is read as an amino acid, yet they replicate in hosts that use the standard genetic code. 162 Recently, the reverse was suggested: standard-code phages can replicate in bacteria that use a modified genetic code, specifically Gracilibacteria and Absconditabacteria, whose genomes encode TGA as glycine. 163 Studies now show that code mismatch between phages and hosts is a relatively common phenomenon. For example, diverse alternatively coded phages replicate in Bacteroidetes and Firmicutes, bacteria that are particularly prevalent in human and animal microbiomes. 164 Combined taxonomic and genetic code analysis of a metagenomically derived database of alternatively coded phages revealed that these phages can be closely related to phages that use the standard code. In the most surprising case, one phage lineage includes standard-code as well as TGA-reassigned and TAG-reassigned genomes. Thus, contrary to evidence from prokaryote genomes, results for phages suggest that the barrier to code change is relatively low, such that codon reassignment can arise independently in different lineages and over relatively short evolutionary timescales. 164
Indications that code shifts in phages are relatively facile raise questions about the evolutionary path to code change and about the machinery required for different codes to be interpreted by the ribosome during phage infection. Acquisition of a suppressor transfer RNA (tRNA) that translates a normal stop codon as an amino acid could occur via lateral gene transfer or, more likely, via mutation of the anticodon of a tRNA (possibly preceded by duplication of the tRNA). 165 Interestingly, analyses of metagenome-derived phage genomes to date point to charging of suppressor tRNAs with one of only three amino acids, consistent with convergent paths to these codon reassignments. The advantages of code mismatch remain unclear, but it may prevent premature production of structural proteins and of proteins involved in host cell lysis. 164
Another form of genetic code variation involves the reassignment of normal stop codons to non-standard amino acids. The first recognized example was the translation of the TGA codon as selenocysteine in specific proteins, as signaled by a selenocysteine insertion sequence (SECIS) element, a specific structural mRNA feature. 166 This can substantially modify the properties of proteins (e.g., stronger nucleophilicity and lower reduction potential) relative to their cysteine-containing homologs. 167 More recently, reassignment of the TAG stop codon to the 22nd amino acid, pyrrolysine, was discovered and, to date, has been found to be prevalent in methylamine methyltransferases and a few other proteins. 168
Mapping biogeography to genes and genomes
Genomes have been reconstructed directly from metagenomes sampled from every major ecosystem, including microbiomes of humans, animals, insects, plants, soils, lakes, aquifers, and oceans. The large influx of genomes had an immediate impact on the life sciences through broader access to the predicted genes, functions, and metabolic potential of naturally occurring organisms. Surveys of traits that define the biogeographical patterns and niche partitioning of plants and animals go as far back as the eighteenth century in biology. However, a broad appreciation that microbes also display non-uniform biogeographical distribution patterns as a function of environmental gradients took time to come into focus. 169 Some of the most defining work emerged from studies of microbial mats in Yellowstone National Park, United States. Initially using denaturing gradient gel electrophoresis to separate 16S rRNA gene fragments amplified from the environment, Ward and colleagues showed astonishingly clear patterns of microbial population biogeography within one-millimeter slices of Mushroom Spring cyanobacterial mats. 170 These fine-scale observations were confirmed by others at the scale of meters, when Moore et al. showed that physiologically and phylogenetically distinct Prochlorococcus populations distribute across the water column as a function of light and temperature, 171 and at the scale of continents, when Cho and Tiedje showed that genotypic differences between Pseudomonas populations correlated with the geographic distances between the soil samples from which they originated. 172 With growing recognition of microbial traits, their ecology, and their impact on biogeochemical systems, 173 the next challenge was to move from phylogenetic indicators of biogeography to the traits that drive the biogeographical partitioning of microbes. 174
The initial gene-centric analyses of short metagenomic reads 175 and of longer genomic fragments recovered directly from environmental samples 176 quickly revealed non-uniform distribution patterns for genetic features associated with environment-specific metabolic requirements. The ability to characterize the biogeography of microbial populations along with their genes was a step toward one of the holy grails of microbiology, with significant biotechnological and biomedical implications: elucidating the genetic determinants of fitness encoded in genomes. This goal benefited from the ecology-enabled comparative genomics approaches that soon followed with the emergence of pangenomics.

Characterizing genetic heterogeneity in microbial populations
The first comparison of complete genomes of two closely related Helicobacter pylori strains revealed extensive gene conservation along with systematic differences 177 and the existence of "core" and "flexible" gene pools. 178 Research in the early 2000s took advantage of increasing access to microbial genomes to demonstrate the utility of comparative genomics for gaining insights into the likely determinants of microbial colonization and virulence 179 and for developing vaccines. 180 The idea of systematically studying differentially occurring genetic features of closely related microbial populations came to fruition with the recognition of microbial pangenomes 181 and of the evolutionary processes that shape them. 182,183 By partitioning the gene pool of a set of genomes, pangenomics offers insights into genes that are found in the vast majority of genomes as well as those that occur differentially among them, which provides immense power to link gene function to population ecology.
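The partitioning step described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular pangenomics tool: it assumes gene clusters have already been computed (e.g., by sequence clustering) and represented as a set of cluster IDs per genome, and the 95% core cutoff and all names are hypothetical choices for the example.

```python
from collections import Counter


def partition_pangenome(genomes, core_fraction=0.95):
    """Split gene clusters into core, accessory, and singleton sets.

    genomes: mapping of genome name -> set of gene-cluster IDs it contains.
    core_fraction: hypothetical cutoff; clusters present in at least this
    fraction of genomes are called core, all others accessory.
    """
    counts = Counter(cluster for clusters in genomes.values() for cluster in clusters)
    n_genomes = len(genomes)
    core = {c for c, n in counts.items() if n / n_genomes >= core_fraction}
    accessory = set(counts) - core
    # Singletons are accessory clusters found in exactly one genome.
    singletons = {c for c in accessory if counts[c] == 1}
    return core, accessory, singletons


if __name__ == "__main__":
    toy_genomes = {  # hypothetical presence/absence data
        "genome_1": {"c1", "c2", "c3"},
        "genome_2": {"c1", "c2", "c4"},
        "genome_3": {"c1", "c2", "c3"},
    }
    print(partition_pangenome(toy_genomes))
```

In the toy data, c1 and c2 occur in every genome and are called core, while c3 and c4 are accessory, with c4 as a singleton. Real analyses typically tune the cutoff to tolerate incomplete genomes, which can otherwise push true core genes into the accessory set.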
Genomes alone reveal very little about their biogeography and even less about how well their gene content represents that of the environmental populations to which they belong. Pangenomics combined with shotgun metagenomes can uncover not only the biogeography of individual populations but also gene conservancy across habitats. 184 An application of this idea by Coleman and Chisholm to a Prochlorococcus pangenome and two marine metagenomes revealed that the scarcity of phosphorus in the North Atlantic Ocean resulted in strict maintenance of accessory phosphorus acquisition genes in Prochlorococcus populations there, whereas their close relatives in the Pacific Ocean had largely lost them. 185 This integration of pangenomics and metagenomics to identify the selective forces that shape gene variation across populations and environments, also termed metapangenomics, 186 has revealed the differential enrichment of near-identical Haemophilus parainfluenzae populations that live millimeters apart in the human oral cavity, where populations enriched on the tongue commonly encoded sodium-dependent oxaloacetate decarboxylase enzymes and those enriched in dental plaque completely lacked them, 187 as well as a differential distribution of transporters and enzymes within marine SAR324 populations that balances autotrophic and heterotrophic lifestyles as a function of water depth. 188 The metapangenomic approach can also be used to confirm that critical genes validated experimentally in laboratory cultures are conserved in environmental populations of the same organism. 189
The inclusion of ecological patterns in comparative genomics reduces the number of targets in our search for genes that may be particularly important under specific environmental conditions. However, the difficulty of predicting gene function de novo poses challenges. Efforts to shed light on the ecology of unknown genes, 190,191 profile hidden Markov model search algorithms, 192 the application of deep learning to distant homology searches, 193 algorithms that enable rapid mining of protein structures, 83 and the introduction of graph theory into pangenomics to infer gene synteny 194,195 continue to improve insights into the microbial gene pool.
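The metapangenomic filtering described above, which asks whether a population's genes are retained in the environment, can be sketched as follows. This is a simplified illustration under explicit assumptions: per-gene mean coverages are assumed to come from read recruitment of each metagenome to the population's genome; the detection rule (coverage of at least 25% of the sample's median gene coverage), the 90% sample cutoff, the "EC"/"EA" (environmental core/accessory) labels, and the toy sample and gene names are all hypothetical choices for the example.

```python
from statistics import median


def environmental_core(coverage, ratio_cutoff=0.25, sample_fraction=0.9):
    """Label each gene 'EC' (environmental core) or 'EA' (environmental accessory).

    coverage: mapping sample -> {gene: mean read coverage} for one population.
    A gene is treated as detected in a sample when its coverage is at least
    ratio_cutoff times the sample's median gene coverage (hypothetical rule);
    it is EC when detected in at least sample_fraction of the samples.
    """
    genes = set().union(*(cov.keys() for cov in coverage.values()))
    detections = {gene: 0 for gene in genes}
    for cov in coverage.values():
        # Genes absent from a sample's table are treated as zero coverage.
        mid = median(cov.get(gene, 0.0) for gene in genes)
        for gene in genes:
            if mid > 0 and cov.get(gene, 0.0) >= ratio_cutoff * mid:
                detections[gene] += 1
    n_samples = len(coverage)
    return {gene: ("EC" if hits / n_samples >= sample_fraction else "EA")
            for gene, hits in detections.items()}


if __name__ == "__main__":
    toy_coverage = {  # hypothetical coverages of one population's genes
        "atlantic_1": {"gene_a": 10, "gene_b": 12, "pho_gene": 11},
        "atlantic_2": {"gene_a": 8, "gene_b": 9, "pho_gene": 10},
        "pacific_1": {"gene_a": 20, "gene_b": 22, "pho_gene": 0.5},
    }
    print(environmental_core(toy_coverage))
```

With all three toy samples, `pho_gene` is labeled EA because its coverage collapses in the Pacific sample; restricting the input to the Atlantic samples labels it EC, mirroring the kind of habitat-specific retention described in the text.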

Disentangling targets of long-term evolutionary processes
Even the genes that are shared across all members of a population can vary across individuals in subtle ways. Since the early descriptions of within-population trait variation in animals and plants by Darwin and Mendel, population genetics, the study of genetic variation among individual members of a population, has had a remarkable history in biology, revealing molecular principles of evolution. 196 Even with limited access to sequencing, it was becoming clear during the 1990s and early 2000s that microbial organisms were not strictly clonal 197 and that the microbial world was a playground for evolution, 198 a powerful process that enables microbes to respond dynamically to environmental change within seasonal timescales. 199 Thus, elucidating the ways in which individuals of a single population differ from one another was essential for studying fundamental principles of genetic innovation and divergence, 200 an ideal that gained remarkable traction with the increasing availability of genomes and metagenomes.
Metagenomic datasets capture the genomic heterogeneity of closely related populations and provide the data needed to identify genes under neutral, adaptive, or purifying forces. Indeed, the initial culture-independent observations of fine-scale population structure through microdiversity patterns in clone libraries from marine systems 201 foreshadowed the monumental impact of shotgun metagenomics on the study of genome-wide spatiotemporal patterns of evolution. Identifying the targets of distinct evolutionary processes in complex systems is an intimidating task, given that even the most well-controlled laboratory system conceivable, such as the long-term evolution experiment that now exceeds 75,000 generations for 12 Escherichia coli populations, 202 presents us with overwhelming complexity. 203 Read recruitment from shotgun metagenomes to assembled contigs or reference genomes enables the characterization of genetic variants and their allele frequencies at single-nucleotide resolution. 204 Early studies that took advantage of shotgun metagenomics to study population genetics in relatively simple microbial systems observed the outcomes of transposase activity and purifying selection 146 and quantified patterns of genetic variability throughout entire genomes of environmental populations. 205
In addition to the increasing depth of metagenomic sequencing efforts, large-scale metagenomic surveys that yield public datasets of high legacy value 206,207 have been improving the application of microbial population genetics. More recent research efforts have elucidated variable rates of evolution for genes within individual microbial populations of the human gut, 208 patterns of gene- and genome-wide selective sweeps in lakes, 209 the fate of subpopulations after human fecal microbiota transplantation, 210 associations between microbial population structures and human biogeography, 211 location-specific purifying forces in deep-sea hydrothermal vents, 212 the impact of large-scale oceanic current temperatures on the population structures of marine organisms, 213 the impact of anatomy on the population structures of microbes on human skin, 214 and more. 215,216
Many of the studies to date used subtle genetic variation in natural populations to infer the extent to which divergence within microbial populations predicts or correlates with other environmental variables, such as differences in biogeography, various ecological measurements, or the health and disease states of the organisms colonized by microbes. However, while such statistical inferences, which treat every variant as equal, are useful for broad ecological insights, they cannot resolve the impact of individual variants on function or the evolutionary processes that maintain them. Increasing the utility of environmental surveys of genetic variation for biotechnological applications will undoubtedly benefit from consideration of protein structures, 217 given the sequence-structure-function paradigm 218 and our contemporary understanding of the direct relationships between protein sequence and function. 219 The well-understood importance of protein structures for interpreting genomic variants 220 was recognized in the context of population genetics over two decades ago. 221 However, realizing these ideals was delayed until recently by the challenges of predicting protein structures from amino acid sequences. 222 Especially with the introduction of a new generation of deep learning frameworks that enable accurate prediction of protein structures, careful interpretation of variants observed in metagenomes is becoming more accessible through integrative 'omics platforms 223 for structure-guided mining of metagenomes. 224 A recent study by Kiefl et al. that linked protein structures and microbial population genetics demonstrated the non-uniform distribution of non-synonymous variants with respect to their relative solvent accessibility and distance to ligands, as well as rapidly decreasing rates of non-synonymous variation around the ligand-binding sites of key metabolic genes as a function of bioavailable nitrogen concentrations in marine systems. 225 In another study, Han and Peng et al. observed variable selective pressures across the genomic features of organohalide-reducing taxa that occupy deep-sea cold seeps and demonstrated stronger purifying selection acting on buried residues of metabolically critical reductive dehalogenases in the environment. 226
Elucidating how microbes respond to rapid environmental change
Environments change constantly, and survival depends on the ability to keep up. Among organisms, microbes are unusual in that their evolutionary rates may be fast enough to respond to steady and relatively slow environmental shifts that take place on timescales of decades or longer. However, conventional genome evolution is too slow to respond to rapid changes that may occur within hours, days, or weeks. Thus, life has evolved many means to respond dynamically to ever-shifting fitness landscapes, some of which are only now coming to light.
Transcriptional regulation is arguably one of the best-understood means of responding to rapidly changing environmental conditions. Technological advances brought transcriptomics to fruition starting around the 1970s, and progress continued at a rapid pace with increasing access to sequencing technologies. 227 The first whole-genome characterization of the transcriptomic landscape of a microbe demonstrated that Listeria monocytogenes undergoes remarkable transcriptional changes as a function of its environment. 228 With massive advances in single-cell transcriptomics, recent studies have demonstrated responses such as spatial expression profiles in biofilms, 229 heterogeneous responses to antibiotic exposure, 230 and transcriptional responses to phage infection 231 at the level of individual cells across isogenic populations. Metatranscriptomics, the study of RNA molecules recovered directly from complex samples, was first applied to marine microbial populations 232 and had challenging beginnings due to the extremely low number of messenger RNAs per cell, the short half-life of RNA molecules, the lack of the poly(A) tails that enable selective enrichment of eukaryotic mRNAs and thus increase the signal-to-noise ratio in RNA libraries, and the challenges associated with quantifying RNA sequencing results. 233 Nevertheless, contemporary large-scale metatranscriptomic surveys have reached a level of maturity sufficient to track microbial responses to warming oceans 234 and to elucidate disease-specific microbial responses in the human gut. 235
Changing patterns of expression via transcriptional regulation is only one aspect of the genetic toolkit that rapidly optimizes microbial responses. Analyses of genomes and metagenomes revealed the pervasiveness of "phase variation" in bacteria, in which the inversion of regions of intergenic DNA prevents the expression of costly genes in environments that do not require them. 236 Jiang and Hall et al. demonstrated that the same microbial populations engrafted into multiple individuals through fecal microbiota transplantation can acquire rapid and heritable changes in their promoter orientations within days, highlighting the speed with which microbes can optimize their gene use to respond to individual-specific selective forces without changing their gene content. 236 Another intriguing mechanism that enables microbes to change the efficiency of their existing genes is implemented by diversity-generating retroelements (DGRs). 237 DGRs are a family of genetic elements that generate hypervariability in target genes: a highly error-prone, retron-type reverse transcriptase copies a template sequence while frequently replacing its adenines with random bases, and the mutagenized copy then replaces the target region through mutagenic homing. 238 DGRs, which can generate a remarkable number of sequence variants in a single population within a few generations, were first discovered in a phage that infects Bordetella, 239 yet they are widespread in bacteria and archaea, 240 and analyses of metagenomes show that they target a wide variety of functions, including protein binding, carbohydrate binding, and cell adhesion. 241 Within-population variants of DGR targets recovered from metagenomes can show strong biogeographical partitioning, 242 suggesting that selective forces optimize the pool of random variants to match the fitness requirements imposed by local conditions.
Translational regulation is yet another layer of biology that enables cells from all domains of life to mount rapid responses to environmental change, this time without changing their genomes or patterns of transcription, but by changing what the translational machinery prioritizes. tRNAs, essential components of protein synthesis, play a significant role in this process through their relative abundances in the cell and through chemical modifications that influence their interactions with the ribosome. 243,244 Cells can actively change the relative abundances of their tRNAs to switch their metabolic state and prioritize the translation of immediate needs in response to environmental conditions. 245 Phages and plasmids can also influence the tRNA pool, as they carry large inventories of tRNAs 246 and tRNA modification enzymes 123 that are most likely maintained by adaptive processes 247 and enable direct influence on translation. 248
Characterization of tRNA transcripts and their chemical modifications to study the dynamics of translational regulation is well established for cell populations of a single organism, yet expanding these surveys to naturally occurring complex habitats is complicated by immense molecular and analytical challenges. For instance, due to their rigid secondary structures and chemical modifications, high-throughput sequencing of tRNA transcripts has been notoriously difficult. 249 However, improving molecular protocols promise broader access to tRNA sequencing 250 and new opportunities to study the targets and implications of translational regulation in complex systems as well. For example, an application of tRNA sequencing to animal gut microbial populations revealed diet-dependent differences in taxon-specific tRNA chemical modification profiles: under a high-fat diet, highly modified tRNAs decoded codons that were strongly enriched in highly expressed proteins relative to the low-fat diet condition. 251 The study also demonstrated the feasibility of this approach for tracking rapid changes in translational priorities and their genetic targets.
Another promising approach to study the translational responses of environmental microbial populations is ribosome profiling, a strategy that gives access to the ribosome-protected RNA fragments that are being translated at a given time point. 252 An application of ribosome profiling to human gut microbial populations identified significant differences between transcript abundances and what was actually translated, 253 highlighting the need for highly resolved strategies that target critical components of translation to gain more complete insights into microbial responses to environmental conditions.
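One common way to contrast what is transcribed with what is translated, as in the ribosome profiling comparison above, is to normalize RNA-seq and ribosome-footprint counts to transcripts per million (TPM) and take their ratio as a per-gene "translational efficiency." The Python sketch below shows that arithmetic only. It assumes reads have already been assigned to genes, uses gene length in base pairs for both libraries, and adds a pseudocount of 1 as an illustrative choice; the gene names and counts are hypothetical, and real pipelines also handle rRNA depletion, multi-mapping reads, and per-taxon normalization in community samples.

```python
def tpm(read_counts, lengths_bp):
    """Transcripts per million: length-normalize counts, then rescale to sum to 1e6."""
    rates = {g: read_counts[g] / (lengths_bp[g] / 1_000) for g in read_counts}
    total = sum(rates.values())
    return {g: rate / total * 1_000_000 for g, rate in rates.items()}


def translational_efficiency(ribo_counts, rna_counts, lengths_bp, pseudocount=1.0):
    """Per-gene ratio of ribosome-footprint TPM to mRNA TPM.

    The pseudocount (an illustrative choice) avoids division by zero for
    genes with no mRNA reads.
    """
    ribo = tpm(ribo_counts, lengths_bp)
    rna = tpm(rna_counts, lengths_bp)
    return {g: (ribo[g] + pseudocount) / (rna[g] + pseudocount) for g in ribo}


if __name__ == "__main__":
    lengths = {"gene_a": 1000, "gene_b": 2000}  # hypothetical genes
    rna = {"gene_a": 100, "gene_b": 200}        # equal mRNA TPM for both genes
    ribo = {"gene_a": 300, "gene_b": 100}       # gene_a is more heavily translated
    print(translational_efficiency(ribo, rna, lengths))
```

In this toy example both genes have identical mRNA abundance after length normalization, yet `gene_a` has a ratio above 1 and `gene_b` below 1, which is precisely the kind of transcript-versus-translation discrepancy that transcriptomics alone would miss.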
Insights into in situ microbial activity
From the perspective of ecological studies that seek to understand in situ processes, it is important to distinguish active from inactive microbial community members, an ideal that has become more accessible in recent years. For example, bioorthogonal non-canonical amino acid tagging (BONCAT) can visualize translationally active cells within complex environmental samples. 254 Another innovative approach combines fluorescence in situ hybridization (FISH) and nanoscale secondary ion mass spectrometry (FISH-nanoSIMS) with ¹⁵N stable isotope probing to link the metabolic activity of cells to their location in complex consortia. 255,256 From the perspective of metagenomic studies, an important motivation for using methods that identify organisms active in situ is the recognition that a substantial fraction of DNA (e.g., in soil) derives from dead organisms or is extracellular. 257 This may, in part, reflect the fact that soils are incredibly biodiverse and display rapid shifts in community composition over the seasons. A key approach to identifying the subset of organisms that are active at any time is to leverage stable isotopes, which are selectively incorporated into the proteins 258 or DNA 259 of actively growing organisms. Typically, stable isotope amendments are applied to samples of interest shortly before extraction of proteins or DNA. In the case of proteins, labeled versus unlabeled proteins (e.g., containing ¹⁵N amended in the form of ammonia) are distinguished using highly accurate mass spectrometry measurements of peptides and their fragmentation products. 260 In the case of DNA, nucleic acids are separated into fractions by buoyant density, and the fractions are sequenced independently. Implementation examples include growing plants in a ¹³CO₂ atmosphere and tracking ¹³C into the rhizosphere microbiome, or adding isotopically heavy water that is tracked into organisms that incorporate its oxygen into de novo synthesized DNA. In the original implementation, referred to as quantitative stable isotope probing (qSIP), DNA in the heavy fraction (after accounting for the effect of genome GC content on buoyant density) is attributed to actively growing organisms. 259
For example, one study used H₂¹⁸O addition and quantitative PCR of 16S rRNA genes to track the growth and turnover of bacteria and fungi and thus quantified their contributions to the CO₂ flux from soil following the addition of water to dry grassland soil. 261 Subsequently, DNA labeling was tracked into whole metagenomes, enabling stable isotope-informed genome-resolved metagenomics. 262 Despite the potential for artifacts when, for example, field-collected soil samples must be incubated with heavy water in the laboratory, stable isotope labeling is emerging as an important method for the study of microbial ecosystem dynamics.
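A core quantity in qSIP-style analyses is the abundance-weighted mean buoyant density of a taxon's DNA across gradient fractions, compared between labeled and unlabeled incubations. The Python sketch below computes only that density shift; it is a simplified illustration, not the full qSIP calculation. Comparing the same taxon across treatments removes its GC-driven baseline density, but converting the shift into isotope incorporation or growth rates requires additional calibration that is deliberately omitted here. The fraction densities and abundances in the example are hypothetical.

```python
def weighted_mean_density(densities, abundances):
    """Abundance-weighted mean buoyant density (g/mL) of one taxon's DNA across fractions."""
    total = sum(abundances)
    if total == 0:
        raise ValueError("taxon not detected in any fraction")
    return sum(d * a for d, a in zip(densities, abundances)) / total


def density_shift(unlabeled_densities, unlabeled_abund, labeled_densities, labeled_abund):
    """Shift in weighted mean density between labeled and unlabeled incubations.

    Using the taxon's own unlabeled gradient as the baseline accounts for the
    effect of its genome GC content on density; translating the shift into
    atom fraction excess needs further calibration not shown here.
    """
    light = weighted_mean_density(unlabeled_densities, unlabeled_abund)
    heavy = weighted_mean_density(labeled_densities, labeled_abund)
    return heavy - light


if __name__ == "__main__":
    fractions = [1.70, 1.72, 1.74]  # hypothetical fraction densities (g/mL)
    # Hypothetical taxon abundances (e.g., qPCR copies) per fraction.
    shift = density_shift(fractions, [10, 5, 0], fractions, [0, 5, 10])
    print(f"density shift: {shift:.4f} g/mL")
```

For the toy taxon, DNA moves from lighter to heavier fractions in the labeled incubation, giving a positive shift of about 0.027 g/mL; a taxon that did not incorporate the isotope would show a shift near zero regardless of its GC content.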
Emerging opportunities enabled by genome editing of communities
Most current biological knowledge was generated via experiments in which a specific microbe was grown as an isolate (i.e., in a pure culture) and a certain gene was knocked out or added experimentally to determine its function; much of it involved just one organism, E. coli. However, the tree of life is vast, and the suite of isolates available for traditional genetic manipulation is extremely limited. Even after functional assignments that leverage structural similarity, there are vast numbers of proteins whose functions cannot be predicted. Why do microbes grow as members of communities rather than evolve into multicellular organisms such as trees or insects? The benefits may relate to adaptability to changing environmental conditions and likely come from flexibility in community membership and efficiencies gained via cooperative activities. As a consequence, microbes often depend on other organisms for key resources and devote genetic potential to communication and competition. These facets of microbial activity are understandably understudied because most genetic and biochemical research conducted to date has been performed on organisms growing alone. In the absence of new methods that enable genetic manipulation of organisms without isolation, this is an essentially insurmountable problem. Understanding the genes and pathways via which microbes interact with each other (and with macroorganisms) requires the ability to perform "gold standard" genetic manipulation without the requirement for isolation, in other words, genome editing of microbes while they exist in the context of their communities. With this approach, not only can genes involved in interactions be interrogated, but organisms that cannot be grown in isolation can also become tractable for experimental manipulation.
If one is to edit a microbiome using CRISPR-Cas methods, there are several key requirements. First, stable, reproducible laboratory-based microbiomes are needed for method development and testing. Next, the sequence targeted for modification must be known; for this reason, genome-resolved metagenomics is foundational. The third challenge is to find a means by which the genome editing tools can be delivered into the targeted cells, which is likely to be microbe-specific (e.g., conjugation or natural competence). To address this challenge, Rubin and others developed environmental transformation sequencing (ET-seq) and a DNA-editing all-in-one RNA-guided CRISPR-Cas transposase (DART) to enable organism- and locus-specific genetic manipulation. 263 Although direct editing of a member of a human infant microbiome was achieved, the editing efficiency was low and editing occurred in well-studied bacteria. Future improvements in delivery are needed to target diverse organisms from little-studied branches of the tree of life before this potentially revolutionary approach can reach its full potential.
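Why genome-resolved metagenomics is foundational for organism-specific editing can be illustrated with a small guide-selection sketch: it lists candidate spacers next to a protospacer-adjacent motif (PAM) in a target genome and keeps only those absent from every other genome reconstructed from the same community. This is a hypothetical illustration, not the ET-seq or DART workflow. The NGG PAM is that of SpCas9 and is used purely as an example (other CRISPR systems, including CRISPR-associated transposases, recognize different PAMs), only exact matches are checked (real guide design also screens for mismatched off-targets), and all function names and sequences are assumptions.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def candidate_spacers(genome, spacer_len=20):
    """Return spacers lying immediately 5' of an NGG PAM on either strand.

    NGG (the SpCas9 PAM) is an illustrative assumption; substitute the PAM
    of the system actually used.
    """
    # Lookahead so that overlapping candidates are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % spacer_len)
    spacers = []
    for strand in (genome, reverse_complement(genome)):
        spacers.extend(m.group(1) for m in pattern.finditer(strand))
    return spacers


def organism_specific_spacers(target, others, spacer_len=20):
    """Keep target spacers that occur (on either strand) in no other community genome."""
    others = list(others)
    background = others + [reverse_complement(g) for g in others]
    return [s for s in candidate_spacers(target, spacer_len)
            if not any(s in g for g in background)]


if __name__ == "__main__":
    # Hypothetical toy sequences with a short spacer length for readability.
    print(organism_specific_spacers("GATTCAGG", ["TTTTTTTT"], spacer_len=5))
    print(organism_specific_spacers("GATTCAGG", ["CCGAATCCC"], spacer_len=5))
```

The first call keeps the spacer `GATTC`; the second returns an empty list because the other community genome carries that sequence on its opposite strand, so a guide built from it would not be organism-specific.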
Genetic manipulation in a community context has been explored in the mouse microbiome, with the ultimate goal of engineering communities in situ. A first step was to recover engineered strains for reintroduction into the gut to alter overall community function. To do this, Ronda et al. developed metagenomic alteration of gut microbiome by in situ conjugation (MAGIC). 264 The method harnesses horizontal gene transfer to genetically modify gut bacteria. A vector either introduces green fluorescent protein labeling into recipient cells, which can then be cell sorted, or introduces antibiotic resistance, which can be selected for. Libraries included vectors bearing replication elements suitable for different gut bacteria. Among the challenges identified was the long-term stability of the engineered genetic constructs, an issue that was partly addressed by isolating and engineering host-derived strains that could mediate further transfer of engineered functions within the microbiome.
A recent study reported the important accomplishment of genetically editing a population of episymbiotic Saccharibacteria in co-culture with their Actinobacteria hosts. 265 The experiments harnessed the natural competence of the Saccharibacteria to insert heterologous sequences and generate targeted gene deletions. This enabled the identification of novel Saccharibacterial genes that are important for the growth of these cell surface-attached symbionts on their Actinobacteria host cells.
Ultimately, the ability to genetically modify strains of interest in a microbiome context will make it possible to assign functions to a vast spectrum of novel genes and enable precise tuning of microbiome composition and function. Through microbiome manipulation, a vast range of desirable outcomes of societal and environmental importance may be achieved.

A SHIFTING MINDSET TO ADDRESS THE CHALLENGES OF OUR MODERN ERA
A hallmark of modern microbiology is the plethora of new kinds of data that brand-new molecular and computational strategies generate from inconceivably intricate microbial systems studied as a whole. Taking full advantage of our observational powers requires us to focus on how to improve many of the well-established practices in microbiology: how we responsibly disseminate data for transparency and reproducibility, how we train life scientists to better exploit new data streams and computational resources, and how we encourage, support, and promote team science efforts to meet our immense need for integration when we mostly credit the first and/or last names that appear on final publications. Apart from such tangible issues that can be solved through tangible actions, some aspects of modern microbiology can also benefit from discussions of how the classical determinants of good scientific practice apply to this new era, in which our observational powers far exceed our ability to interpret the data they yield. For instance, how critical is it to lead contemporary research in microbiology with the expectation of testable hypotheses as the first and foremost requirement to benchmark or implement new ideas? And how should we recognize, prioritize, or promote the synthesis of research products from laboratory experiments or data-driven explorations?
We believe it is important to recognize that hypothesis-driven research is inevitably limited by the ability of the human mind to imagine facets of microbiology that may be far from what we know today. Indeed, from penicillin to insulin and from radioactivity to X-rays, the history of science holds a long list of breakthroughs that emerged as unintended consequences of exploratory research and serendipity. For example, although the CRISPR-Cas system could in theory have been conceived of de novo, its accidental discovery through microbial genome sequencing opened a new window on virus-host interactions and ultimately paved the way for a revolution in biotechnology. Comparing sequences from bacterial CRISPR loci with those of coexisting viruses revealed the genomes of infecting bacteriophages as the source of CRISPR spacer sequences,[266][267][268] but biochemical experiments were essential for uncovering the underlying mechanisms and thus for adapting the system to targeted genome editing in eukaryotes,[269] with almost unimaginable impacts on medicine, agriculture, and basic science. This and other examples underline the opportunities that arise from exploratory research and from its confluence with laboratory experimentation once exploration has revealed hypotheses worth testing.
In a world of unknowns, rigorous exploratory research provides the basis for hypothesis-driven research. Thus, given the huge scale and microbial underpinnings of many of the problems of the modern era, and the hope and potential for microbiome-based solutions, we advocate deploying the methods of microbiology increasingly in the context of ecosystems, whether the ocean, agricultural land, wetlands, municipal landfills, or others, and exploring them with an open mind and, dare we say it, without demanding or testing any strict hypotheses. Only by developing a concrete understanding of how these ecosystems work, and only after conducting targeted, field-scale experiments within them, will it be possible to tackle questions of utmost importance for the well-being of humanity and our planet, such as "how do we store more carbon in soil?," "how do we cut trace gas emissions from rice paddies?," "what can we do to reduce methane releases from cattle?," "how do we improve the quality of marginal drinking water, accelerate the breakdown of ocean plastic, reduce the salinization of agricultural soils, and use the Earth's subsurface to store CO2?," "how can we improve food biotechnologies to produce, for example, more palatable vegan meat alternatives?," "how does manipulation of diet, probiotics, or environmental exposures alter the development of human microbiomes?," and "can changes in these factors address the ever-growing list of microbiome-associated diseases?" For example, given that around 40% of the land surface is already heavily manipulated to support food production,[270] agricultural systems are a logical target for learning how microbiome functionality can be modulated to reduce greenhouse gas emissions.[271]
Yet our ability to implement investigations of such magnitude in terrestrial, marine, or host-associated habitats requires a mindset that embraces natural complexity and remains open to large-scale experiments in such systems, with integrated analyses of readouts ranging from microbiome composition to biogeochemical fluxes to crop productivity or health status.
These considerations are particularly relevant to critical ecosystem-level questions such as "how will future climate scenarios alter labile and recalcitrant dissolved organic carbon pools in warming oceans, and how will this affect marine food webs and the atmosphere?," "how will nitrous oxide emissions from grassland soils change with climate-driven shifts in temperature or in the amount and timing of rainfall?," or "how does soil recover after severe wildfires, and how does the patchiness of burned areas affect the reestablishment of healthy soil microbiomes?" For questions of this magnitude, harnessing natural events and variation along temporal and geographical gradients may have greater potential than artificial or small-scale controlled disruptions designed to test hypotheses, both for improving our understanding of key processes and for guiding models that predict ecosystem behavior. In other words, sometimes there are good reasons to let nature do the experiment.
Our review highlights the motivation, progress, and potential of the data-driven epoch of microbiology to tackle systems of great complexity with vast scales of unknowns. A critical admission in our search for solutions to the most pressing needs of our planet is that the complexity of the microbial systems modern microbiology aims to elucidate may not always be compatible with the desire for hypothesis-driven research. Thus, from PhD committees to funding agencies, a strict requirement for "hypotheses" should not be a critical determinant of the fate of research proposals. While we should not hold data products to a lesser degree of scrutiny, neither should we trivialize data-driven investigations on grant panels or in editorial decisions. In the quest to understand how nature works, it is important to embrace complexity while looking for means to reveal its driving principles.