The Who, Why, and How of Small-Molecule Production in Invertebrate Microbiomes: Basic Insights Fueling Drug Discovery

Bacteria have supplied us with many bioactive molecules for use in medicine and agriculture. However, rates of discovery have decreased as the biosynthetic capacity of the culturable biosphere has been continuously mined for many decades.

Nature is an accomplished synthetic chemist, and a large fraction of bioactive molecules used today in medicine and agriculture either are evolved small molecules or were inspired by such agents (1). Natural selection favors the generation of compounds that improve the odds of survival, and these compounds can also be therapeutically useful for humankind if their mechanism of action impacts disease mechanisms. For example, many bacteria produce molecules that inhibit the growth of rival species or fungi, and we use many of these as antibacterial or antifungal treatments. Likewise, some plants produce toxic compounds that protect them from grazing animals, and many such compounds (for example, paclitaxel [originally named taxol]) are now used as cancer therapeutics. However, evolution also works against us: the widespread use of antibiotics in human medicine and agriculture selects for the propagation of resistance genes (2), some of which evolved long before antibiotics were used by humans (3), to confer self-resistance on antibiotic-producing organisms. In the case of antibiotics, recent decades have seen a precipitous drop in discovery rates (4), as soil-derived culturable microorganisms and synthetic chemistry programs have not yielded the number of drug leads originally envisioned. If we are to discover more drugs from nature, it would be wise to explore novel environments and parts of the tree of life that have been undersampled and to gain a greater understanding of the evolutionary and ecological forces that favor bioactive small-molecule production.
My research group and others have been exploring the biosynthetic potential of the as-yet-uncultured biosphere, using culture-independent sequencing techniques such as metagenomics and metatranscriptomics. Metagenomics and other systems biology methods have started to illuminate the true scope of microbial biodiversity on Earth (5, 6). The biosynthetic pathways that produce small molecules are widely distributed in bacteria (7), and they are thought to mediate complex interactions in nature, known as the "parvome" (8, 9). Although parts of the parvome (for example, quorum-sensing systems) have been studied, we currently lack a systematic understanding of chemical interactions in complex microbial communities. This stems from an inability to describe microbiome behavior at the level of individual species or strains; in other words, who is doing what, and why? Metatranscriptomics can be used fairly easily to determine gene expression trends in aggregate, but without knowing which species each transcript belongs to, changes in species abundance cannot be distinguished from expression changes. Increasingly, it is understood that genomes vary among environmental bacteria, and the complete set of genetic capabilities exhibited by all strains in a species can be considered the "pan-genome" (10). Accordingly, we have begun to examine transcriptome sequencing (RNA-seq) and metagenomics data from the same environmental sample, to allow the de novo assembly of novel genomes and to avoid problems with strain variability when aligning RNA-seq reads to DNA contigs (11, 12).
In such matched DNA and RNA data sets, the accurate assignment of metagenomic contigs to species-level "bins" allows transcript expression to be quantified relative to housekeeping genes in the same genome, normalizing for changes in genome copy number between samples (Fig. 1). We are currently using these techniques to study the behavior of marine sponge microbiomes in response to dysbiosis. Sponges can have highly complex microbiomes containing hundreds of microbial species that often include highly divergent, novel species, making binning challenging. Semimanual methods of binning are too labor-intensive in these systems, and many of the automatic methods fail because they do not separate the host sponge genome. Other methods rely on coassembly of many samples, but the quality of coassemblies is degraded by interstrain variability between samples. Vertically transmitted symbionts are expected to exhibit sequence drift in different hosts (see below), and so coassembly of pooled samples can result in highly fragmented and chimeric contigs. We therefore have developed our own binning pipeline (26) so that highly complex host-associated metagenomes can be automatically and reproducibly analyzed. With accurate binning, combined DNA and RNA sequencing can be used to follow expression patterns of each microbe in a microbiome, and behaviors can be compared under different conditions. Such studies may well shed light on the environmental stimuli that initiate small-molecule synthesis in the environment. We will, however, probably require new analysis and modeling techniques to truly understand the higher-order interactions and emergent behavior of whole microbiomes.
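The normalization idea described above can be illustrated with a minimal Python sketch. It is not the binning pipeline cited in the text (26). The gene names, the count values, and the choice of the median housekeeping count as the per-bin baseline are all assumptions made for illustration. The sketch divides each transcript count by the baseline of the genome bin it belongs to, so that a change in species abundance alone does not look like a change in expression.

```python
from collections import defaultdict
from statistics import median


def normalize_by_housekeeping(rna_counts, gene_to_bin, housekeeping):
    """Express each transcript relative to housekeeping genes of its own bin.

    rna_counts   : {gene_id: length-normalized read count} for one sample
    gene_to_bin  : {gene_id: bin_id} from metagenomic binning
    housekeeping : set of gene_ids treated as housekeeping (an assumption)
    """
    # Collect housekeeping counts per genome bin.
    per_bin = defaultdict(list)
    for gene in housekeeping:
        per_bin[gene_to_bin[gene]].append(rna_counts.get(gene, 0.0))

    # Hypothetical baseline choice: median housekeeping count in each bin.
    baseline = {bin_id: median(vals) for bin_id, vals in per_bin.items()}

    normalized = {}
    for gene, count in rna_counts.items():
        base = baseline.get(gene_to_bin.get(gene))
        if base:  # skip bins with no baseline or a zero baseline
            normalized[gene] = count / base
    return normalized


# Toy data: the same species is twice as abundant in sample B,
# so every raw count doubles without any change in regulation.
gene_to_bin = {"binA_rpoB": "binA", "binA_gyrB": "binA", "binA_pks": "binA"}
housekeeping = {"binA_rpoB", "binA_gyrB"}
sample_a = {"binA_rpoB": 10, "binA_gyrB": 10, "binA_pks": 20}
sample_b = {"binA_rpoB": 20, "binA_gyrB": 20, "binA_pks": 40}

ratio_a = normalize_by_housekeeping(sample_a, gene_to_bin, housekeeping)
ratio_b = normalize_by_housekeeping(sample_b, gene_to_bin, housekeeping)
print(ratio_a["binA_pks"], ratio_b["binA_pks"])
```

In this toy case the raw count of the hypothetical `binA_pks` transcript doubles between samples, but its ratio to the bin baseline is the same in both, which is the distinction that aggregate metatranscriptomics cannot make.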
In the absence of a systematic understanding of microbiome function, my own research group has focused on systems where there is a clear ecological rationale for chemical defense. In particular, we have investigated several marine invertebrates that are sessile and/or lack physical defenses against predation and are known to harbor cytotoxic molecules, often made by a microbial symbiont rather than the host. The existence of such symbiotic relationships based on small-molecule production implies that the small molecule has served a useful ecological function over evolutionary timescales. For example, we found evidence that the biosynthetic pathway for the patellazoles, picomolar cytotoxins isolated from the tunicate Lissoclinum patella, has been present in the genome of the producing symbiont for at least 6 million years (13, 14).
It is our view that the most important bioactive compounds will be found in such ecological niches where they have been honed by strong selective pressures for prolonged periods of time. However, the symbiotic environment also conspires to make bacterial symbionts difficult to culture. While selection pressure to maintain biosynthetic capability for protective or defensive small molecules is strong in symbionts (13, 15), pressure to maintain basic metabolic functions needed for independent growth is weakened because of the hospitable and stable host environment (16). Over evolutionary timescales, this altered selection profile and a population structure where small numbers of symbiont cells are isolated in one host individual lead to the progressive degradation of gene sequences until they become nonfunctional pseudogenes and are eventually deleted (16). After a prolonged period of time, this "genome reduction" process yields very tiny genomes (less than ~500 kbp) that cannot support life outside the host. We have therefore used shotgun metagenomics extensively to gain insight into the life of symbiotic bacteria that make small molecules.
We recently used metagenomics to describe the genome of a bacterial symbiont in the phylum Verrucomicrobia that exemplifies this dichotomy between strong selection for secondary metabolites and weak selection for more basic functions (15). "Candidatus Didemnitutus mandela" lives within a marine tunicate and produces cytotoxic compounds called mandelalides (17). Its genome contains relatively few full-length genes with recognizable functions, and most of the genome is littered with either short hypothetical genes of unknown purpose or truncated forms of homologs in the closest known relative ("pseudogenes"). Despite these clear signs of genome reduction, the mnd pathway for the production of mandelalides is repeated seven times in the chromosome, collectively accounting for almost 20% of its total length. This likely indicates pressure for greater production through increased gene dosage. After symbionts are restricted to living within their host, they become genetically isolated and subject to extreme population bottlenecks when only a few bacterial cells are passed vertically to the host's offspring. In this setting, mutations accumulate because they cannot be corrected by horizontal transfer among a large population, eventually leading to the loss of genes not immediately required for the symbiosis, including DNA repair pathways. "Ca. Didemnitutus mandela" has lost the ability to carry out homologous recombination, and consequently, a number of single nucleotide polymorphisms (SNPs) and deletions in some of the mnd repeats have become fixed through population bottlenecks and cannot be corrected. This process of degradation is likely to continue until only one copy of each mnd gene remains. Complete loss of the pathway is unlikely because hosts devoid of symbiont protection would lose their selective advantage. 
Many symbionts have been found to possess biosynthetic pathways that are fragmented throughout the genome, in contrast to the contiguous gene "clusters" found in free-living bacteria (18). This fragmentation could have arisen after early duplication events, as in "Ca. Didemnitutus mandela," followed by progressive degradation of pathway genes until each gene remained as a single copy, with the surviving copies originating from repeats in different locations (Fig. 2A).
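The duplication-then-degradation model can be made concrete with a toy simulation. This is a hedged sketch, not an analysis from the cited work. The seven repeats echo the mnd pathway described above, but the number of genes per pathway, the uniform choice of which copy is lost next, and the function name `simulate_fragmentation` are assumptions. The only rule taken from the model is that the last functional copy of a gene is never lost.

```python
import random


def simulate_fragmentation(n_repeats=7, n_genes=6, seed=0):
    """Toy model of pathway fragmentation after duplication.

    Starts with n_repeats full copies of a pathway of n_genes genes.
    Copies are inactivated one at a time, chosen uniformly at random
    (an assumption), but the last copy of any gene is protected because
    its loss would remove the compound. Returns, for each gene, the
    index of the repeat that holds its single surviving copy.
    """
    rng = random.Random(seed)
    # intact[g] is the set of repeats still carrying a functional gene g.
    intact = [set(range(n_repeats)) for _ in range(n_genes)]

    while any(len(copies) > 1 for copies in intact):
        # Only copies that are not the last of their gene may be lost.
        candidates = [
            (gene, repeat)
            for gene, copies in enumerate(intact)
            if len(copies) > 1
            for repeat in sorted(copies)
        ]
        gene, repeat = rng.choice(candidates)
        intact[gene].discard(repeat)

    return [next(iter(copies)) for copies in intact]


survivors = simulate_fragmentation()
print(survivors)                 # repeat index of each surviving gene
print(len(set(survivors)) > 1)   # True if the pathway ended up fragmented
```

Running the function with different seeds shows how often the surviving genes sit in more than one repeat under these assumptions. A more realistic model would also need mutation rates, bottleneck sizes, and selection on gene dosage, none of which are specified here.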
Importantly, it is not always obvious why particular bacterial symbionts are intractable to laboratory culture. For example, we recently sequenced the genome of "Candidatus Endobugula sertula," a symbiont of the bryozoan Bugula neritina that produces defensive compounds called bryostatins (19). Bryostatins are potent protein kinase C activators that have been evaluated in many clinical trials for cancer and HIV infection, but the isolation of 18 g of bryostatin 1 requires the collection of 10,000 gal of Bugula neritina (20). Despite many attempts, "Ca. Endobugula sertula" has never been cultured. However, the genome of this symbiont does not show signs of ongoing genome reduction, and "Ca. Endobugula sertula" appears to be a recent symbiont that retains the capability for horizontal transmission between hosts (Fig. 2B) (11). Many other promising compounds are made by uncultured microbes, such as the anticancer drug ET-743 (21), and all suffer from similar "supply problems" unless a cultured source can be identified or a synthetic route devised. In the case of bryostatins, a scalable synthesis was developed only recently, 35 years after the compounds were discovered (22). Bryostatins could be recollected or synthesized in amounts justified by initial biological findings, but rarer, and potentially even more clinically significant, agents are unlikely to be developed to this extent.
Heterologous expression of pathways might offer an alternate means of supplying novel compounds from unculturable sources, but this work is far from trivial. Thus far, such efforts have been limited to hosts that are presumably related to the producer (23) or to relatively short pathways (24). Expression of highly complex pathways, such as polyketide synthase (PKS) systems with lengths up to ~100 kbp and proteins up to ~1 MDa, will require extensive refactoring and codon optimization. As the price of gene synthesis continues to decrease, synthetic biology methods could be employed to reconstitute such pathways. To be broadly useful, synthetic biology efforts should focus on determining design rules to ensure efficient transcription, translation, and folding of large protein components in arbitrary hosts. This is currently challenging because the causes of failure in heterologous expression experiments are not well defined, and we lack methods to diagnose problems; this is especially true for large modular proteins with multiple enzymatic domains (such as in PKS pathways). In my view, these fundamental knowledge gaps represent a major roadblock in using metagenomics for drug discovery and development programs. In the coming years, my research group will be focused on solving this roadblock, using synthetic biology to establish a rational "design-build-test" loop to both identify problems in transcription, translation, and folding and determine rules for the de novo design of functional versions of PKS genes. My ultimate aim is to allow the seamless use of metagenomic sequencing information for the functional expression of complex pathways in heterologous hosts, thus removing the limit of "unculturability" from drug discovery in the near future.
FIG 2 (A) Model for the transition from biosynthetic pathway duplication shortly after establishment of symbiosis to pathway fragmentation frequently observed in older symbionts (18). Early in the symbiosis, selection pressure for increased compound production could lead to pathway duplication. However, loss of DNA repair pathways and facile fixation of mutations due to frequent population bottlenecks give rise to sequence drift and proliferation of pseudogenes. All but lethal mutations accumulate, lowering the gene dosage of each repeated gene in the pathway. Eventually, one copy of each pathway gene will remain because further loss would impact the survival of the host. The remaining copies will not necessarily originate from the same repeat, leaving a single fragmented pathway. (B) Bioactive, and presumably defensive, compounds are produced by symbionts on a continuous spectrum of genome reduction, including the bryostatins (19), mandelalides (15), patellazoles (13), and diaphorin (25). In the early stages of genome reduction, coding density decreases, and at least in the case of "Ca. Didemnitutus mandela," biosynthetic gene cluster copy number increases. Intergenic sequences are progressively deleted, as more and more functional genes are also degraded and deleted, until symbionts possess dense, tiny genomes. IC50, 50% inhibitory concentration.