Introduction

Recently, with a new Editor-in-Chief and expanded Editorial Board, a set of changes to Journal of Molecular Evolution has begun (Liberles 2019). The journal was founded by Emile Zuckerkandl and has a long history of studies in a diverse array of topics from phylogenetics to protein evolution, origin of life, and evolution of the genetic code. Over the history of the journal, each Editor-in-Chief and appointed editorial board has emphasized different areas while also maintaining current publication trajectories in chemical and abiotic evolution.

The Journal of Molecular Evolution now aims to broaden its reach into evolutionary genomics while also recapturing the tradition of publication in molecular phylogenetics, modeling, and theory. With this in mind, new editorial board members were invited to highlight areas of molecular evolution that they find particularly compelling. While this effort is not meant to be systematic, or exclusive of areas not discussed, it is meant as an indication of scientific directions that members of the editorial board see as novel and emerging sub-disciplines. Further, while this editorial is not a call for manuscripts, the hope is that this communication will establish the Journal of Molecular Evolution as a home for such research areas. This view from a collection of our editors is ultimately meant to spur discussion about the field of molecular evolutionary biology as a whole.

Prebiotic Evolution and the RNA World (Bottom up)

The Journal of Molecular Evolution has been a traditional venue for publications on the origin of life. Specifically, the journal published foundational studies that contribute to our understanding of a potential RNA world. The RNA world hypothesis in its simplest form states that life evolved from a replicating system of RNAs that served both as genetic carriers of heritable information and as the functional molecules encoded by those genetic carriers (Gilbert 1986). Though the functional range of natural RNAs is narrow, especially with respect to catalysis, in vitro selection studies produced catalysts that increase the plausibility that an RNA world scenario preceded cellular life (e.g., Lohse and Szostak 1996; Ekland and Bartel 1996; Lau and Unrau 2009).

In vitro selection of nucleic acids has not only yielded RNA molecules important for multiple applications that will be discussed below (Filonov et al. 2014; Svensen and Jaffrey 2016; Autour et al. 2016, 2018), but has also demonstrated what plausible RNA-catalyzed RNA or DNA polymerization might have looked like in an RNA world scenario (Horning and Joyce 2016; Samanta and Joyce 2017; Attwater et al. 2018). At the same time, non-enzymatic RNA polymerization (Prywes et al. 2016; Zhang et al. 2017; Hänle and Richert 2018), and the role of crowding (Saha et al. 2018) and encapsulation (Bansho et al. 2016; Matsumura et al. 2016) are also becoming increasingly important factors for understanding plausible scenarios for chemical evolution and RNA replication at the origins of life. Furthermore, as high-throughput sequencing continues to fall in cost, RNA is re-emerging as an experimental model to explore evolutionary concepts such as the fitness landscape and epistasis (Pressman et al. 2017, 2019; Bendixsen et al. 2017).

Recent attempts to reconcile the RNA world concept with other considerations regarding the origin of life have produced a much more complex view of this potential stage in early evolution. Particularly, experimental studies have shown that under certain early Earth conditions, new catalytic RNA functions can be discovered, while the efficiency of known RNA catalysts can be enhanced (Hsiao et al. 2013; Popović et al. 2015). From a theoretical perspective, some have argued that any RNA world metabolism would have relied on prebiotic organic compounds produced by the geochemical setting of life’s origin (Goldman et al. 2016) and most dramatically, an RNA world may have co-evolved with prebiotic peptides and a rudimentary translation system (di Giulio 1997; Bowman et al. 2015; but also see Poole et al. 2015). This more complex and nuanced view of a potential RNA world presents an important challenge for future experimental and theoretical work on early evolution.

Early Evolutionary History (Top Down)

The ever-growing understanding of abiotic organic chemistry and synthetic evolutionary biology described above can be a powerful tool for understanding the origin of life because it affords researchers the ability to test a broad range of potential origin of life scenarios. But it is also ahistoric insofar as it can yield insight into how life may have originated, but not how it did in fact originate from a historical perspective (Pross and Pascal 2013). A parallel approach uses phylogenetic analyses of modern genes, genomes, and proteomes across the tree of life to understand early evolution from a historical perspective. One significant target of early evolution studies is the most recent common ancestor of all extant organisms, usually referred to as the Last Universal Common Ancestor (LUCA) (Becerra et al. 2007; Goldman et al. 2013).

Ever since genomes became available over a sufficiently representative taxonomic range, researchers sought to identify gene families, protein families, protein domains, and protein structures that may have originated at or before the time of the LUCA (e.g., Harris et al. 2003; Mirkin et al. 2003; Delaye et al. 2005; Yang et al. 2005; Ranea et al. 2006; Wang et al. 2007; Weiss et al. 2016). Though the results of these studies sometimes disagree in their particulars (Becerra et al. 2007; Goldman et al. 2013), they portray a LUCA that had a complete translation system similar to those we see in extant organisms (Harris et al. 2003; Goldman et al. 2010; Fournier et al. 2011) and a complex metabolic network composed of protein enzymes (Braakman and Smith 2012; Goldman et al. 2012, 2016; Weiss et al. 2016). LUCA also likely had a DNA genome (Forterre 2002; Goldman and Landweber 2012; Poole et al. 2014) and cell membrane (Martin and Russell 2003; Peretó et al. 2004), although these features are less certain since many proteins that support the DNA genome are not homologous between Bacteria and Archaea. Further, archaeal phospholipids have a different structure than bacterial and eukaryotic phospholipids. Even so, LUCA appears to represent a population of organisms that may have had a level of molecular and physiological complexity not too different from some modern organisms. Why we do not see a branch on the universal tree until life had evolved to such a high degree of complexity remains an important and open question.

LUCA was the last common ancestor of all organisms, but not the last common ancestor of all genes. The number of independent gene inventions giving rise to extant genes that predated the first DNA genome remains an open question. A small number of known gene duplications that took place prior to the last universal common ancestor can give some insight into evolutionary history before the time of LUCA. These universal paralogs were originally used to root the tree of life. The tree of life has no species outgroup, but because each paralog makes its own gene tree that resembles the tree of life, the other paralog can be used to root it (Gogarten and Taiz 1992; Gribaldo and Cammarano 1998). More recently, these universal paralogs have been used to understand evolutionary transitions prior to LUCA. For example, the final steps in the expansion of the canonical genetic code were elucidated by performing ancestral sequence reconstruction on universally paralogous families of aminoacyl-tRNA-synthetase enzymes (Fournier et al. 2011; Fournier and Alm 2015). Molecular evolution prior to LUCA is a burgeoning field that represents a cutting edge in the study of early evolution (Wolf and Koonin 2007).

The pairing of ancestral sequence reconstruction with molecular laboratory techniques has become another powerful tool in understanding early evolutionary history because it allows researchers to study proposed resurrected ancient proteins in the laboratory (Chang and Donoghue 2000). Early examples of this approach resurrected candidate versions of the translation elongation factor protein, EF-Tu, from the bacterial ancestor and inferred that they functioned at an optimal temperature of 55–65 °C (Gaucher et al. 2003, 2008). The same approach was more recently used to infer the evolutionary stability of protein structure within a thioredoxin family from the bacterial, archaeal, and archaeal-eukaryotic common ancestors to the present (Ingles-Prieto et al. 2013). It can also be used to suggest aspects of the ecology and physiology of animals long vanished from the earth, as with investigations of nocturnality in early mammalian lineages (Bickelmann et al. 2015; Liu et al. 2019). The field of ancestral protein resurrection has been further enhanced by the ability to replace a protein with ancestral versions within a living cell (Kacar and Gaucher 2012; Kacar et al. 2017a, b). The transformation of cells with genes encoding the putative ancestral versions of proteins promises to shed further light on the nature of molecular functions encoded in the genomes of early organisms including the last universal common ancestor, and recently, to resurrect ancient biogeochemical signatures (Kacar et al. 2017c; Garcia and Kacar 2019). The evolutionary transitions that occurred by the time the last universal common ancestor emerged include some of the most consequential in all of evolutionary history, shaping the internal structure and physiology of all organisms (Becerra et al. 2007; Goldman et al. 2013), and making life capable of speciation and ecological dispersal (Cantine and Fournier 2018).

Evolution of Genes and Proteins

From the evolution of LUCA to the evolution of extant cellular (and viral) genomes, phylogenetic pipelines have been established to understand gene relationships, selection, and protein functional evolution (Anisimova and Liberles 2012; Anisimova et al. 2013). Tests for selection (Kosiol and Anisimova 2019) and the relationship between protein structure and function over evolutionary time (Liberles et al. 2012; Chi and Liberles 2016) have recently been reviewed elsewhere. Understanding the importance of structural constraint in dictating sequence constraint has been a focus in protein evolution (see for example, Grahnen et al. 2011). Early stage models typically treated folding as a global property, but a more localized view of folding stability constraints may dramatically change our understanding. Further, the excess amino acid changes due to positive selection and the reduction of amino acid substitution due to clearly defined folding and functional interactions are not well understood mechanistically. In this context, substitutions that are missing because of “negative design” in both folding and binding cannot easily be detected. In the folding sense, these are amino acids that would fit within a structure but, if substituted, would lead to a fold transition by enabling a more energetically favorable conformation to emerge (Noivirt-Brik et al. 2009). From the perspective of inter-molecular binding specificity, this would result in selective pressure to avoid binding to potential partners where the interaction would be deleterious, favoring amino acid substitutions that still enable a favorable interaction with the native partner (Liberles et al. 2011; Yang et al. 2012). Such missing substitution due to the “negative design” side of folding and binding specificity can probably be estimated statistically, but identifying the cause of it is a more daunting challenge, especially for current computational methods.

Protein structure is an intermediate between the genotype and the phenotype (function) of a protein. However, protein structure appears to be more highly conserved than either protein coding sequence or protein function. For example, when amino acid sequence divergence is compared to structure divergence between the same sets of proteins, a considerable amount of sequence difference is usually required to produce any appreciable difference in structure (Chothia and Lesk 1986; Illergård et al. 2009). Furthermore, families of proteins that share a common structure often evolve a range of different functions (Furnham et al. 2012). One explanation for the high level of conservation observed in protein structures as compared to protein sequence or protein function is that there are a limited number of stable and biologically useful protein folds and that these are hard to discover through evolutionary processes. Correspondingly, many sequences can yield such folds, which can in turn be harnessed to perform many different chemical interactions and transformations. This many-to-few-to-many relationship is an important part of the genotype–phenotype map.

One mode of understanding the link between genotype and phenotype is through evolutionary synthetic biology and experimental evolution. Methodological advances, including deep mutational scanning, have combined sequence data and modeling to better understand the rules of evolutionary processes (see for example Doud et al. 2015). This new understanding ultimately can lead us back to computational biology and predicting new genotype–phenotype relationships. While traditional models of statistical genetics are designed to fit data without the ability to extrapolate, more mechanistic models may have this potential and are a growing area, integrating across layers of biological organization (see for example, Loewe 2016; Lind et al. 2019). This will be described below in more detail.

How Basic Properties of Cells Influence Molecular Evolution

Toward the aim of integrating molecular biology with evolutionary biology, the past five years have seen growing enthusiasm for the idea that the structure and function of basic molecular building blocks (e.g., genomes, proteins, regulatory networks and cells) have a profound influence on evolutionary processes. For example, several studies show how the requirement for globular proteins and RNAs to fold into three-dimensional structures can limit the evolutionary trajectories by which they access new functions or optimize existing ones (Canale et al. 2018; Kurahashi et al. 2018; Pressman et al. 2019). Other studies demonstrate how physical constraints on cell size (Farhadifar et al. 2015) or energetic constraints on cell metabolism (Scott et al. 2014) lead to potentially generalizable ‘scaling laws’ that may have pervasive effects on the evolution of diverse organisms. Preceding this recent enthusiasm is a long history of studies focusing on how generic features of cell systems can drive or constrain evolutionary processes, with important parts of that history unfolding in the Journal of Molecular Evolution, as reviewed below.

Early studies found puzzling patterns in the evolutionary rates of different nucleotides or genes, leading to discoveries about how these patterns are generated by the way replication machinery, translation machinery and other cellular machines operate (Crick 1966; Mazin 1976; Kimura 1980; Sharp and Li 1986; Drummond and Wilke 2008; Shahmoradi et al. 2014). For example, the observation that nucleotides in the third codon position vary more than others is explained by the fact that binding of the cognate tRNA is looser in that codon position (Crick 1966). Others observed and searched for mechanistic explanations as to why some codons are used more than others to specify particular amino acids (Elton et al. 1976; Berger 1978), again discovering intriguing patterns that could not be understood without considering basic properties of cell systems. Eventually, the observation that highly expressed genes are more biased in their codon usage (Bennetzen and Hall 1982; Sharp and Li 1986) was made clearer by understanding the costs cells encounter when producing highly abundant proteins (Drummond and Wilke 2008). Other patterns of codon bias, including those that distinguish tissue-specific genes, for example, remain to be fully understood (Supek 2016).
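One simple way to quantify the unequal use of synonymous codons described above is relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all codons for that amino acid were used equally. The following is a minimal sketch only. It assumes the standard genetic code and an in-frame coding sequence, and the function name `rscu` and the toy sequence are hypothetical illustrations rather than anything taken from the studies cited here.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for every amino acid observed in an in-frame CDS."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)
    result = {}
    for aa in set(AMINO) - {"*"}:
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid not observed; RSCU is undefined
        for c in family:
            # Observed count relative to equal usage within the family.
            result[c] = counts[c] * len(family) / total
    return result


# Hypothetical toy sequence: Met, then Leu encoded three times by CTG and once by CTA.
toy = rscu("ATGCTGCTGCTGCTA")
for codon in ("CTG", "CTA", "TTA"):
    print(codon, toy[codon])
```

Values above 1 indicate a codon used more often than expected under equal usage, and values below 1 indicate one used less often. Comparing such values between highly and weakly expressed genes is one way the expression-related bias discussed above could be explored.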

A few pivotal papers published in the Journal of Molecular Evolution transformed diverse observations into general hypotheses about how generic features of cellular systems influence the way evolution unfolds (Zuckerkandl 1997; Stoltzfus 1999). One prominent hypothesis that emerged relates to how ubiquitous errors in DNA replication and transmission can create redundancies (e.g., duplicate genes or duplicate pathways) that promote complexity, innovation, and diversity (Stoltzfus 1999; Force et al. 1999).

A second influential hypothesis asserts that the mere fact that genes and proteins physically interact inside of cells can also promote complexity and innovation (Stoltzfus 1999; Zuckerkandl 1997). For example, a protein complex may expand because a mutation that destabilizes a necessary interaction can be compensated by recruitment of another protein that re-stabilizes the complex (Jarvis et al. 1989; Zuckerkandl 1994, 1997). A recent high-throughput study confirms that complexity (e.g., the number of proteins in a complex) can increase through processes driven by physical interactions between proteins (Diss et al. 2017). A related hypothesis pertains to the idea that interactions among mutations can open or close evolutionary doors (Zuckerkandl 1997). For example, studies of protein and tRNA reveal how mutations that destabilize folding are counterbalanced by those that stabilize it, resulting in entrenchment of some mutations (i.e., they are no longer reversible) as well as the possibility of previously forbidden mutations (Huynen 1996). Study of this topic has recently exploded in large part due to new technologies that allow generation and analysis of many mutants (Shah et al. 2015; Starr et al. 2018; Otwinowski et al. 2018; Kurahashi et al. 2018).

In summary, recent work focusing on how the structure and function of molecular building blocks influence evolutionary outcomes stems from a rich history of studies. This enthusiasm has been further fueled by influential review papers (Zuckerkandl 1997; Stoltzfus 1999) and most recently by reviews urging deeper consideration of how higher-level cellular features that have historically received little attention (e.g., organelle structure, energetic costs of metabolism) impact evolutionary processes (Lynch et al. 2014; Phillips and Bowerman 2015; Titus and Goodson 2018). Modern high-throughput phenotyping and genome-editing techniques including DNA barcoding, CRISPR, single-cell microscopy, and RNA-seq have vastly improved our ability to investigate molecular-level features of cells (Kinney and McCandlish 2019), thus enabling more comprehensive investigations of how these features influence molecular evolution. The Journal of Molecular Evolution is committed to continuing its tradition of publishing articles in this area and encourages such submissions.

Genome Evolution

Lineage-specific genome content and architecture are shaped by a collection of population genetic and life history traits. Lynch (2007, 2008) identified effective population size, which modulates the effectiveness of selection, as a key parameter driving differences in gene number and content, as well as in genome structure, across species. It has become clear that the nature of the genotype–phenotype map gives rise to many genotypic solutions to a given phenotypic outcome and this has emerged as an important feature of the evolutionary landscape at the genomic level as well. From the genome of the tunicate, Oikopleura dioica, to the nature of gene function in glycolysis, there are many examples of surprising variability in genotypic structure (Denoeud et al. 2010; Orlenko et al. 2016). As the catalog of whole-genome sequences grows across the tree of life from phenotypically diverse species, making genotype–phenotype connections will become more commonplace and comparatively powerful. For example, whole-genome comparisons between species with regressive morphologies (e.g., naked mole rats, cetaceans) show that they carry large suites of inactivating mutations that provide genetic signatures revealing the regulatory architecture of complex, adaptive transitions to new life histories (Huelsmann et al. 2019), with many of these phenotypes serving as naturally occurring mimics of human disease (Emerling et al. 2017). Because of the complexity of the genotype–phenotype space, it is a natural extension that at a sequence level, genomic observations are far from a sampling of an evolutionary equilibrium and additional new mappings are expected to be identified as data continues to accumulate (see for example Povolotskaya and Kondrashov 2010).

In the last decade, it has become apparent that genomes harbor numerous signatures of discordant genealogies (Bravo et al. 2019). This variation is due to complex interactions between natural selection, hybridization, recombination, and effective population size (Hobolth et al. 2007, 2011; Schumer et al. 2018; Martin et al. 2019; Li et al. 2019). We are only in the infancy of discovering the full variation encrypted within the genomes of living organisms, and require new methods to analyze whole-genome data in the context of unique local genomic architectures and modes of genetic transmission. New approaches that consider the phylogenomic structuring of gene histories along and between chromosomes, and their interaction with recombination rates, natural selection, and demography, will be useful for reliably inferring phylogenetic histories and the role of gene flow in obscuring ancient phylogenetic structure.
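As a minimal illustration of how effective population size alone can generate discordant genealogies, consider incomplete lineage sorting in a rooted three-species tree. Under the standard neutral multispecies coalescent, the probability that a gene tree disagrees with the species tree is (2/3)·exp(−T), where T is the internal branch length in coalescent units (generations divided by 2Ne for diploids). The sketch below deliberately ignores selection, hybridization, and recombination, and its function name and parameter values are hypothetical.

```python
import math


def ils_discordance_probability(generations, n_e):
    """Probability that a gene tree disagrees with a rooted three-species tree.

    Assumes a neutral multispecies coalescent, a diploid population of
    constant effective size n_e on the internal branch, and no gene flow.
    """
    t_coalescent = generations / (2.0 * n_e)  # branch length in coalescent units
    return (2.0 / 3.0) * math.exp(-t_coalescent)


# Same internal branch duration (hypothetical), three effective population sizes.
for n_e in (10_000, 100_000, 1_000_000):
    p = ils_discordance_probability(generations=200_000, n_e=n_e)
    print(f"Ne={n_e:>9,}  P(discordant gene tree)={p:.3f}")
```

Larger effective population sizes or shorter internal branches push the probability toward its maximum of 2/3, which is one reason rapid radiations in large populations are expected to show extensive genealogical discordance even without gene flow.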

New long-read sequencing technologies are finally beginning to open up the “dark matter” of the genome, allowing sequencing of long, repetitive gene families and satellite repeats that were not previously possible. Many of these repetitive elements are known to play roles in disease susceptibility and a variety of other phenotypes, so having complete telomere–telomere sequences (e.g., Miga et al. 2019) will provide unparalleled opportunities for comparative genomics and making genotype–phenotype correlations in non-model organisms. One area that will benefit greatly in this regard is the analysis of gene family evolution. Numerous studies in the literature make biological inferences about adaptation from gene loss and gain events in large multicopy gene families. However, gene counts of segmentally duplicated regions in draft genome assemblies are prone to error and incomplete gene models lead to erroneous biological inferences (Denton et al. 2014). Improved genome assemblies, such as through novel trio-binning approaches (Koren et al. 2018; Rice et al. 2019), will push the field forward so that we can better connect copy number evolution changes to phenotypic innovations. Models for understanding these evolutionary dynamics in a tree reconciliation framework are also an important direction (Konrad et al. 2011; Yohe et al. 2019).

In functionally annotating genomes, enrichment analysis using GO terms or the KEGG Database of pathways has become common. The next level of analysis will involve more computational assessment of gene functions. Clustering of positively selected, differentially expressed genes, or retained duplicates in a pathway or functional category can happen for different reasons. Many studies lack a phylogenetic null model that considers mutational opportunity or the notion that mutation can itself be biased. Further, compensatory covariation (epistasis that is evolutionarily neutral) and directional selection may be more difficult to differentiate than is commonly appreciated. There is a functional way forward, as simple models from biophysical chemistry enable us to relate pathway function, protein concentration, and the binding (and, for enzymes, catalytic) activities of coding sequences in mutation–selection frameworks. This is one potential alternative as a mechanistic modeling framework to more empirical approaches.
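To make the idea of linking biophysical chemistry to a mutation–selection framework concrete, the toy calculation below maps a binding free energy to the equilibrium fraction of protein bound, treats that fraction as fitness, and passes the resulting selection coefficient to Kimura's diffusion approximation for fixation probability. Every modeling choice here is an assumption made for illustration: fitness proportional to fraction bound, a 1 M standard state, census size equal to effective size, and all function names and parameter values.

```python
import math

RT = 0.593  # kcal/mol at roughly 298 K


def fraction_bound(delta_g, ligand_conc=1e-6):
    """Equilibrium fraction bound for a binding free energy in kcal/mol.

    More negative delta_g means tighter binding; ligand_conc is in molar.
    """
    kd = math.exp(delta_g / RT)  # dissociation constant, 1 M standard state
    return ligand_conc / (ligand_conc + kd)


def selection_coefficient(dg_wild_type, dg_mutant):
    """Relative fitness change, assuming fitness is proportional to fraction bound."""
    return fraction_bound(dg_mutant) / fraction_bound(dg_wild_type) - 1.0


def fixation_probability(s, n_e):
    """Kimura's approximation for a new mutation in a diploid population."""
    if abs(s) < 1e-12:
        return 1.0 / (2 * n_e)  # neutral limit
    exponent = -4.0 * n_e * s
    if exponent > 700:  # exp() would overflow; fixation is effectively impossible
        return 0.0
    return (1.0 - math.exp(-2.0 * s)) / (1.0 - math.exp(exponent))


# Hypothetical mutation that weakens binding by 0.05 kcal/mol.
s = selection_coefficient(dg_wild_type=-10.0, dg_mutant=-9.95)
for n_e in (100, 10_000):
    print(f"s={s:.5f}  Ne={n_e}  P(fix)={fixation_probability(s, n_e):.3e}")
```

In this sketch the same biophysical change is close to neutral in a small population and is efficiently removed in a large one, which shows how a mechanistic model can generate expectations about substitution that a purely empirical enrichment test does not.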

Molecular Evolutionary Ecology

With the decreasing cost and increased ease of generating genomic-scale data for non-model organisms, molecular evolutionary ecology has undergone something of a renaissance over the past decade. Genomic-scale data, ranging from thousands of SNPs to hundreds of molecular sequences to whole transcriptomes and genomes, have resulted in markedly improved resolution to, for example, detect loci under selection, resolve phylogenies, and study speciation and hybridization. Here, each of these areas of study will be discussed, with a concluding section on future directions.

Detecting Loci Under Selection: Population and Landscape Genomics

One research area that has burgeoned in the genomics age is the search for loci underlying local adaptation (Hoban et al. 2016). Originally, common garden experiments and/or field-based reciprocal transplant experiments were used to document whether populations are locally adapted. Most commonly, local adaptation was inferred if individuals from a population had higher fitness (correlates) in their home environment than in an environment away from their natal habitat. More recently, an approach for determining the molecular underpinnings of local adaptation emerged in the analytical frameworks of population genomics (Luikart et al. 2003) and landscape genomics (Joost et al. 2007). The main premise for both lines of inquiry is that, by analyzing a large number of loci, some allele frequencies or genetic distances will be correlated with variation in abiotic (or biotic) variables (Luikart et al. 2003; Joost et al. 2007). Accordingly, sampling occurs in different parts of a species' geographic range that vary in the environmental characteristic of interest, such as rainfall or altitude. Two major analytical frameworks were developed to test for such patterns: outlier detection methods (Foll and Gaggiotti 2008) and genotype-environment association (GEA) analyses (Coop et al. 2010; Rellstab et al. 2015). Briefly, outlier detection methods generate a distribution of locus-specific genetic distance values (such as FST) and then conduct a statistical test for outlier loci; loci with the highest genetic distance values are indicative of positive selection, and loci with the lowest values indicate they are under purifying selection (Luikart et al. 2003). GEA methods test for correlations between allele frequencies and environmental variables (Coop et al. 2010; Rellstab et al. 2015; Hoban et al. 2016). Eventually, it will become possible to bring more mechanistic approaches that are being developed in molecular evolution into molecular ecology as well.
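The logic of the two frameworks can be sketched in a few lines. The example below computes a simple Nei-style FST per biallelic locus from population allele frequencies, flags the upper tail of the empirical distribution as candidate outliers, and computes a plain Pearson correlation between allele frequency and an environmental variable as a stand-in for a GEA test. It is a deliberately naive illustration: the published methods cited above use explicit statistical models and correct for demography and population structure, which this sketch does not, and all function names and data values are hypothetical.

```python
import math


def fst_per_locus(freqs):
    """Nei-style FST for one biallelic locus from per-population allele frequencies."""
    p_bar = sum(freqs) / len(freqs)
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # locus is fixed everywhere; no differentiation to measure
    h_within = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total


def top_outliers(fst_values, fraction=0.05):
    """Indices of loci in the upper `fraction` tail of the empirical FST distribution."""
    k = max(1, int(len(fst_values) * fraction))
    ranked = sorted(range(len(fst_values)), key=lambda i: fst_values[i], reverse=True)
    return sorted(ranked[:k])


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


# Hypothetical allele frequencies: rows are loci, columns are four populations.
loci = [
    [0.50, 0.52, 0.48, 0.51],
    [0.10, 0.35, 0.60, 0.90],
    [0.30, 0.28, 0.33, 0.31],
]
rainfall = [200.0, 400.0, 600.0, 800.0]  # hypothetical mm per year at each site

fst = [fst_per_locus(f) for f in loci]
print("FST per locus:", [round(v, 3) for v in fst])
print("outlier candidates:", top_outliers(fst, fraction=0.34))
print("GEA-style r per locus:", [round(pearson(f, rainfall), 3) for f in loci])
```

Here the second locus is both the most differentiated and the most strongly associated with rainfall, so both approaches would nominate it as a candidate. In real data the two signals need not coincide.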

Numerous analytical methods have been developed in molecular ecology to conduct outlier analyses and GEAs, and several simulation studies have followed. Some general lessons can be taken from these studies. One major consideration is the background demography of the species and populations under study. For example, different analytical methods differ in power depending on whether a population has recently expanded (e.g., in the case of an invasive species) or contracted (e.g., in the case of a species of conservation interest) (de Villemereuil et al. 2014; Lotterhos and Whitlock 2014, 2015). A second insight is that there is always going to be a top X% (with X being the desired cutoff for what is being considered as significant) of loci; that is, with a large number of statistical analyses, a number of loci will always come out as significant. The analytical frameworks all have ways to computationally account for multiple testing and false discovery rates, but the rate of false positive loci still remains high under various methods (de Villemereuil et al. 2014; Lotterhos and Whitlock 2014, 2015). Third, most phenotypic traits under selection have polygenic underpinnings and even the best single locus studies (e.g., a GWAS for human height; Yengo et al. 2018) only explain roughly 10% of the phenotypic variance in a trait. This "missing heritability" (Manolio et al. 2009) means that loci discovered in a population genomics framework likely only explain a small proportion of the adaptive genetic variation in a locally adapted trait. Two other important caveats are related to the fact that most of the landscape genomics studies conducted in non-model species involve analyzing anonymous loci (e.g., SNPs generated by RAD-seq; Lowry et al. 2016). That is, SNPs determined to be under selection are often not found in a gene or regulatory region, but rather in proximity to one. As such, a "moving window" approach can be used to search for genes within the range of linkage disequilibrium of the candidate SNP. If the species under study has small linkage blocks, however, the true allele under selection may often be missed (Lowry et al. 2016). In general, caution should be used with random marker-based studies of species with small linkage groups because even a fairly large number of SNPs (tens of thousands) may only cover a small portion of the genome (Lowry et al. 2016). Despite these caveats, population and landscape genomics studies have yielded invaluable new information regarding population delineation, conservation and management units, and many candidate loci under selection that have enhanced our understanding of the mechanistic basis for local adaptation (Andrews et al. 2016; Hohenlohe et al. 2018).
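The point about a top X% cutoff can be illustrated with simulated data in which no locus is under selection. The sketch below draws null p-values, applies a fixed top 5% cutoff, and compares it with the Benjamini-Hochberg step-up procedure, one standard way to control the false discovery rate. This is an illustration under stated assumptions, not the procedure used by any particular package cited above: the p-values are independent and uniform, and the function names are hypothetical.

```python
import random


def top_fraction(pvalues, fraction=0.05):
    """Indices of the `fraction` of tests with the smallest p-values."""
    k = round(len(pvalues) * fraction)
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    return sorted(order[:k])


def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices of tests rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value falls under its threshold.
        if pvalues[idx] <= alpha * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


random.seed(1)
# 1000 simulated loci, none truly under selection (uniform null p-values).
null_pvalues = [random.random() for _ in range(1000)]

print("loci flagged by a top 5% cutoff:", len(top_fraction(null_pvalues)))
print("loci flagged by BH at FDR 0.05:", len(benjamini_hochberg(null_pvalues)))
```

A rank-based cutoff returns the same number of "significant" loci regardless of whether any selection is present, whereas a false discovery rate procedure ties the number of rejections to the evidence in the p-value distribution. Neither approach addresses the demographic confounding discussed above, which affects the p-values themselves.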

From the identification of candidate loci under selection with current methods, establishing a functional role remains a challenge. Transcriptomic sequencing can be conducted without a reference genome, and differential expression of a candidate locus in different populations can be a way to verify putative function for transcription-based phenotypes. While in its early stages of application to non-model organisms, CRISPR can be used to modify any gene, and thereby test putative function. Such studies may be hard to conduct in vivo, but CRISPR could be conducted in vitro in cultured cell lines. Additionally, more attention could be paid to the influence of biotic factors on local adaptation. To date, most landscape genetics studies have focused on abiotic environmental factors, such as altitude or temperature. However, emerging infectious diseases, or other species, such as key predators or prey, can greatly influence patterns of local adaptation. Take, for example, devil facial tumor disease (DFTD), a deadly, transmissible cancer of Tasmanian devils that has caused widespread population declines (McCallum et al. 2009). A landscape genomics study showed that DFTD resulted in a decrease in the strength of local adaptation to abiotic factors, such as precipitation, after the disease arrived (Fraik et al. 2019). To that end, population and landscape genomics studies can move more toward studying biotic interactions among species, referred to as a "landscape community genomics" approach (Hand et al. 2015).

The key idea behind landscape community genomics studies is to meld studies of the effects of abiotic landscape characteristics on the spatial arrangement of populations with the influence of biotic community interactions, in order to test how ecological dynamics affect genomic variation and gene flow. It has long been recognized that community-level interactions among species can drive evolutionary genetic processes, such as population genetic structure (i.e., "community genetics"; Antonovics 2003; Collins 2003). For example, a study of steelhead trout showed that the genotypes of their trematode parasite resulted in a more accurate assignment of trout to their source population than the trout genotypes themselves (Criscione et al. 2006). Indeed, considering the influence of competition, predation or co-evolution in addition to the spatial arrangement of populations can provide new insights into the evolutionary processes that shape species’ distributions (Hand et al. 2015). Explicit models for species interactions in communities that interface with metagenomic and ultimately full genomic data are an emerging area (Aldebert and Stouffer 2018; Shoemaker et al. 2019).

Speciation and Hybridization

The availability of genomic-scale data has also greatly improved our ability to study the processes of hybridization and speciation. For a long time, evolutionary biologists were interested in identifying the genes that contribute to reproductive isolation and speciation, or so-called "speciation genes" (Orr et al. 2004). However, it was not until the past decade that scientists began to unravel the effect size of genes that contribute to reproductive isolation (Nosil and Schluter 2011). For example, across several species of Drosophila, approximately 18 genes underpinned intrinsic post-mating isolation (Coyne and Orr 2004).

Ecological speciation, or speciation without geographic isolation, has also been a major focus of recent diversification studies (Schluter 2009; Nosil and Schluter 2011). An example is a study of hawthorn maggots that went through a phenological host shift to feed on apple. A selection experiment showed that the phenological host shift entailed genome-wide divergence patterns similar to those observed in natural populations (Egan et al. 2015). In general, understanding the speciation process in the face of gene flow (Feder et al. 2012) has garnered widespread interest, and genomic tools, such as the generation of large numbers of anonymous genome-wide markers, allow for empirical tests of model predictions. Further, genome-wide marker sets allow testing of which parts of the genome are generating inter-specific divergence by maintaining reproductive isolation and, conversely, which parts are homogenized by gene flow.

Genomics and next-generation sequencing have also advanced studies of hybridization. For example, the collared and pied flycatchers naturally hybridize, yet researchers discovered approximately 50 divergence islands, regions of the genome with about 50 times the genetic differentiation of the background, and found that complex repeat structures appear to drive divergence of the two species (Ellegren et al. 2012). Researchers can now estimate the proportion of the genome that is introgressed from each of the parental species in a hybrid zone (Gompert and Buerkle 2011; Parchman et al. 2013). A recent study found shared introgression across two different hybrid zones of spotted and collared towhees, suggesting consistency in the areas of the genome affected by gene flow (Kingston et al. 2017). Future genomic studies of hybridization can investigate the joint divergence of nucleotide sequences and transcriptomes, leading to insights into the relative influence of DNA sequence divergence and gene expression levels in maintaining and/or destabilizing hybrid zones. Further, we may be able to better appreciate the genomic basis of reproductive isolation, speciation, and hybridization as our understanding of the function of structural genome variation improves, such as the relationship between gene copy number and phenotype.
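As a rough illustration of how divergence islands of this kind might be located, the following Python sketch computes a windowed FST between two populations and flags windows that exceed the genomic background by a chosen factor. It is a minimal sketch under stated assumptions, not the pipeline used in the studies cited above: it assumes biallelic SNPs with known allele frequencies, uses Hudson's FST estimator combined across SNPs as a ratio of sums, and takes the median window as the background. The function names, window size, and fold threshold are hypothetical.

```python
from statistics import median


def hudson_fst_components(p1, p2, n1, n2):
    """Numerator and denominator of Hudson's FST for one biallelic SNP.

    p1, p2: sample allele frequencies in populations 1 and 2.
    n1, n2: number of sampled chromosomes in each population (must be > 1).
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def windowed_fst(snps, n1, n2, window_size):
    """Return {window_start: FST} for non-overlapping windows.

    snps: iterable of (position, p1, p2) tuples.
    Within a window, FST is the ratio of summed numerators to summed
    denominators rather than the mean of per-SNP ratios.
    """
    sums = {}
    for pos, p1, p2 in snps:
        start = (pos // window_size) * window_size
        num, den = hudson_fst_components(p1, p2, n1, n2)
        acc = sums.setdefault(start, [0.0, 0.0])
        acc[0] += num
        acc[1] += den
    # Windows whose SNPs are all monomorphic for the same allele are skipped.
    return {s: num / den for s, (num, den) in sorted(sums.items()) if den > 0}


def divergence_islands(fst_by_window, fold=5.0):
    """Window starts whose FST is at least `fold` times the median window FST."""
    background = median(fst_by_window.values())
    if background <= 0:
        return []  # no meaningful background level to compare against
    return [s for s, fst in fst_by_window.items() if fst >= fold * background]


if __name__ == "__main__":
    # Made-up allele frequencies, for illustration only.
    toy_snps = [(100, 0.5, 0.6), (1100, 0.4, 0.5), (2100, 1.0, 0.0),
                (3100, 0.3, 0.4), (4100, 0.6, 0.7)]
    fst = windowed_fst(toy_snps, n1=101, n2=101, window_size=1000)
    print(divergence_islands(fst, fold=5.0))
```

Real analyses would also need to account for variation in SNP density, recombination rate, and sequencing coverage among windows, all of which this sketch ignores.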

From Phylogenomics to Phylodynamics

Phylodynamics is an application of phylogenomics to the study of the evolution of whole parasite genomes, usually those of viruses (Holmes and Grenfell 2009). For example, phylodynamic analyses of HIV showed that the first introduction of HIV-1 into the New World was most likely in Haiti, with the subsequent US introduction from Haiti in 1969, 12 years earlier than previously thought (Gilbert et al. 2007). A more recent phylodynamic analysis of the major African Ebola outbreak of 2013–2016 inferred, from the high genetic similarity of virus genomes sampled early in the epidemic, that the epidemic arose from a single spillover infection in Guinea (Gire et al. 2014; Holmes et al. 2016). Despite this early genetic similarity among isolates, the Ebola strain named “EBOV Makona” spread to Sierra Leone and Liberia, where it then diversified into separate, largely independently evolving clusters. One possibility is to expand phylodynamic studies to parasites other than RNA viruses, although the associated analyses may be computationally challenging. Phylogenomic studies can also be applied to understand the evolution of virulence by studying the evolutionary dynamics of cross-species transmission, the associated changes in virulence during host switches, and the genomic basis underlying these changes (Geoghegan and Holmes 2018).
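The dating of introduction events in analyses like these rests on the idea of a molecular clock calibrated by sampling dates. The Python sketch below shows the simplest version of that idea, a root-to-tip regression: genetic distance from the root of a tree is regressed on sampling date, the slope estimates the substitution rate, and the date at which the fitted line reaches zero distance estimates the age of the root. This is only a heuristic under a strict-clock assumption and is not the method of the studies cited above, which typically rely on fuller statistical models; the function name and inputs are hypothetical.

```python
def root_to_tip_regression(dates, distances):
    """Ordinary least-squares fit of root-to-tip distance on sampling date.

    dates: sampling dates of the tips (e.g. decimal years).
    distances: root-to-tip genetic distances (substitutions per site),
        in the same order as `dates`.
    Returns (rate, root_date): the slope of the fit and the date at which
    the fitted line crosses zero distance.
    """
    if len(dates) != len(distances) or len(dates) < 2:
        raise ValueError("need at least two (date, distance) pairs")
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    if sxx == 0:
        raise ValueError("all samples share one date; no temporal signal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    rate = sxy / sxx
    if rate <= 0:
        raise ValueError("non-positive slope; no usable temporal signal")
    intercept = mean_y - rate * mean_x
    root_date = -intercept / rate  # x-intercept of the fitted line
    return rate, root_date


if __name__ == "__main__":
    # Made-up, perfectly clock-like data, for illustration only.
    dates = [2000.0, 2005.0, 2010.0]
    distances = [0.001 * (d - 1990.0) for d in dates]
    print(root_to_tip_regression(dates, distances))
```

Because the tips of a tree share ancestry, the points are not independent, so a regression of this kind is better suited to checking for temporal signal than to formal estimation.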

The Changing Role of Theory in Population Genomics

Journal of Molecular Evolution has a long tradition of publishing population genetic research, going back at least to some of the foundational papers in the development of the neutral theory (Kimura and Ohta 1971) and the nearly neutral theory (Ohta 1972). Today, in an era of genomics, the once theory-heavy field of population genetics has become increasingly data-driven, and the population genomics of 2020 and beyond can expect to see growing use of genome-scale data sets. Richard Lewontin famously wrote, some 45 years ago, of the transition of population genetics from a theory-laden to a data-swamped field (Lewontin 1974). In the case of that particular data-swamping, the theoreticians eventually caught up (e.g., Kingman 1982; Charlesworth et al. 1993; Gillespie 2000), but now it has happened again with genome-level data. The enormous information content in population-genomic data sets drives much of the current research on genetic mapping, on the study of natural selection, and on demographic inference, to mention just three long-standing and still active areas of research. Many researchers who work with quantitative population genetic models, or would like to do so, have found that the scale of the data and the challenges of applying theory on such scales have transformed them into statisticians. This is not a bad thing, as the potential for discovery and the scope of those discoveries can be great with such large data; for all that, we are still doing model-based statistics. But it remains to be seen how theoreticians will respond to the opportunities and challenges of such large data. Will the future of mathematical and computational work in population genomics be dominated by the development of new inference technologies (i.e., statistics), as seems likely given current trends? Will new advances in theory, and new kinds of theory, emerge to complement new inference with existing theory and help us gain a greater understanding of the processes driving the patterns we find in these vast data sets? It is clear that current theory needs expansion in multiple directions, for example to deal accurately with selection in large and changing populations or with high mutation rates, and that such theory would be welcome to those building a new molecular population genetic understanding of species.
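To make concrete the kind of model in which selection, mutation, and changing population size interact, the following Python sketch simulates a single biallelic locus under a Wright-Fisher model. It is a minimal illustration under stated assumptions, not a method drawn from the works cited above: the population is haploid, generations are discrete, selection acts before symmetric mutation, and drift is binomial sampling of the next generation. The function and parameter names are hypothetical.

```python
import random


def wright_fisher(p0, sizes, s=0.0, mu=0.0, rng=None):
    """Simulate the frequency of a focal allele through time.

    p0: starting frequency of the focal allele.
    sizes: population size (number of gene copies) for each generation,
        so a changing demography is just a non-constant sequence.
    s: selection coefficient; the focal allele has relative fitness 1 + s.
    mu: per-generation mutation probability, symmetric between alleles.
    rng: optional random.Random instance, for reproducibility.
    Returns the list of frequencies, starting with p0.
    """
    rng = rng or random.Random()
    p = p0
    trajectory = [p]
    for n in sizes:
        if n <= 0:
            raise ValueError("population sizes must be positive")
        # Selection: reweight the focal allele by its relative fitness.
        p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))
        # Mutation: symmetric exchange between the two alleles.
        p_mut = p_sel * (1 - mu) + (1 - p_sel) * mu
        # Drift: each of the n gene copies is drawn independently.
        count = sum(1 for _ in range(n) if rng.random() < p_mut)
        p = count / n
        trajectory.append(p)
    return trajectory


if __name__ == "__main__":
    # A bottleneck followed by recovery, with weak positive selection.
    demography = [1000] * 50 + [20] * 10 + [1000] * 50
    path = wright_fisher(0.1, demography, s=0.02, rng=random.Random(42))
    print(round(path[-1], 3))
```

Running the simulation many times with different demographies gives an empirical view of how a bottleneck alters the fate of a selected allele, which is the type of question for which analytical theory remains limited.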

Somatic Molecular Evolution

To date, our understanding of molecular evolution has meant germline molecular evolution. However, molecular changes also occur within multicellular organisms, so an individual’s cells evolve during its lifetime, generating somatic molecular evolution. All individuals are genetic mosaics to differing extents, but this has been largely unexplored with the exception of plants (Antolin and Strobeck 1985). Plants can even pass somatic mutations to their progeny, which sometimes confers adaptive advantages (Simberloff and Leppanen 2019).

In recent years, next-generation sequencing has been fundamental to disentangling somatic evolution at different levels, including genomes, methylomes, and transcriptomes (Posada 2015). Most studies of somatic evolution focus on cancer, for which numerous evolutionary studies, often still descriptive, exist on adaptation, population structure, mutational processes, and divergence (Williams et al. 2018; Martincorena et al. 2018a, b; Ling et al. 2015; Zhao et al. 2016; Alexandrov et al. 2013; Jiang et al. 2016; Sun et al. 2017; Alves et al. 2019). Not surprisingly, the debate over neutrality versus selection has also made its presence felt at the somatic level, and it is still ongoing (Williams et al. 2016; Tarabichi et al. 2018).

More recently, a number of studies have tried to understand how cells evolve in healthy tissues, mostly in humans, including skin, blood, colon, liver, esophagus, or brain (Lopez-Garcia et al. 2010; Lodato et al. 2015; Ma et al. 2015; Martincorena et al. 2015; Blokzijl et al. 2016; Martincorena et al. 2018a, b; Su et al. 2018; Lee-Six et al. 2018; Yokoyama et al. 2019). Such studies show that normal cells also compete for space and resources, and that large clonal expansions can occur within a healthy tissue, often favored by strong positive selection. Understanding how somatic mutations accrue with time, or why mutational rates change among tissues might be essential to understand aging and related chronic diseases of aging, such as diabetes, heart disease, or neurological disorders. Nevertheless, interesting examples also exist outside the human body, generating an understanding of adult development from a single cell and the accumulation of mutations in the soma (Behjati et al. 2014; Schmid-Siegert et al. 2017; Alemany et al. 2018; Olsen et al. 2019).

Indeed, the growth of single-cell genomics (Macaulay and Voet 2014; Gawad et al. 2016; Tanay and Regev 2017; Baslan and Hicks 2017) and transcriptomics (Stegle et al. 2015) has been fundamental for this endeavor, and it is not difficult to predict that it will continue to fuel the study of somatic molecular evolution, in an intimate relationship with development, aging, and disease (Marioni and Arendt 2017).

Disentangling the molecular evolution of cells in humans and other organisms, and addressing questions about cell selection and competition, adaptation, interaction with the microenvironment, diversification, mutational processes, genetic drift, phylogeography, population dynamics, and phylogenetics, among many other aspects, remains a task for the future. Upcoming studies will address not just empirical questions, like the studies referred to above, but also methodological (Alves et al. 2017; Dou et al. 2018; Singer et al. 2018) and theoretical issues (Nowak et al. 2003; Spencer et al. 2006; Frank 2010; Cannataro and Townsend 2018).

Somatic cell evolution can also unite with germline evolution in cases where somatic cells speciate into single-cell eukaryotic organisms. This can be viewed as happening unproductively in most somatic cell cancers. However, there are a few cases where transmissible cancers that persist over evolutionary time have emerged from multicellular organisms, including in canines (Baez-Ortega et al. 2019), twice in Tasmanian devils (Patchett et al. 2019), and in bivalves (Metzger et al. 2016; Yonemitsu et al. 2019). This is a process of interest to both evolutionary and medical biologists, as well as to conservation biologists.

Evolution as a Tool: Directed Evolution

In 2018, the Nobel Prize in Chemistry was awarded to Frances Arnold, George Smith, and Gregory Winter for their work in applied protein evolution. Touching on both chemical evolution and the evolution of genes and proteins as key journal topics, this Nobel Prize illustrates the importance of evolutionary principles not only for understanding the function of evolved biomacromolecules but also for engineering applications.

One driver of recent advances in directed evolution is the technology to effectively create or synthesize libraries with specific diversity constraints. Site saturation mutagenesis in particular has been an effective strategy for allowing both natural and designed protein scaffolds to perform a range of non-biological chemistries, including metathesis (Jeschek et al. 2016), enantioselective organic borylation (Kan et al. 2017), carbon-silicon bond formation (Kan et al. 2016), and C-H amination (Prier et al. 2017). While such studies do not mimic natural evolutionary processes, the products of these experiments inform us that some limitations of biological catalysts are a result of natural selection rather than inherent biophysical barriers.

A second driving factor in the evolution of protein catalysts is the development and widespread application of approaches that enable higher throughput screening or selection. Rare fitness peaks may be reached through ultra-high-throughput screening. In particular, microfluidic droplets that act as individual microreactors, and continuous evolution systems that circumvent the need for discrete selection rounds, are enabling the creation of enzymes with altered biological function. Microfluidic droplets enable encapsulation of individual cells with reagents that allow fluorescence-activated droplet sorting (FADS) assays for enzyme function utilizing fluorogenic substrates (Obexer et al. 2017), or approaches such as compartmentalized self-replication (CSR), wherein altered DNA polymerases expressed by a cell encapsulated with reagents for DNA replication evolve properties such as isothermal replication (Milligan et al. 2018), a proof-reading reverse transcriptase (Ellefson et al. 2016), and polymerization of the nucleotide analog alpha-L-threofuranosyl nucleic acid (TNA) (Larsen et al. 2016). Continuous evolution systems that link phage replication within a reservoir of E. coli to the activity under selection have been used to evolve DNA-binding specificity (Brödel et al. 2016), protease specificity (Packer et al. 2017), and aminoacyl-tRNA synthetases (Bryson et al. 2017), among other protein properties. Although these technologies were originally developed over a decade ago (Ghadessy et al. 2001; Esvelt et al. 2011), recent applications point toward the widespread adoption of such technology to meet the desires of a growing synthetic biology community.

An example of the impact of progressive screening/selection approaches toward a specific goal is the engineering of SpCas9, a CRISPR-Cas9 nuclease from Streptococcus pyogenes. SpCas9 was originally engineered to expand allowed recognition sequences (PAM sequences) using a bacterial selection system with sequential positive and negative selections (Kleinstiver et al. 2015). Yet, for many applications, including widespread use in genome editing, further expansion of accessible sequences and reduction of off-target activity is critical. Thus, additional studies using simultaneous positive and negative selection in E. coli (Lee et al. 2018) or positive selection coupled with negative screening in yeast (Casini et al. 2018) enabled greater increases in specificity. Subsequently, this problem was approached using phage-assisted continuous evolution (PACE) wherein a virally encoded catalytically inactive Cas9 variant was tethered to a viral RNA polymerase necessary for phage amplification (Hu et al. 2018); thus guide RNA-dependent gene expression activation was used to link Cas9 DNA-binding and gene expression in a selection suitable for PACE. The CRISPR-Cas9 example also demonstrates how directed evolution approaches incorporating random mutagenesis processes can be complementary with various structure-based rational engineering approaches (Slaymaker et al. 2016; Chen et al. 2017).

Beyond the in vitro evolution of molecules, the in vitro evolution of organisms enables experimental control and replication of key evolutionary variables not controllable in natural settings (for example, known selective pressures and population sizes). The most famous of such experiments is surely the Lenski experiment with E. coli, which has now been underway for more than 25 years and over 60,000 generations (Lenski and Travisano 1994; Lenski 2017). Although this study was started well before the first bacterial genome was sequenced, inexpensive whole-genome sequencing in the last decade has allowed experimental demonstration of important evolutionary concepts including clonal interference (Maddamsetti et al. 2015), epistasis (Khan et al. 2011; Plucain et al. 2014), and convergence (Blount et al. 2018). This work has also been inspirational for others, and today there are numerous studies that incorporate laboratory adaptation as a strategy for understanding biological systems, or utilize it as a tool for industrial biotechnology (Remigi et al. 2019; Sandberg et al. 2019). Taking things one step further, inter-specific interactions can be examined in vitro, toward the design of in vitro ecosystems, with control of the organismal make-up, starting conditions, and ecosystem parameters (Lindemann et al. 2016; D’Souza et al. 2018). As theory in population genetics, molecular evolution, and molecular evolutionary ecology has progressed to give expectations about allele frequencies, genome sequences, and species distributions over time, experimental tests in controlled settings are now possible and represent an exciting development moving forward.

Other Key Directions and Concluding Thoughts

The continued development of theory linking population-level processes through evolution to molecular-level processes is a critical element of molecular evolution. From theoretical developments, models that join intra-specific processes with inter-specific timescales and those that mechanistically capture sequence variation through the genotype–phenotype map are another area of interest. As a next step, new models lead to new methods and approaches for inference in computational biology. Standard assumptions of site independence, time homogeneity, and processes at equilibrium are made for mathematical and computational ease, but can be sufficiently violated in some biological situations to make inference that rests upon such assumptions questionable. Theory interplays with its implementation in computational methods and its application to new data that is generated with new experimental methods. All of this promises to be an exciting future for the field, including in the pages of Journal of Molecular Evolution.