McInerney et al. reply — Shapiro has raised a number of interesting and important points about pangenome evolution, focusing mainly on the rate of gene acquisition1. We agree that there are several aspects to the population genetics of pangenomes that need to be considered. It is true that prokaryotes that engage in horizontal gene transfer are difficult to treat as a ‘true’ species and in particular, a gene might find itself in organisms with different effective population sizes at different times in its history. However, there is a lot of evidence that prokaryote genomes and their constituent genes obey expectations from population genetic theory.

At least five factors correlate: organismal lifestyle (intracellular parasites at one extreme, global free-living organisms at the other) correlates with pangenome size (minimal accessory genome at one extreme and minimal core genome at the other), which correlates with the rate of horizontal gene transfer (low to high), which correlates with effective population size (N_e) (from small to large), which correlates with the efficiency of natural selection. It is arguable that N_e drives all these correlations: a large N_e increases selection efficiency, so that as N_e increases, smaller and smaller fitness effects are ‘seen’ by selection. In free-living prokaryotes, it is likely that fitness differences of one ten-millionth or smaller are seen by selection2.
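The dependence of selection efficiency on N_e can be illustrated with a minimal sketch. It assumes Kimura's diffusion approximation for the fixation probability of a single new mutant in a haploid Wright-Fisher population whose census size equals N_e; that model, the function names and the range of N_e values are our illustrative assumptions, not quantities taken from the cited studies.

```python
import math


def fixation_probability(s: float, n_e: float) -> float:
    """Approximate fixation probability of one new mutant.

    Assumes a haploid Wright-Fisher population of effective size n_e
    (an illustrative assumption) and selection coefficient s.
    """
    if s == 0.0:
        return 1.0 / n_e  # neutral expectation: 1 / N_e
    exponent = -2.0 * n_e * s
    if exponent > 700.0:
        return 0.0  # strongly deleterious; avoids overflow in expm1
    # expm1 keeps precision when s is tiny (for example 1e-7).
    return math.expm1(-2.0 * s) / math.expm1(exponent)


def relative_to_neutral(s: float, n_e: float) -> float:
    """Fixation probability divided by the neutral value 1 / N_e."""
    return fixation_probability(s, n_e) * n_e


if __name__ == "__main__":
    s = 1e-7  # fitness advantage of one ten-millionth
    for n_e in (1e4, 1e6, 1e7, 1e8, 1e9):
        ratio = relative_to_neutral(s, n_e)
        print(f"N_e = {n_e:.0e}: fixation probability / neutral = {ratio:.3g}")
```

Under these assumptions, a mutation with s of one ten-millionth behaves almost neutrally when N_e is small (the ratio stays close to 1) and departs strongly from neutrality only once the product of N_e and s is well above 1, which is the sense in which such an effect is ‘seen’ by selection.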

Selection efficiency can be seen in synonymous codon changes: in species with large N_e, highly expressed genes tend to be under selection for optimal codon usage3. In contrast, intracellular parasites are less likely to exhibit signatures of synonymous codon-usage optimization than free-living prokaryotes4 and a greater portion of their genome is influenced by drift. In species with small N_e, effects that are clearly seen in species with large N_e are completely missing5.

The contrast between prokaryotic genomes and mammalian genomes is instructive. Mammalian genomes are replete with junk (nearly neutral) DNA6 and this is a consequence of N_e. In an apparent paradox, mammals with small N_e tend to accumulate large genomes because selection against junk DNA is inefficient. However, in prokaryotes it is species with large N_e that tend to have larger genomes and pangenomes. Unless we propose that organisms with larger N_e tend to pick up genes with such a small effect on fitness that selection is ineffective, a neutral model does not seem to fit the data.

Prokaryotic genomes typically consist of 3–3.5% pseudogenes7. These are likely to be effectively neutral or slightly deleterious. However, 96.5–97% of prokaryotic genes are fully intact, and usually functional, as evidenced by gene expression and ratios of non-synonymous to synonymous substitutions that are almost always far less than one8. Comparative genomics shows us that pseudogenes do not remain in a genome for long after they have arisen: selection is usually efficient enough to remove them9. In contrast, a typical Escherichia coli genome, rather than the usual 3–3.5% (~150) pseudogenes, consists of more than 50% (~2,250) accessory genes.
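The approximate figures in this comparison can be checked against one another with a short calculation. This is a rough sketch only: the gene count per genome is not stated in the text and is inferred here from the quoted accessory-gene numbers, and the variable names are ours.

```python
# Approximate figures quoted in the text for a typical E. coli genome.
accessory_genes = 2250      # "~2,250" accessory genes
accessory_fraction = 0.50   # "more than 50%", so 0.50 is a lower bound
pseudogenes = 150           # "~150" pseudogenes

# Inferred, not stated in the text: total genes per genome. Because the
# accessory fraction is a lower bound, this is an upper bound.
implied_genome_size = accessory_genes / accessory_fraction

# Pseudogene share of the inferred genome, to compare with 3-3.5%.
pseudogene_fraction = pseudogenes / implied_genome_size

# How many accessory genes there are per pseudogene.
accessory_to_pseudogene_ratio = accessory_genes / pseudogenes

print(f"implied genes per genome: {implied_genome_size:.0f}")
print(f"pseudogene fraction: {pseudogene_fraction:.1%}")
print(f"accessory genes per pseudogene: {accessory_to_pseudogene_ratio:.0f}")
```

On these numbers, the quoted pseudogene count falls within the 3–3.5% range, and accessory genes outnumber pseudogenes by roughly an order of magnitude.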

In truth, we do not have good estimates for the fitness effects of many of the 85,000 accessory genes found so far across E. coli, and this is clearly a gap in our knowledge. However, for a nearly neutral explanation to work, we might expect to see far more pseudogenes or else fewer accessory genes.

We note several known unknowns concerning prokaryotic genomes. First, for a wide range of organisms and a wide range of genes, we do not know what the fitness effects of alternative synonymous codons are. Second, we do not have precise measurements of the fitness effects of accessory genes. We now have some data on the dynamics of accessory gene accumulation, with some evidence showing that the immediate short-term reduction in fitness that follows acquisition of a plasmid is rapidly overcome by compensatory mutations10. However, this is almost certainly observed only because of long-term selective benefits to acquisition and maintenance of the plasmid.

The problem for any explanation of how pangenomes arise has always been that efficient selection should result in selective sweeps and a genome that is relatively homogeneous for a species. Here, we have the paradox that large N_e and efficient selection both correlate with large pangenome size. A slightly more complex model that allows for migration to new niches solves this problem11.