Horizontal Gene Transfer of the Secretome Drives the Evolution of Bacterial Cooperation and Virulence

Summary

Background: Microbes engage in a remarkable array of cooperative behaviors, secreting shared proteins that are essential for foraging, shelter, microbial warfare, and virulence. These proteins are costly, rendering populations of cooperators vulnerable to exploitation by nonproducing cheaters arising by gene loss or migration. In such conditions, how can cooperation persist?

Results: Our model predicts that differential gene mobility drives intragenomic variation in investment in cooperative traits. More mobile loci generate stronger among-individual genetic correlations at these loci (higher relatedness) and thereby allow the maintenance of more cooperative traits via kin selection. By analyzing 21 Escherichia genomes, we confirm that genes coding for secreted proteins (the secretome) are very frequently lost and gained and are associated with mobile elements. We show that homologs of the secretome are overrepresented among human gut metagenomics samples, consistent with increased relatedness at secretome loci across multiple species. The biosynthetic cost of secreted proteins is shown to be under intense selective pressure, even more than for highly expressed proteins, consistent with a cost of cooperation driving social dilemmas. Finally, we demonstrate that mobile elements are in conflict with their chromosomal hosts over the chimeric ensemble's social strategy, with mobile elements enforcing cooperation on their otherwise selfish hosts via the cotransfer of secretome genes with “mafia strategy” addictive systems (toxin-antitoxin and restriction-modification).

Conclusion: Our analysis matches the predictions of our model, suggesting that horizontal transfer promotes cooperation, as transmission increases local genetic relatedness at mobile loci and enforces cooperation on the resident genes. As a consequence, horizontal transfer promoted by agents such as plasmids, phages, or integrons drives microbial cooperation.


Detailed model
A general result of social evolutionary theory is that an altruistic gene, which confers a benefit b on another individual at a cost c to the actor, can spread in a population if Rb > c, where R is the genetic relatedness between the two individuals [1-4]. The parameter R represents the genetic covariance between the two individuals, and is measured with respect to the locus in question. Genetic relatedness generally depends on the demography of a species and on the ability of its members to discriminate kin from non-kin. It will be enhanced if individuals do not disperse from where they are born and are able to target rewards differentially towards kin [1, 2, 5, 6]; in such cases neighbours will be more likely to be related and R will be high. Note, however, that while population viscosity can raise relatedness, it can also diminish the net benefits (b) of indiscriminate cooperation, owing to increased competition among relatives.
We begin our analysis with a standard recursion equation for relatedness in a patch-structured population. We assume a basic life cycle in which individuals reproduce, interact and migrate, and finally population regulation occurs. We define R as the probability that two randomly picked individuals within a patch carry identical alleles at the focal locus. We assume that the population is subdivided into an infinite number of patches of size N, that the probability that a given bacterial individual migrates to another patch in a given time interval is m, and that selection is weak. Together these assumptions yield the recursion

$$R_{t+1} = (1-m)^2 \left[ \frac{1}{N} + \left(1 - \frac{1}{N}\right) R_t \right]$$

Here $(1-m)^2$ is the probability that two random individuals both remain in their patch during a given time interval, and 1/N is the probability that the two non-migrant individuals stem from the same parent in the previous time-step (for more details, see [4]).

We now extend this recursion to allow for horizontal gene transfer. We can expect that horizontal gene transfer (e.g. via plasmid conjugation) will affect relatedness within a patch, either through the plasmid infecting other individuals (thus increasing relatedness) or through plasmid loss due to segregation (thus decreasing it). The extended recursion, equation (1), is based on the unbiased horizontal transmission of cultural traits [7]. Here we allow for both the loss and gain of identity in state at the loci of interest, as a result of gene loss and within-patch gene transmission. $p_K$ is the probability that two individuals carrying identical alleles at the focal locus remain identical in the next time-step, and $p_G$ is the probability that two individuals carrying distinct alleles become identical in the next time-step. Again, we assume that selection is weak, a standard assumption in models of social evolution [8].
The probability that two individuals carrying distinct alleles become identical at the focal locus in the next time-step ($p_G$) will depend on gene mobility β at this locus and on the within-patch diversity at this locus, $R_t$. Here we assume that $p_G = \beta R_t$, where β can be viewed, for plasmids, as the probability that conjugation will occur. The probability that two individuals carrying identical alleles remain identical in the next time-step ($p_K$) will depend on the potential for gene loss due to segregation, such that $p_K = 1 - s$, where s is the probability that at least one of the pair loses the gene by segregation. There are other ways in which horizontal transmission $p_G$ may be modelled [9], but these do not change the qualitative nature of our results (not shown).
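One way to write the extended recursion that is consistent with the definitions of $p_K$ and $p_G$ above, and with the two limiting cases discussed in this section, is sketched below. The exact arrangement of terms is an assumption made here for illustration: horizontal transfer and segregation are taken to act only on pairs that do not share a parent in the previous time-step.

```latex
% Assumed form of the extended recursion, chosen to agree with the stated limits
R_{t+1} = (1-m)^2 \left[ \frac{1}{N}
        + \left(1 - \frac{1}{N}\right)
          \bigl( p_K R_t + p_G \,(1 - R_t) \bigr) \right],
\qquad p_K = 1 - s, \quad p_G = \beta R_t .
```

Under this form, setting β = 0 and s = 0 gives $p_K = 1$ and $p_G = 0$, which recovers the purely vertical recursion.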
In the limit where both within-patch gene mobility and segregation loss tend to zero (β → 0 and s → 0), the recursion equation (1) converges to an equilibrium at $R^* = \frac{(1-m)^2}{N - (N-1)(1-m)^2}$, capturing relatedness (or $F_{ST}$) as a function of deme size and migration under purely vertical transmission [4].
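The vertical-transmission equilibrium can be worked out in a few steps. The derivation below assumes the standard form of the patch recursion, $R_{t+1} = (1-m)^2 [1/N + (1 - 1/N) R_t]$, and sets $R_{t+1} = R_t = R^*$.

```latex
\begin{aligned}
R^* &= (1-m)^2 \left[ \frac{1}{N} + \left(1 - \frac{1}{N}\right) R^* \right] \\[4pt]
R^* \left[ 1 - (1-m)^2 \, \frac{N-1}{N} \right] &= \frac{(1-m)^2}{N} \\[4pt]
R^* &= \frac{(1-m)^2}{N - (N-1)(1-m)^2}
\end{aligned}
```

As a check, m = 0 gives $R^* = 1$ (isolated patches eventually fix a single lineage), and m = 1 gives $R^* = 0$.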
In the limit N → ∞ and m → 0 (i.e. a very large patch with no migration), equation (1) converges to R* = 1 - s/β, which is the proportion of cells infected by a plasmid within a very large, well-mixed patch (a basic epidemiological result for the prevalence of infected individuals as a function of transmission and clearance [10]). Under these conditions, segregation reduces relatedness, while transmission increases it. Additionally, if s = 0 and m = 0 (i.e. there is no migration between patches, and plasmids are never lost), relatedness converges to 1, as all cells eventually become infected with the plasmid. Figure 2 shows how horizontal transfer affects relatedness within a patch when R is at equilibrium (R*): horizontal transfer of plasmids (plasmid infection) increases relatedness at plasmid loci within patches, while segregation reduces it. As increased local relatedness favours cooperation [1-3, 11], we conclude that horizontally transferred genes will be more likely to code for cooperative traits than genes that are less infectiously mobile. Figure 2 also recovers the standard result that migration decreases relatedness within a patch (e.g. [6]), and shows that allowing for horizontal gene transfer greatly increases local relatedness. We therefore expect mobile loci to experience higher relatedness than more static loci, so that selection will favour infectious plasmids carrying cooperative traits. Our model suggests that horizontal gene transfer does increase relatedness, and that this increase should be enough to offset the costs of investing in a social trait. While based on the biology of conjugative plasmids, these results are expected to apply to mobile elements in general, including elements integrating into the chromosome.
As such, our prediction is that social genes should be preferentially encoded in the most mobilisable regions of genomes.
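The qualitative pattern described above can be explored numerically. The sketch below iterates a recursion of the assumed form R_{t+1} = (1-m)^2 [1/N + (1 - 1/N)(p_K R_t + p_G (1 - R_t))], with p_K = 1 - s and p_G = βR_t. This arrangement of terms is an assumption chosen to agree with the limiting cases given in the text, not necessarily the exact form of equation (1); the function names and parameter values are illustrative only.

```python
def next_relatedness(R, N, m, beta, s):
    """One time-step of the assumed relatedness recursion with gene transfer."""
    p_keep = 1.0 - s      # identical pair stays identical (no segregation loss)
    p_gain = beta * R     # distinct pair becomes identical via transfer
    after_transfer = p_keep * R + p_gain * (1.0 - R)
    return (1.0 - m) ** 2 * (1.0 / N + (1.0 - 1.0 / N) * after_transfer)


def equilibrium_relatedness(N, m, beta, s, R0=0.5, tol=1e-12, max_steps=100_000):
    """Iterate the recursion from R0 until successive values differ by < tol."""
    R = R0
    for _ in range(max_steps):
        R_new = next_relatedness(R, N, m, beta, s)
        if abs(R_new - R) < tol:
            return R_new
        R = R_new
    return R


def vertical_equilibrium(N, m):
    """Closed-form equilibrium of the recursion when beta = 0 and s = 0."""
    q = (1.0 - m) ** 2
    return q / (N - (N - 1) * q)


if __name__ == "__main__":
    # Equilibrium relatedness across migration rates, with and without transfer.
    for beta in (0.0, 0.2, 0.5):
        row = [equilibrium_relatedness(N=100, m=m, beta=beta, s=0.05)
               for m in (0.01, 0.1, 0.5)]
        print(f"beta={beta}: " + ", ".join(f"{r:.4f}" for r in row))
```

Under these assumptions, raising β at fixed N, m and s raises the equilibrium, and the iteration reproduces both limiting cases (the vertical-transmission equilibrium and R* = 1 - s/β for a very large, closed patch).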
Throughout this model we used the example of conjugative plasmids, as they are well-known self-transmissible genetic elements that were also considered in previous work [12]. Our result is expected to apply to many other mobile elements, such as integrative conjugative elements and mobilizable plasmids. In general, we predict that the most mobile elements should be the ones carrying the most cooperative traits. However, this may not apply to virulent phages, as they typically kill their hosts, which would violate our assumption of weak selection. It is an open question whether the gains in relatedness are enough to offset the risk of cell death by temperate phages, knowing that prophages can be highly efficient weapons of niche invasion by lysogenic bacteria [13].

Table S1. Number of proteins per genome. The pathogenicity character was taken from [14]. Genomes without sequenced plasmids are indicated as "0" under "plasmidic".

Figure S1. Association between CAI, a proxy of gene expression levels, and protein cost. The regression has a very low R² = 0.088 (p < 0.001), and the non-parametric Spearman correlation is also low (ρ = -0.3). Red dots correspond to secreted and outer membrane proteins. All points were used in the regression; the colour is only intended to emphasise the biased distribution of relative cost among external proteins. We used the Codon Adaptation Index (CAI) as a proxy of gene expression [15]. For this, we used ribosomal proteins as the set of highly expressed genes from which optimal codon usage was taken. We then used this codon usage to compute CAI for all genes. CAI has been shown to correlate with transcriptome data as well as transcriptome datasets correlate with one another, and to correlate better with proteome data than transcriptome data do [16].
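The CAI computation described in the caption can be sketched as follows. This is a minimal illustration of the usual definition (each codon's weight is its frequency in a reference set of highly expressed genes relative to the most frequent synonymous codon, and CAI is the geometric mean of the weights over a gene), not the pipeline used in this work. The function names, the pseudocount of 0.5 for unobserved codons, and the toy sequences are assumptions made for illustration; in practice the reference set would be the ribosomal protein genes.

```python
import math
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(seq):
    """Split a coding sequence into complete codons (DNA alphabet)."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """Weight of each codon relative to its most used synonym in the reference set."""
    counts = Counter(c for gene in reference_genes for c in codons(gene)
                     if c in CODON_TABLE)
    synonyms_by_aa = {}
    for codon, aa in CODON_TABLE.items():
        synonyms_by_aa.setdefault(aa, []).append(codon)
    weights = {}
    for aa, synonyms in synonyms_by_aa.items():
        if aa == "*" or len(synonyms) == 1:
            continue  # skip stop codons and single-codon amino acids (Met, Trp)
        # Assumption: unobserved codons receive a small pseudocount.
        adjusted = {c: counts[c] if counts[c] > 0 else pseudocount for c in synonyms}
        best = max(adjusted.values())
        for c in synonyms:
            weights[c] = adjusted[c] / best
    return weights


def cai(gene, weights):
    """Geometric mean of codon weights over the informative codons of a gene."""
    logs = [math.log(weights[c]) for c in codons(gene) if c in weights]
    if not logs:
        raise ValueError("gene contains no informative codons")
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    toy_reference = ["GCTGCTGCTGCC"]  # hypothetical stand-in for ribosomal genes
    w = relative_adaptiveness(toy_reference)
    print(cai("GCTGCT", w), cai("GCTGCC", w))
```

A gene built only from the most frequent codon of each amino acid in the reference set has a CAI of 1; less frequent synonymous codons pull the value towards 0.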