Repeated origins, gene flow, and allelic interactions of herbicide resistance mutations in a widespread agricultural weed

Causal mutations and their frequency in agricultural fields are well-characterized for herbicide resistance. However, we still lack an understanding of their evolutionary history: the extent of parallelism in the origins of target-site resistance (TSR), how long these mutations persist, how quickly they spread, and allelic interactions that mediate their selective advantage. We addressed these questions with genomic data from 18 agricultural populations of common waterhemp (Amaranthus tuberculatus), which we show to have undergone a massive expansion over the past century with a contemporary effective population size estimate of 8 × 107. We found variation at seven characterized TSR loci, two of which had multiple amino acid substitutions, and three of which were common. These three common resistance variants show parallelism in their mutational origins, with gene flow having shaped their distribution across the landscape. Allele age estimates supported a strong role of adaptation from de novo mutations, with a median allele age of 30 suggesting that most resistance alleles arose soon after the onset of herbicide use. However, resistant lineages varied in both their age and evidence for selection over two different timescales, implying considerable heterogeneity in the forces that govern their persistence. The evolutionary history of TSR has also been shaped by both intra- and inter-locus allelic interactions. We report a signal of extended haplotype competition between two common TSR alleles, and extreme linkage with genome-wide alleles with known functions in resistance adaptation. Together, this work reveals a remarkable example of spatial parallel evolution in a metapopulation, with important implications for the management of herbicide resistance.


32
The evolution of resistance in agricultural pest populations occurs rapidly and repeatedly in re-33 sponse to herbicide and pesticide applications. Reports of herbicide resistance across agricultural 34 landscapes have been steadily growing, threatening crop productivity and greatly raising costs for 35 agricultural production (Peterson et al., 2018). These reports put a lower limit on the estimated 36 number of unique resistance cases-over 500 across the globe-based on just the occurrence of re-  (Brown, 1990), and are widely used in both corn and 93 soy production systems. They were rapidly adopted due to their application rates being an order of 94 magnitude lower than previous herbicides and thus increased affordability, along with low toxicity 95 and broad-spectrum weed control (Mazur and Falco, 1989), but quickly became notorious for their 96 ability to select for resistant weeds (Tranel and Wright, 2002). Use 105 Here we investigate the evolutionary histories of mutations that have been previously demon-106 strated to confer target-site resistance in A. tuberculatus, focusing on TSR mutations within ALS 107 and PPO. We infer the number of TSR mutational origins across populations and their distribu-108 tion across the landscape, examining the signals left behind by both mutation and recombination. 109 Specifically, we implement a method that infers the ancestral recombination graph (ARG) and that signals, some of which may be mediated by intra-and inter-locus allelic interactions. We also ex-117 amine signatures of the haplotype competition between common ALS resistance alleles, and the 118 extent that extreme selection from herbicides on TSR mutations has impacted diversity across the 119 genome. Our detailed population genomic analysis describing the repeatability of and heterogene-120 ity in target-site herbicide resistance evolution advances our understanding of rapid adaptation of 121 multicellular organisms to extreme selective pressure, while providing evolutionary informed pri-122 orities for agricultural weed management.

123
Types of target-site resistsance mutations 124 To test hypotheses about the origins of TSR in Amaranthus tuberculatus, we used whole-genome 125 sequence information from 19 agricultural fields in the Midwestern US and Southwestern Ontario, 126 Canada. It is important to note that these populations were obtained from fields where A. tuber-127 culatus was only poorly controlled. Because our sampling is thus potentially over representing 128 the frequency of resistance across the landscape, signatures of selection we observe across our 129 samples are likely stronger than averages across these geographic regions would be. other characterized mutations known to confer resistance in the genus Amaranthus. We examined 133 our sequence data for the presence of eight such substitutions in ALS, three in PPO, and one in 134 photosystem II protein D1 (psbA). Across 152 individuals, we found six out of eight known ALS 135 mutations, and one of the three known PPO mutations (Table 1). We did not find any mutation in 136 psbA.

137
The nine unique PPO and ALS target-site resistant mutations occur at seven distinct codons,  (Table 1). Notably, the most common resistance mutation (referring to 142 Table 1. Number and frequency of resistant individuals and mutations at loci known to be causal for resistance to PPO and ALS herbicides, both totals, and within each agricultural region. Relative frequencies given in parentheses.

ΔGly210
Trp-574-Leu Walpole) harbour multiple causal ALS mutations (Table 1). Thus, at just the level of these distinct 152 mutational types, we observe genetic convergence in adaptation to ALS-inhibiting herbicides at 153 global, regional, and population scales.

154
Regional selective sweep signals 155 To learn how and how often the individual mutations might have arisen, we first visualized regional 156 selective sweep patterns at PPO and ALS genes-two genes that happen to be located 250 kb apart 157 in the genome-with respect to the common Trp-574-Leu, Ser-653-Asn, ΔGly210 alleles. In particu-158 lar, we assayed the extent to which selection from herbicides at these genes has led to reductions in 159 diversity, and increases in homozygosity and linkage across the haplotype, as would be expected if 160 TSR alleles have increased in frequency rapidly enough that recombination has yet to unlink these 161 alleles from the background on which they arose. Corresponding selective sweep signals appear 162 to be highly heterogeneous across geographic regions and across resistance mutations (Fig 1). The 163 most pronounced selective sweep signal at the regional level is for the ALS Ser-653-Asn mutation,    Inferring the genealogical history of target-site resistance mutations 191 We first took a gene tree approach to reconstruct the evolutionary history of TSR mutations, based  ., 2016)). 194 We found that patterns of similarity among phased haplotypes at ALS and PPO (including 5 kb   204 We reconstructed the ARG for 20,000 SNPs encompassing both ALS and PPO genes (a 1 and

254
In addition, a test for selection over the most recent 0.01% of the tree showed that for all three 255 out of the four selected lineages that predated this time cutoff, there is even stronger evidence 256 for selection on more recent timescales, after FDR correction ( Fig 2C, Figure 2 -source data 1).

257
The most obvious of these is the ALS Ser-653-Asn variant #5, which, while having the weakest 258 evidence of consistent selection over its history, shows the strongest evidence of selection over Trp-574-Leu #1, #3), the 95% confidence interval of allele ages show these estimates are highly con-285 sistent across MCMC sampling of the ARG (Fig 3B) (Hejase et al., 2020). In comparison to gene trees that illustrate the average coalescent history of 384 these genomic regions (Fig 2 -figure supplement 1), we show that ARGs are extremely valuable for 385 further resolving independent origins of adaptive mutations.

386
From a mutation-limited view of adaptation, the extent of parallelism in target-site herbicide re-387 sistance that we observe here is particularly extreme. However, given the estimate of North Amer- terns we observe may be influenced by the sensitivity of ARG inference to both phasing quality 420 and fine-scale recombination rate variation. We performed a two-step phasing method, perform-421 ing population-level phasing with SHAPEIT4, a powerful up-to-date method (Delaneau et al., 2019), 422 after performing read-backed phasing with WhatsHap (Martin et al., 2016). Nonetheless, phase 423 switching remains a challenge for haplotype inference in naturally occurring populations in lieu 424 of long-read population resequencing. Phase-switching between haplotypes is likely to be inter-425 preted as a recombination event during ARG inference, however, by explicitly modelling how these 426 "recombinational" units relate to one another, ARG inference should still be better able to resolve 427 independent origins of adaptive haplotypes than traditional reconstruction methods. To adjust for 428 the potential phase-switching that may artificially inflate recombination rates, we ran ARGweaver such costs could be leveraged to prevent the persistence of resistance mutations (Vila-Aiub, 2019). 438 We were able to evaluate evidence for TSR alleles that predated the onset of herbicide usage, and 439 thus learn about how long these alleles persist, by rescaling allelic age estimates by the geometric 440 mean effective population size estimate over the last 50 years (Fig 3). We found evidence for one Relate (Speidel et al., 2019). Previously, we have also implemented a i (Gutenkunst et al., 2009), 456 which uses the site frequency spectrum, to infer demographic history in these samples and simi- suggesting that the current observed haplotypic competition is in fact a form of Hill-Robertson in-534 terference (where the rate of adaptation is slowed due to linkage) which can eventually be resolved 535 through recombination (Cooper, 2007;Hill and Robertson, 1966;Otto, 2021). 536 While we find that intra-chromosomal interactions such as competition between alleles have 537 influenced the selective trajectory of individual TSR alleles, we were also interested in the extent 538 to which interactions with physically unlinked loci may have facilitated herbicide resistance evolu-539 tion. We find evidence that selection on Essex haplotypes containing ALS TSR mutations has likely 540 been affected by such second-site modifiers (Fig 4C-E). We find particularly extreme signed LD 541 between TSR mutations and alleles on different chromosomes. LD between resistance alleles on 542 different chromosomes has been interpreted as epistasis (Gupta et al., 2021)), but LD between alle- best hope is to reduce the fitness advantage of resistant types by decreasing reliance on herbi-578 cides, rotating herbicide chemical compounds and mode-of-actions, and using herbicide mixtures.

579
However, one must also be wary of neighbouring weed populations. Gene flow of resistance mu-580 tations across the range is widespread, facilitating a rapid response to selection from herbicides, 581 potentially even in naive populations. In particular, we find strong regional patterns in the distri-582 bution of resistant lineages, suggesting coordinated herbicide management regimes across farms, 583 land-use planning, and hygienic machine-sharing will be important tools for efficient control of herbicide-resistant weeds.

586
Amaranthus tuberculatus sequence data 587 Sequencing and resequencing data were from a published study (Kreiner et al., 2019). Whole- = 500, 000), and used a monotonic spline to extrap-622 olate genetic distance to each SNP in our VCF. We provided SHAPEIT4 an effective population size 623 estimate of 500,000, inferred from previous demographic modelling in (Kreiner et al., 2019). 624 Tree inference 625 Gene trees were inferred based on haplotypes within focal target-site genes (ALS and PPO), and 5 626 kb on either side around them. This resulted in 296 SNPs and 622 SNPs for analysis for the ALS and 627 PPO trees respectively. Using the phased data around these genes, we first converted each phased 628 haplotype to FASTA format, performed realignment with clustal omega (Madeira et al., 2019), and 629 then ran clustal-w2 (Thompson et al., 1994) to infer the maximum likelihood tree, once for each 630 gene, with 1,000 bootstraps. We then plotted mutational status for each focal TSR mutation (ALS Trp-574-Leu, ALS Ser-653-Asn, and PPO ΔGly210) for each tip of both gene trees using ggtree (Yu,632 2020). 633 We ran ARGweaver (Rasmussen et (Gutenkunst et al., 2009; Kreiner et al., 2019). 639 To account for any bias introduced by phasing error in the form of inflated recombination rates, 640 we ran multiple iterations of ARGweaver at varying constant recombination rates, from = 7 × 10 −7 641 to 7 × 10 −9 , drawing our inferences from the parameter values that maximized the likelihood of  implementation of this method for trees outputted from Argweaver (Rasmussen et al., 2014). 677 Briefly, the statistic works as follows. Let be the number of carriers of our focal mutation 678 in the current day, N be the total present-day sample size, and be the number of susceptible 679 lineages present when the mutation increases in count from 1 to 2. We sum each individual prob-680 ability that a mutation spreads to at least a given frequency, from to − + 2. (1) The null hypothesis, that allele frequency change occurred under drift, is rejected when this 682 one-side p-value is sufficiently small (i.e. < 0.05), implying selection has governed the spread of 683 this mutation since it first arose. 684 We also modified this statistic to test for selection on more recent timescales, and thus the 685 scenario of adaptation from standing genetic variation. Here, we need to define , the number of 686 resistant lineages at some time ( ) before the present day, in addition to ( ).  The phased data used as input for ARGweaver was also used to extract selective sweep summary 706 statistics in selscan (Szpiech and Hernandez, 2014). In selscan, we estimated both XPEHH (Sabeti  We used plink (v1.90b3.46) (Purcell et al., 2007) to calculate both signed LD ( ) between each 717 focal TSR mutation and missense mutations on the same chromosome, and with all other bi-allelic 718 missense SNPs across the genome. We used nonsynonymous SNPs as we expected them to be 719 less influenced by population structure and admixture (Good, 2020) compared to synonymous 720 mous SNPs in Figure 4 -figure supplement 4. We performed these calculations with respect to a 722 given TSR mutation by using the -ld-snp options to specify a focal mutation. To visualize patterns 723 of signed LD between TSR mutations and other missense SNPs, we split the genome into non-724 overlapping 10 kb windows and calculated the average LD among all SNPs in each window. All LD 725 calculations were polarized by rarity (e.g. minor alleles segregating on the same haplotypes were 726 regarded as being in positive LD). In Essex, despite being considerably common, both ALS 574 and 727 ALS 653 had a frequency less than 50%, so LD values between all missense alleles, and both these 728 focal TSR mutations are directly comparable. As another visualization of the haplotype structure 729 in this region, we conducted two genotype-based PCAs using the R package SNPrelate (Zheng and 730 Wiehe, 2019), one for genotypes spanning 2 Mb around ALS, and one on genome-wide genotypes.

731
To test whether the negative LD we observed between ALS Trp-574-Leu and ALS Ser-653-Asn 732 was particularly extreme in Essex, we compared this value to pairs of either nonsynonymous or  At the most likely number of (timesteps = 30), the influence of increasing the recombination rate parameter constant value from r=e-7 to r=e-9 on the tree sequence inference from the most likely ARG.