Adaptive evolution and demographic history contribute to the divergent population genetic structure of Potato virus Y between China and Japan

Abstract Potato virus Y (PVY) is an important plant pathogen causing considerable economic loss to potato production. Knowledge of the population genetic structure and evolutionary biology of the pathogen, particularly at a transnational scale, is limited but vital in developing sustainable management schemes. In this study, the population genetic structure and molecular evolution of PVY were studied using 127 first protein (P1) and 137 coat protein (CP) sequences generated from isolates collected from potato in China and Japan. High genetic differentiation was found between the populations from the two countries, with higher nucleotide diversity in Japan than China in both genes and a KST value of .216 in the concatenated sequences of the two genes. Sequences from the two countries clustered together according to their geographic origin. Further analyses showed that spatial genetic structure in the PVY populations was likely caused by demographic dynamics of the pathogen and natural selection generated by habitat heterogeneity. Purifying selection was detected at the majority of polymorphic sites although some clade‐specific codons were under positive selection. In past decades, PVY has undergone a population expansion in China, whereas in Japan, the population size of the pathogen has remained relatively constant.


| INTRODUCTION
Genetic drift, gene flow, and natural selection are three main evolutionary forces shaping the spatial population genetic structure of species (Zhan & McDonald, 2004). Under constrained gene flow, stochastic changes in allele frequencies among geographic populations can result in random fixation of neutral alleles, leading to nonadaptive differentiation (Wright, 1938). On the other hand, divergent selection for various ecological or physiological characters among genetically isolated populations may lead to adaptive population subdivision (Koskella & Vos, 2015;Yang et al., 2016). In plant pathology, understanding how populations are spatially structured is important to project the evolutionary trajectories of pathogens and formulate approaches for sustainable plant disease management (Zhan, Thrall, & Burdon, 2014). For example, many soil-borne plant pathogens are highly spatially structured (Gilbert, 2002) and regional R gene deployment may be an appropriate method for an effective and durable control of plant disease caused by the pathogens. On the other hand, recombination can facilitate the reshufflings of genomes and R gene pyramids may be less efficient in managing sexual pathogens usually characterized by higher genetic variation distributed at a fine scale (McDonald & Linde, 2002).
Plant pathogens can vary greatly in spatial population genetic structure (Barrett, Thrall, Burdon, & Linde, 2008;Zhan, Thrall, Papaix, Xie, & Burdon, 2015), depending on the biotic and abiotic factors they associated with such as host genetics, pathogen biology, physical environments, and the ways of human intervention during and postagricultural production. These factors can affect individually and interactively on the extent of genetic drift, gene flow, and selection, therefore influencing the generation and maintenance of spatial population structure (Bergholz, Noar, & Buckley, 2011;Zhan & McDonald, 2013). Natural selection is expected to play a central role in the spatial population genetic structure of plant pathogens in agricultural ecosystems (Thrall et al., 2011), primarily due to variation in host genetics, fungicide applications, climatic conditions, and agricultural practices among regions. Directional selection for virulence, fungicide resistance, and other ecological characters related to particular biogeographic environments can drive the rapid accumulation of adaptive genetic differentiation in plant pathogen populations (Achtman & Wagner, 2008). At the same time though, variation in these same factors among regions can also strongly influence the demographic dynamics of plant pathogens, generating nonadaptive genetic differentiation in the pathogen populations. However, despite recognition of its importance, an in-depth understanding of how patterns of spatial population genetic structure are generated and maintained and the main evolutionary mechanisms responsible for these patterns are still limited for many plant pathogens, but critical for sustainable plant disease management (He, Zhan, Cheng, & Xie, 2016;Zhan et al., 2015).
Potato virus Y (PVY) is a member of the genus Potyvirus in the family Potyviridae. Its genome has a single-stranded positive-sense RNA of ~9.7 kb, encoding a polyprotein that is cleaved into 10 mature functional proteins (King, Lefkowitz, Adams, & Carstens, 2011).
Additionally, a short polypeptide (PIPO) is expressed within the P3 cistron by frame shifting (Chung, Miller, Atkins, & Firth, 2008). Among the 11 functional proteins encoded, the first protein (P1) and the coat protein (CP) are thought to play an important role in the adaptation of potyviruses to host species (Valli, López-Moya, & García, 2007). Furthermore, the CP gene has been frequently used in strain identification, species classification, and phylogenetic analysis of potyviruses (Cuevas, Delaunay, Rupar, Jacquot, & Elena, 2012).
PVY is one of the most destructive pathogens affecting potato (Solanum tuberosum L.), the third largest food crop in the world (Birch et al., 2012) and widely grown in many Asian countries including China and Japan. It can cause 40%-70% yield reduction in potato production (Nolte, Whitworth, Thornton, & McIntosh, 2004) and significantly reduce the quality of seed tubers. As the largest producer in the world, China accounts for 26.3% and 22.2% of the global potato acreages and yields, respectively , and potato production in the country is expected to substantially increase in coming decades due to government support and dietary shifts (Kearney, 2010). PVY is one of the main factors constraining further development of Chinese potato industry .
Knowledge of the population genetics and evolutionary biology of PVY, particularly at a transnational scale, is relatively limited compared to other important plant pathogens such as Magnaporthe (Tredway, Stevenson, & Burpee, 2005), Pyrenophthora (Gurung, Short, & Adhikari, 2013), Verticillium (Short et al., 2015), and Phytophthora (Tian et al., 2016). Many factors such as technology and resource availability may partially contribute to this shortage.
In addition, importation of living pathogens even for scientific reasons is strictly forbidden in many countries including China due to quarantine regulations. As a consequence, analysis of biotrophic pathogens with no or few morphological characters used to be very challenging for many researchers. With the advance of PCRbased sequencing technology and sharing of sequence data in many public domains, empirical analysis of the population genetic structure and evolutionary biology of biotrophic plant pathogens such as PVY at an international scale has become possible. This approach has been used by several laboratories to infer the evolution of PVY (Cuevas et al., 2012;Ogawa, Tomitaka, Nakagawa, & Ohshima, 2008), usually involving bona fide geographical populations. Ogawa and colleagues (Ogawa, Nakagawa, Hataya, & Ohshima, 2012;Ogawa et al., 2008)  The objectives of this study were to (i) compare the population genetic structure of PVY in China and Japan and (ii) determine the main evolutionary and demographic mechanisms responsible for the observed population genetic structure. to high mutation rate in RNA viruses, at least four cDNA clones derived from two separate PCR reactions were sequenced for each PVY isolate to ensure consensus. Only the sequence identical in at least three clones was used for further analyses to eliminate potential heterogeneity introduced by Taq polymerase. All sequences generated in this study were deposited in GenBank databases. In addition, 24 sequences of P1 gene and 26 sequences of CP gene from other parts of China (including Guizhou, Shandong, Heilongjiang, Liaoning, and Shaanxi provinces) and 34 sequences each of CP and P1 genes from Japan were retrieved from GenBank (Table S1, Figure 1). Consequently, the combined sequences from China were generated from PVY samples collected between 2005 and 2012 and the Japanese sequences were generated from samples collected between 1995 and 2012. As host-driven adaptation could affect the diversification of viral isolates, only PVY sequences derived from potato were retrieved and included in the analysis of population genetic spatial structure.

| Genetic diversity and population genetic differentiation
Nucleotide sequences of the P1 and CP genes were aligned using the Muscle algorithm (Edgar, 2004) implemented in MEGA5 (Tamura et al., 2011). A nucleotide identity matrix was generated using BioEdit software (Hall, 1999) after all gaps were removed. Haplotype diversity (H d ) and nucleotide diversity (π) were estimated using DnaSP 5.0 (Librado & Rozas, 2009). Pairwise F ST , a measure of genetic differentiation among populations, was computed in Arlequin 3.5 (Excoffier & Lischer, 2010). Genetic differentiation among populations was also evaluated by K ST and S nn using DnaSP 5.10 (Hudson, 2000;Hudson, Boos, & Kaplan, 1992;Librado & Rozas, 2009). The hypothesis of deviation from null population differentiation was tested by 1000 permutations of the original data.

| Recombination and phylogeography analyses
Putative recombination joints (RJ) and parental sequences were identified using seven methods (RDP, GENECONV, BOOTSCAN, MaxChi, CHIMAERA, SiSCAN, and 3SEQ) implemented in the RDP4 suites (Martin et al., 2010). The probability of a putative recombination event was corrected by a Bonferroni procedure with a cutoff of p < .01. To avoid false identification, only events supported by at least four of the seven methods were considered to be recombinants.
Recombinants were removed from the subsequent reconstruction of phylogenetic trees.
Phylogenetic trees were reconstructed by the Bayesian inference (BI) implemented in MrBayes 3.2.5 (Ronquist et al., 2012) and maximum likelihood (ML) implemented in MEGA5 (Tamura et al., 2011) using the GTR + G + I nucleotide substitution model determined by MrModeltest (Nylander, 2008). BI was run in 2,000,000 generations of Markov chains that were sampled every 100 generations to establish convergence of all parameters. The effective sample size (ESS) of parameters was checked by Tracer 1.6 to ensure values above 200 as advised by the programmers with the first 25% of sampled trees burn-in. Topology robustness of ML trees was assessed by 1,000 bootstraps. ML-BPs and BI-PPs were plotted on Bayesian 50% majority-rule consensus trees using FigTree 1.4.2 and Illustrator CS5 (Adobe).
The effect of geographic origin on PVY populations was evaluated by phylogeny-trait association analysis, and BaTS 2.0 (Parker, Rambaut, & Pybus, 2008) was used to compute an association index (AI), parsimony score (PS), and maximum monophyletic clade (MC).
Low AI and PS scores and high MC scores suggest a strong PVY-geography association. The topology of reconstructed trees was tested using BaTS by comparing it with the randomized trees generated from 10,000 reshufflings of tip characters. Topology robustness was determined in BaTS by comparing it with the null distribution of trees obtained from 10,000 bootstraps of tip characters.

| Natural selection
HyPhy 2.10b (Kosakovsky Pond, Frost, & Muse, 2005) and PAML 4.7 (Yang, 2007) were used to identify nucleotide sites in P1 and CP cistrons that were likely to be involved in PVY adaptation. Three codonbased approaches, that is, IFEL (internal branches fixed-effects likelihood), REL (random-effects likelihood), and MEME (mixed effects model of evolution) (Kosakovsky Pond et al., 2006Pond et al., , 2011Murrell et al., 2012), were included in the HyPhy package. Only sites simultaneously identified by IFEL and MEME with p < .05 and >.95 posterior probability identified by REL were considered to be under selection. In addition, the ratio of nonsynonymous (dN) to synonymous (dS) substitution (ω = dN/dS) was calculated for each gene using CODEML algorithm implemented in PAML 4.7 (Yang, 2007). Three different nested models (M3 vs. M0, M2a vs. M1a, and M8 vs. M7) were compared and likelihood-ratio tests (LRTs) were applied to select the one that best fitted the data. When the LRT was significant (p-value < .01), the Bayes empirical Bayes (BEB) method (Yang, Wong, & Nielsen, 2005) was used to identify amino acid residues that are likely to have evolved under positive selection based on a posterior probability threshold of .95.

| Population historic dynamics
Two different approaches were used to explore the demographic history of PVY populations in the two countries. First, Tajima's D and Fu's F S statistics implemented in Arlequin 3.5 (Excoffier & Lischer, 2010) were used to determine the neutrality of the P1 and CP genes by 1,000 permutations of the original sequences. Tajima's D test identifies evolutionary events such as population expansion, bottlenecks, and selection by comparing the estimated number of segregating sites with the mean pairwise difference among sequences (Tajima, 1989). Fu's F S is sensitive to population demographic expansion and usually displays a negative value (Fu, 1997) in expanding populations. Following these calculations, Bayesian skyline plots (BSP) was generated to explore demographical history using BEAST 1.8.2 (Drummond, Suchard, Xie, & Rambaut, 2012). Sampling times of the sequences were used to calibrate the molecular clock. Date-randomization tests (DRTs) were performed in R 3.3.1 using the Tip Dating Beast package (Rieux & Khatchikian, 2016) to determine the temporal signal in data sets. A data set was considered to have an adequate spread in sampling time if its average rate did not fall within the 95% confidence intervals (CIs) generated from 20 replicates of randomized dates (Ramsden, Holmes, & Charleston, 2009;Duchêne, Duchêne, Holmes, & Ho, 2015). A Bayes factors test indicated that the relaxed uncorrelated exponential model was a better fit to the sequence data than the relaxed uncorrelated lognormal model and was chosen to estimate the molecular clock of P1 and CP genes.
The MCMC was run for 5 × 10 8 generations to ensure convergence of all parameters. Convergence and ESS (>200) of the parameters were checked using Tracer 1.6.

| Sequence variation in P1 and CP genes
Seventy-three P1 and 78 CP genes were sequenced in this study and were deposited in GenBank (Table S1) (Table S1) (Table 1). The most common haplotype was detected four times in P1 sequences and 10 times in CP sequences. No identical haplotypes were detected in the two countries in both genes. When the sequences were considered according to individual geographic location, higher nucleotide but lower haplotype diversity was found in Japan than in China. In the CP genes, 121 haplotypes were identified in the 137 complete sequences, with an overall haplotype diversity of .984 and nucleotide diversity of .053 when sequences from the two countries were pooled. Similar to the P1 gene, higher nucleotide diversity but lower haplotype diversity was found in the CP gene from Japan than from China when they were considered separately.

| Genetic differentiation between sequences from China and Japan
K ST , S nn , and F ST were .141, .972, and .294 in the P1 gene and .289, .949, and .520 in the CP gene (Table 2), respectively. All of these indexes were significantly higher than the theoretical expectation of no population differentiation. K ST and F ST values were higher in the CP gene than in the P1 gene.

| Recombination analyses
Forty-two P1 sequences, all originating from China, were identified with a high level of confidence as recombinants by all seven methods implemented in the RDP package (p-values ≤1.27 × 10 −2 , Table   S2). All of these sequences were derived from three recombination events. Similarly, three recombination breakpoints were detected in eight CP sequences from China and four CP sequences from Japan by at least six of the seven approaches (p-values ≤1.69 × 10 −2 , Table S2). In the N clade of P1 tree, all 41 Chinese isolates were clustered into subclade 1 and all but three (NTNHIR3, NTNKGAM2, and NTNTK1) of the 27 Japanese sequences were grouped into subclade 2. Similarly, all 10 Chinese isolates were clustered into subclade 3 and all seven Japanese isolates were grouped into subclade 4. In the N clade of the CP tree, all nine Chinese sequences were clustered into subclade 1, whereas all 23 Japanese sequences were grouped into subclade 2. In contrast, O clade was composed of a mixture of Chinese and Japanese sequences. A similar pattern was found when only the P1 and CP sequences from GenBank (i.e., excluding the sequences generated in this study) were used to construct a phylogenetic tree (data not shown). Distinct differences in the population genetic structure of PVY between China and Japan were also found by phylogeny-geography association analysis. Significant MC (p < .01), AI (p < .001), and

| Detecting natural selection
Purifying selection was detected in the majority of polymorphic sites by PAML packages (Figure 3)

| Demographic history
Significantly negative Fu's F S was detected in both P1 and CP sequences from China (Table 5); Tajima's D was positive for the P1 sequences and negative for the CP sequences from China, but none of them were significant. All Tajima's D and Fu's F S in the sequences from Japan were positive but not significant (Table 5).
All data included in the demographic analysis passed the DRTs that showed no overlaps between the original estimate of evolutionary rate and 95% CIs generated from 20 replicates of date randomization ( Figure S2). Coalescence-based BSP revealed an explicit demographic history for the PVY populations of China and Japan

| DISCUSSION
Consistent with previous results (Ogawa et al., 2008(Ogawa et al., , 2012Tian et al., 2011), high genetic diversities (Table 1) were found in both P1 and CP genes in the current study. Each of these two genes encodes a protein with ecological functions that are important to the survival and adaptation of PVY in local biotic and abiotic environments (Quenouille, Vassilakos, & Moury, 2013). In addition to the high mutation rate in RNA viruses (Domingo, Sheldon, & Perales, 2012), high genetic variation found in both current and previous studies is possibly attributed to the high recombination rate in PVY populations (Gibbs & Ohshima, 2010). In recent decades, recombinant PVYs were found to be prevalent in the global population of the pathogen sampled from potato production areas (Quenouille et al., 2013). In this study, 52.69% and 11.76% of concatenated sequences in the Chinese and Japanese populations show a mixed genomic structure of PVY N and PVY O (Table   S2). These results re-enforce widespread concern related to the propensity for rapid adaptation of PVY to changing biotic and abiotic environments. Such adaptation may cause problems for the deployment of major gene-mediated host resistance in potatoes, tomato, and other crops (Karasev & Gray, 2013).
Our study also reveals a significant difference in the molecular population genetic structure between the PVY sequences originating from potato hosts in China and Japan-a pattern that is supported by comparative analyses of genetic variation (Table 1), population differentiation (Table 2), and phylogeny-geography association (Table 3, Figure 2) in the two genes (each represented by ~130 sequences). It has been documented that measurements of the diversity of genetic variation are highly sensitive to sample sizes (Bashalkhanov, Pandey, & Rajora, 2009) and populations with larger sample sizes tend to have higher observed genetic diversity. Despite smaller sample sizes included in the Japanese than Chinese populations, both P1 and CP sequences originating from Japan show higher nucleotide diversity than those from China, suggesting that differences in nucleotide diversity between the two viral populations are unlikely to be the result of sampling error.
However, the higher haplotype diversity in Chinese sequences may be associated with the somewhat larger sample size as compared to Japanese sequences (93 vs. 34 isolates). Indeed, when we used a bootstrapping approach described previously (Zhan, Pettway, & McDonald, 2003) to standardize sample sizes with the smaller population (Japan in this case) and recalculated haplotype diversity, the difference in genetic variation between the two populations disappeared (Table S3). The bootstrap was conducted using the Resampling Stats add-in package for Excel (Blank, Seiter, & Bruce, 2001) with 100 replicates. For each bootstrap replication, haplotype diversity was recorded from a random sample of 34 sequences (the actual size of Japanese population). The mean and variance of the diversity were calculated and used for a t-test.
F I G U R E 2 Bayesian phylogenetic trees of Potato virus Y (PVY) isolates based on P1 (a) and CP (b) coding regions. For three key nodes in a tree, the Bayesian posterior probabilities and maximum likelihood bootstrap percentage are indicated above the branches (Bayesian posterior/bootstrap). PVY isolates from this study are indicated by black dots. The distance unit is substitutions/site Three indices were used to test population differentiation between Chinese and Japanese PVY sequences. K ST and F ST measure the relative proportion of total genetic diversity attributable to among-population differences. They range from .00 to 1.00. A value of one for K ST or F ST indicates that populations investigated are completely isolated, while a value of zero indicates the populations studied are identical (Hudson et al., 1992). K ST or F ST values between .15 and .25 indicate high population differentiation, and values greater than .25 indicate very high genetic differentiation among populations (Balloux & Lugon-Moulin, 2002). S nn measures how often the nearest sequences are found in the same populations. Its value would be close to one when populations are highly structured and near .5 when populations are identical (Hudson, 2000). Therefore, K ST and F ST are approximately equal to S nn -.5. All three indices are higher than .15 in the current study (Table 2), suggesting a high differentiation of PVY sequences between the two populations. K ST is the most powerful index for detecting spatial population structure for recombination and high mutation populations (Hudson et al., 1992), which is the case in PVY (Blanchard, Rolland, Lacroix, Kerlan, & Jacquot, 2008).
Phylogenetic analyses provide additional support for differentiation between the Chinese and Japanese populations of PVY (Figure 2).
Sequences tend to cluster according to their geographic origin, consistent with previous results derived from more and wider geographic locations (Cuevas et al., 2012). Although some clades such as the O clade of the CP tree are composed of sequences from both locations, the hypothesis of random association between PVY sequences T A B L E 3 Tests of geography-phylogeny association for P1 and CP genes in PVY isolates originated from China and Japan .02* AI, association index; PS, parsimony score; MC, maximum monophyletic clade; HPD, highest probability density interval; n/a: no data available because of insufficient sample size (n < 2). Significance thresholds: *.01 < p < .05; **.001 < p < .01; ***p < .001.
from China and Japan and their geographic origins is rejected by all three statistics (AS, MC and PS) ( Table 3). Phylogenetic trees are reconstructed under the assumption of constant evolutionary rates and many evolutionary processes may affect the correct reconstruction of phylogenetic topology (Revell, Harmon, & Collar, 2008). The polyphyletic nature of these different clades could be generated by some sequences experiencing a complex evolutionary history such as recombination. For example, in the P1 tree, three Japanese sequences (NTNHIR3, NTNTK1, and NTNKGAM2) were clustered together with Chinese sequences in a subclade (Figure 2). It is likely they are recombinants that were not detected by the techniques used here. Our analyses suggest that the CP sequences of these three isolates are also most likely derived from recombination (Table S2).
Both stochastic and deterministic events could lead to the observed pattern of spatial structure (Caruso et al., 2011)  Ruby, and Shadow Queen (Kawakami, Oohori, & Tajima, 2015). None of these cultivars are used in China. In China, potatoes are grown over a much wider geographic area, ranging from a subtropical climatic zone in the south (e.g., Fujian) to a temperate continental climate zone in Northern China (e.g., Heilongjiang) using more diverse cultivars (Jansky, Jin, Xie, Xie, & Spooner, 2009), while in Japan, potatoes are mainly grown in the north (Kawakami et al., 2015).
Interestingly, the 1st codon translated to the cleavage site in the CP protein was detected to be under positive selection by both PAML and HyPhy with high confidence levels (PP > .99 or p < .05,  Fernandez et al., 2013). This process is catalyzed by proteases that are under constant change by mutation (Yu, Benton, Bovee, Sessions, & Lloyd, 1995). Positive selection in the CP cleavage site could serve as a reversal mechanism compensatory to mutations in the protease in PVY and other viruses (Banke et al., 2009).
Differences in spatial structure between the Chinese and Japanese sequences may also result from differences in the demographic dynamics of the PVY populations. Sudden increases or decreases in population size associated with demographic events can affect the generation, maintenance, and distribution of genetic variation, not only directly through genetic drift and mutation, but also indirectly through impacts on the efficiency of natural selection to remove or amplify mutations as well as on migration and recombination (Wang & Whitlock, 2003). Indeed, when we performed demographic analyses, we found that PVY populations in China were small but had undergone recent expansion, possibly associated with increasing potato production in the current years , while in Japan, they were large but stably maintained (Figure 4).
In summary, our study represents one of a few attempts to understand patterns and causes of spatial population genetic structure across political borders in PVY, a destructive pathogen of potato and many other Solanaceous crops. The finding of two subpopulations indicates distinct gene pools exist in PVY from China and Japan. This suggests that strict quarantine regulation is needed to prevent the movement of novel alleles or allelic combination of PVY between the two countries when trading plant materials that are hosts for this pathogen (e.g., potato and tobacco). However, sequences included in this analysis were relatively limited both in sample sizes and sites. In particular, the temporal scales might be relatively short for analysis of demographic events. Further study with larger, multiple-location, and longer temporal scale samples may be required to confirm the results and generalize the findings.  (top) and Japan (bottom). The y-axes represent a measure of relative genetic diversity, and x-axis is measured in calendar years