Genetic Variation Bias toward Noncoding Regions and Secreted Proteins in the Rice Blast Fungus Magnaporthe oryzae

Plants “lose” resistance toward pathogens shortly after their widespread emergence in the field because plant pathogens mutate and adapt rapidly under resistance selection. Thus, the rapid evolution of pathogens is a serious threat to plant health. Extensive studies have evaluated natural pathogen populations to understand their evolutionary effects; however, the study of the dynamic processes of the mutation and adaptation of plant pathogens to host plants remains limited. Here, by performing an experimental evolution study, we found a bias in genetic variation toward noncoding regions and SPs in the rice blast fungus Magnaporthe oryzae, which explains the ability of the rice blast fungus to maintain high virulence variation to overcome rice resistance in the field.

while some of them (called avirulence [Avr] genes) can be recognized by host R proteins and cause an avirulence phenotype (3). Most of the effectors identified so far are small secreted proteins (SPs) (4). Surveys of the sequenced genomes of filamentous fungi have indicated that fungal effector genes are not evenly distributed throughout the genome but are enriched in gene-sparse and repeat-rich two-speed genome regions where the genes undergo variation more rapidly than the genes in gene-rich compartments (2,5,6).
In the evolution of pathogens, mutations accumulate throughout the genome, while beneficial mutations gradually become fixed and dominant in the population. The fixation of beneficial mutations is highly dramatic and rapid in pathogens under constant selection, for instance, in a highly uniform agroecosystem (7,8). However, the mechanisms of the evolution of pathogens in the coevolution process and the fixation of beneficial mutations in a population under the constant selection of host plants are still largely unknown. Unraveling the dynamic variations in a plant pathogen's population while interacting with its host can enhance our understanding of pathogen evolution in nature, especially in agroecosystems, and thus provide efficient and environmentally friendly disease management strategies (9).
Many efforts have been made to uncover the dynamic variation in plant pathogen populations; among these efforts, experimental evolution has been well established in microbial population research and has become a powerful tool that is complementary to modern genetic and pathogen epidemiology research (10)(11)(12)(13)(14). Compared with population genetics, which ultimately provides information regarding evolution in the context of many undetectable environmental factors, experimental evolution methods can simulate the evolutionary process of organisms under controlled conditions and retain retraceable samples of different stages. The combination of well-developed and inexpensive sequencing technology enables the experimental evolution method to reveal molecular mechanisms under gradual adaptation and provide a real-time perspective of evolution dynamics.
Magnaporthe oryzae (syn. Pyricularia oryzae), the causative agent of rice blast disease, leads to a 10% to 30% reduction in rice production annually (15). Investigations have shown that Magnaporthe species are also capable of causing blast disease in more than 50 plant species of monocot origin in addition to rice, including food crops such as wheat, millet, and barley. In addition, Magnaporthe species also infect wild grass hosts such as Digitaria sanguinalis, Setaria viridis, and Eleusine indica (16). A previous study identified a novel avirulent gene, AvrPi9, in M. oryzae via the sequential planting method, which operates under a theory similar to that of experimental evolution (17,18). In a previous study, the experimental evolution of M. oryzae on artificial medium resulted in a rapid accumulation of mutations prior to the observation of a phenotype; however, the results of that study also revealed a significant reduction in the virulence of the progeny population, which may be due to the lack of an infection stage (19). The genome sequence of M. oryzae varied greatly when it adapted to hosts from different species and subspecies of rice (20,21). Although sequential inoculation in planta was performed using a mixture of M. oryzae isolates (17,18), no real-time evolution studies have been conducted in planta so far to investigate the role of either host selection or clonality in the rapid evolution of rice blast fungus.
Next-generation sequencing of pooled samples (Pool-Seq) has been used as a costand time-effective approach for studying population variability and differentiation (22). In this study, we performed an experimental evolution study by embracing the advantages of the experimental evolution method and high-throughput Pool-Seq technology to monitor the coevolution process by serially inoculating pathogens into host plants to evaluate the gradual variation in M. oryzae in rice. We found the rapid accumulation of low-frequency single-nucleotide variants (SNVs), insertions and deletions (indels), and TEs during infection, and interestingly, these mutations were enriched in intergenic regions and the proximal region of SP coding genes, whereas they were depleted in coding regions.

RESULTS
Experimental evolution via sequential infection with M. oryzae. To evaluate the host-induced adaptive evolution among succeeding generations of M. oryzae populations, the experimental evolution assay was conducted as shown in Fig. 1A. The wild-type strain Guy11 was grown on rice bran medium to generate spores that were adopted as the initial generation (G 0 [G stands for generation herein]). Seedlings of a susceptible rice cultivar (TP309) were spray inoculated with spores obtained from G 0 to commence the first infection cycle (infection cycle refers to 7 days postinoculation). In The wild-type strain Guy11 was grown on rice bran medium to generate spores that were adopted as the initial generation (G 0 ). Seedlings of the susceptible rice cultivar (TP309) were inoculated with spores obtained from G 0 to commence the first infection cycle. In total, 50 compatible leaf lesions from the first infection cycle were randomly collected, sterilized, and then incubated for 7 days on rice bran medium at 26°C under constant light to yield enough G 1 spores. The G 1 spores were used as the inoculum for the next infection cycle. A total of 11 infection cycles were carried out in this study to generate 11 generations of M. oryzae populations (G 1 to G 11 ). (B) Total number of SNVs/indels identified in G 1 , G 5 , and G 11 Pool-Seq data. (C) Venn diagram showing overlapping of SNVs/indels identified in G 1 , G 5 , and G 11 . (D) Percentage of different types (deletion, insertion, or SNV) of SNVs/indels identified in G 1 , G 5 , and G 11 . (E) Box plot showing frequency of all SNVs/indels identified in G 1 , G 5 , and G 11 . (F) Density plot showing density distribution of SNVs/indels identified in G 1 , G 5 , and G 11 . The red line represents the G 1 population, the blue line represents the G 5 population, and the brown line represents the G 11 population. total, 50 compatible leaf lesions from multiple plants of the first infection cycle were randomly collected, sterilized, and then incubated for 7 days on rice bran medium at 26°C under constant light to yield enough G 1 spores. The G 1 spores were used as the inoculum for the next infection cycle. A total of 11 infection cycles were carried out in this study to generate 11 generations of M. oryzae populations (G 1 to G 11 ).
High-throughput pool sequencing of experimental evolution populations. To investigate the variation in M. oryzae that occurred under constant host selection, whole-genome sequencing was performed using Illumina paired-end sequencing. During the evolution process, mutations occurred in different sites in each individual; however, only beneficial mutations facilitating host adaptation gradually accumulated in the population. We thus selected G 1 , G 5 , and G 11 for in-depth whole-genome sequencing. To minimize the difference between the reference genome and the original isolate (G 0 ) used in this study, the original isolate was also sequenced and used as a control in this study. To investigate mutations at the population level, we used pool sequencing (Pool-Seq), which is a cost-effective method that pools sequences of individual DNAs. For G 1 , G 5 , and G 11 , we randomly selected 50 individual isolates for each generation and combined their DNA in equal amounts for Pool-Seq.
The obtained reads of G 0 , G 1 , G 5 , and G 11 were mapped to the genome sequence of the M. oryzae 70-15 reference genome for single-nucleotide variant (SNV) and insertion and deletion (indel) analyses (see Table S1 in the supplemental material). We performed SNV/indel calling separately for all four samples, and the obtained SNVs/ indels were filtered with the SNVs/indels of G 0 . Overall, we obtained 579, 786, and 953 SNVs/indels for G 1 , G 5 , and G 11 , respectively (Fig. 1B). Further analysis of overlapping SNVs/indels in G 1 , G 5 , and G 11 showed that 206 SNVs/indels were shared by the three generations and 242, 364, 496 SNVs/indels were uniquely present in G 1 , G 5 , and G 11 , respectively (Fig. 1C). To evaluate variation sites at the population level, we analyzed the SNV/indel frequency of each site in G 1 , G 5 and G 11 (Fig. 1F). The results showed that G 5 and G 11 had similar average SNV/indel frequencies ( Fig. 1D and E), while G 1 had more low-frequency sites than G 5 and G 11 .
Genomic distribution of SNVs/indels. To evaluate the impact of constant selection on the pathogen's genome, we calculated the SNV/indel number in every 10-kb region and observed that the three samples exhibited significant differences in SNV/indel distribution pattern ( Fig. 2A). We found the most variation in chromosome 1 (Chr 1), Chr 3, Chr 6, and Chr 7 in the G 1 , G 5 , and G 11 genomes, and the number of SNVs/indels increased consistently in some regions. Consistent with our previous findings that G 1 was more heterozygous and had more low-frequency sites than G 5 and G 11 , the distribution of SNVs/indels in G 1 was more diversified than that in G 5 and G 11 .
We then questioned whether these SNVs/indels had any bias in genome distribution. To answer this question, we annotated the genomic distribution of SNVs/indels by dividing the genome into five categories: intergenic, promoter (Ϫ1,000 to 0 bp), exon, intron, and 5= and 3= untranslated regions (UTRs). We calculated the expected numbers of SNVs/indels in these five categories by randomly selecting an equal number of control sites in the genome and comparing it with the SNVs/indels of G 1 , G 5 , and G 11 (observed number). Interestingly, we found that the ratios of observed number/ expected number of SNVs/indels were much higher in intergenic (2.58, 2.24, and 2.32) and intron regions (3.00, 3.14, and 4.09) than in exon regions (1.36, 1.01, and 1.03) in G 1 , G 5 and G 11 , respectively, suggesting that SNVs/indels were preferentially enriched in intergenic and intron regions than in exon regions (Fig. 2B). In addition, the ratios of observed number/expected number of SNVs/indels in the promoter and exon were close to 1, suggesting no bias in distribution in these regions.
To determine the effects of SNVs/indels on genes, transcripts, protein sequences, and regulatory regions, we next investigated the effect of these SNVs/indels on gene products. Consistent with the SNV/indel distribution results, most of the SNVs/indels that caused upstream or downstream gene variations had no significant effect on gene products. Some variation types identified in this study lead to amino acid changes, including frameshift variants, in-frame deletions, in-frame insertions, missense variants, start codon losses, stop codon losses, and stop codon gains (Table 1). We found that SNVs/indels with these variations occurred in 193 genes, among which 20 were shared by 3 samples and 38, 50, and 62 were presented uniquely in G 1 , G 5 , and G 11 , respectively (Fig. 2C). Functional annotation indicated multiple functions of these 193 genes, involved in pathways ranging from the metabolic pathway and transcription regulation to the mating signal transduction pathway (Table S2). Overall, although SNVs/indels accumulated rapidly in the genome, it appeared that coding regions were not a hot spot of variation.
Dynamics of TEs in experimental evolution populations. Transposable elements (TEs) play important roles in genome shaping and genome stability in M. oryzae (23,24). Plant pathogens can avoid plant host immunity recognition through TE insertionmediated silencing of some avirulent genes (25,26). We thus set out to analyze TE dynamics in experimental evolution populations. We first annotated newly formed TE insertion events in G 1 , G 5 , and G 11 by annotating all TE insertion events in the genome and filtering them with G 0 . We identified 790, 879, and 798 newly formed TE insertion events in G 1 , G 5 , and G 11 , respectively (Fig. 3A). A Venn diagram showing the overlapping TEs in the three samples indicated that more than 50% of TEs were shared among these samples and that 186 TEs were uniquely present in G 11 (Fig. 3B). The insertion of TEs at the promoter region results in gene silencing; thus, we investigated the relationship between the TE insertion site and the coding gene. We found that, as observed in the SNV/indel analysis, new TEs were preferentially enriched in intergenic regions and depleted in exon regions (Fig. 3C). To determine the activity of these TEs, we calculated the copy number of newly formed TEs in G 1 (n ϭ 790), G 5 (n ϭ 879), and G 11 (n ϭ 798). In general, long-terminal-repeat (LTR) retrotransposons (Maggy, RETRO5, RETRO6, RETRO7, Pyret, and MGLR3) are more active than DNA transposons (POT2, POT3, and Occan). We found that among the LTR retrotransposons, Pyret was the most active TE, followed by long interspersed elements and MGL (Fig. 3D). Compared with POT3 and Occan, POT2 was the most active DNA transposon. In summary, extensive variation in TE copy number was observed in the process of successive host interaction, and TEs were preferentially enriched in intergenic regions and depleted in exon regions. TE insertion in genes encoding SPs. Secreted proteins (SPs) play dual functions in plant-pathogen interactions (27,28). They can facilitate pathogen infection by overcoming the plant defense response, such as suppressing the plant immune system or hijacking host metabolism (29). Some SP genes (Avr genes) can be recognized by plant R gene products to induce plant immunity. Previous studies indicated that this group of genes is a hot spot for SNV variation and TE insertion because these variations can help pathogens avoid plant recognition (17,25,26). However, so far, there has been no direct evidence to support the preferential insertion of TEs in SP genes. Therefore, we investigated the percentage of SP genes among genes that have newly formed TE insertions in the promoter, exon, intron, or transcription termination site (TTS) regions. The abundance of SP genes in the whole genome was 11.84% (1,539/12,991), which was similar to that of SNV/indel-associated genes (10.69%), the control set of SNV/ indel-associated genes (11.15%), and the control set of TE-associated genes (10.33%); however, the abundance of SP genes in the whole genome was significantly (P Ͻ 0.05) lower than that of TE-associated SP genes (16.32%), especially at promoter and intron regions ( Fig. 4A; Fig. S1A and B). This result provides direct evidence to support the hypothesis that TEs tend to insert into SP genes.

DISCUSSION
Genome plasticity allows the rapid evolution of plant pathogens and enables plant pathogens to overcome host resistance quickly, which makes the control of fungal diseases challenging. The filamentous fungus M. oryzae is the causal pathogen of rice blast disease, which leads to substantial losses in rice production annually (15,30). Our previous studies suggested that the host plant is the major force that shapes the M. oryzae genome (20,21,31). However, no real-time assays have been performed so far to investigate the occurrence and accumulation of mutations in M. oryzae populations in interaction with rice plants. To this end, we performed the in planta experimental evolution of M. oryzae by monitoring the genomic variation in serially subinoculated populations. During adaptation to an alternative host, pathogens are thought to maintain a balance between generalism and specialization and finally to reach a suboptimal fitness (32,33). In this study, by monitoring the genomic variation in a population of progeny derived from sequentially inoculating M. oryzae, we found that G 1 presented a vast number of variations and that a large number of variations were shared by all samples. These results suggested that genomic variations occurred rapidly within a very short period of time and accumulated very rapidly in the genome and that G 1 has a founder effect on the sequential inoculation population. Because TP309 is not the original host of the Guy11 strain, it may possess immunity to Guy11, in contrast to its original host plant. We therefore proposed that the fungal genome may have been subjected to strong selection in response to plant immunity at the beginning of the experiment and reached a bottleneck when the strain showed optimal fitness under continuous selection in this plant variety. Similar results were shown by Jeon et al. (19). However, the host plant may impose a much more specific and efficient selection on the fungal genome than does oatmeal medium. In the future, using strains to inoculate their original host plants might further illustrate whether such bottleneck effects are common during adaptive evolution. It is also worth noting that although the medium cultivation process has been minimized and mimics the natural rice environments by using rice bran medium to exclude most of the selection stress on the isolates in this study, the medium cultivation process may still result in some stress on the population that is different from host selection, resulting in some abiotic stress variations in the M. oryzae genome. Development of advanced sequencing technologies, such as single-cell sequencing, may provide resolution to waive the artificial effects caused by medium culture. Taking these results together, we propose that M. oryzae maintained a balance between population diversity and bottleneck effects while adapting to its host plant. Notably, genetic variation does not neutrally accumulate in the genome during adaptive evolution. We observed that increased mutations accumulated in Chr 1, Chr 3, Chr 6, and Chr 7, and this result is consistent with that of a previous experimental evolution study (19). It is possible that there are mutation hot spots in these chromosomes, or alternatively, these chromosomes might buffer more mutations and undergo stronger selection. Interestingly, more SNV/indel mutations were found in noncoding regions than in coding regions, as shown previously by Jeon et al. (19). The different mutation rates in the noncoding (intergenic) and coding (exon) regions can at least partially explain the accumulation of variations in M. oryzae in a short period of time despite M. oryzae rarely showing phenotypic variations. We proposed that this bias indicates a purifying selection on coding regions, especially some important housekeeping genes, to maintain genome stability. To search for genes that are involved in the adaptive process, we annotated the genomic consequences of these variations and found that these genes were involved in multiple pathways.
Previous studies have indicated that transposable elements (TEs) play important roles in host-pathogen interactions. TEs are major components of facultative heterochromatin regions that provide not only epigenetic regulation of the transcription of effector/secreted protein (SP) genes but also a cradle for rapid adaptive evolution (6,34) and can induce total silencing of Avr genes and thus help pathogens avoid host recognition (25,26,35). Our results showed that G 1 also has a founder effect on newly formed TE insertion, although the number of new TE insertions gradually increased in G 5 and G 11 . These results suggested that the transposability of TEs in the G 1 fungal genome is the highest. In line with the bias mutation rate of SNVs/indels in the intergenic and exon regions, TE insertions were also enriched in the intergenic region and depleted in the exon region. In addition, we found that Pyret was the most active TE. This result is supported by a previous study, which also found that Pyret was the most active TE under different stresses (36). The pathogen effector/SP genes play pivotal roles during the interaction with their host, especially for biotrophic or hemibiotrophic pathogens (27,28). These genes are enriched preferentially in repeat-rich regions and are thought to be evolved more rapidly than genes in other genomic compartments (17,25,26). Moreover, most of these genes are functionally redundant (37) and could tolerate more mutations, which provides genetic diversity for adaptive evolution on diverse hosts. In M. oryzae, an increasing amount of evidence has demonstrated that SP genes, especially the Avr genes, tend to evolve more rapidly than other genes. For instance, comparative genomic analysis revealed that the SP genes presented a high level of diversity among different strains (38)(39)(40). In addition, some of the SP genes (Avr genes) have been suggested to be hot spots for SNV variation and TE insertion as a tool for avoiding plant recognition (17,25,26,41). Consistent with this hypothesis, we found that in addition to the intergenic region, TEs also showed an insertion bias toward genes encoding SPs. These results suggested that SP genes are very important for adaptive evolution toward diverse rice hosts (different subspecies/cultivars) and that TEs are one of the major forces fueling the rapid evolution of M. oryzae. The high bias toward TE insertion into SP genes indicated that TE insertion at the Avr gene may not be neutral.
In conclusion, we applied an experimental evolution assay to M. oryzae and identified the preferential accumulation of SNVs/indels and TEs in noncoding regions and SP genes (Fig. 4B). Further studies on the genetic function of the mutated genes identified in this study will provide more insights into the adaptive evolution of M. oryzae. We believe that the results obtained through experimental evolution may also enhance our understanding of pathogen evolution in nature and explain the mechanism of the vulnerability of rice resistance.

MATERIALS AND METHODS
Evolution experiment assay. Magnathorpe oryzae strain Guy11, isolated from French Guyana (42), was used as the initial strain in this study. Single spores isolated from diseased rice (TP309, a cultivar susceptible to Guy11) leaves 7 days postinoculation (dpi) of each generation were grown on rice bran medium (2% rice polish and 1.5% agar at pH 6.5) at 26°C under constant light for conidiation. Small pieces of sterilized filter paper were also put on the medium and collected when it was colonized by the fungus. The fungal cultures on the dried filter paper were then stored in a -80°C refrigerator. Genomic DNA samples were extracted using the CTAB extraction method from mycelia cultured in liquid CM medium (0.6% yeast extract, 0.6% casein hydrolysate, 1% sucrose, and 1.5% agar) with 130 rpm shaking at 26°C for 3 to 4 days. Detailed steps for the experimental evolution assay are described in Results and in the legend for Fig. 1A.
Genome sequencing. DNA samples from individual isolates were combined in equal amounts for sequencing. DNA samples were sheared to ϳ350 bp in average size. Sequencing libraries were prepared using the Illumina paired-end DNA sample prep kit and sequenced on an Illumina HiSeq 2500 platform with 2 ϫ 150-bp paired-end reads.
Read alignment and SNV/indel calling and annotation. SNV/indel calling was performed according to a previously described method with some modifications (43,44). Briefly, all sequenced reads were aligned to the M. oryzae 70-15 reference genome with Bowtie2 with the default parameters (45). The resulting bam files were subjected to Picard MarkDuplicates function to remove PCR duplicates (https:// broadinstitute.github.io/picard). PCR-duplicated reads were removed, and the remaining files were then subjected to variant calling. Additionally, mapping reads with a mapping quality (MAPQ) value of Ͼ40 were retained for variant calling. CRISP has been used for variant calling with a minimum of 10 reads with alternate alleles (46). The SNV/indel density was calculated by VCFtools (v0.1.15) in every 10-kb window and visualized with circlize (47,48). The SNV/indel control data set was randomly selected from the whole genome with BEDTools (v2.21.0) (49). The genomic distribution of the SNVs/indels was annotated with ChIPseeker (v1.20.0) (50). The types of consequences of the SNVs/indels were predicted by VEP (v96) and the Magnaporthe 70-15 reference genome version 43 from EnsembleFungi (https://fungi.ensembl.org/ Magnaporthe_oryzae/Info/Index) (51).
TE annotation and SP prediction. RepeatMasker (version 3.3.0; http://www.repeatmasker.org/) was used to search for TEs in 70-15 (52). The presence and absence of TE polymorphisms were detected by PoPoolationTE2 with the 70-15 sequence as the reference genome and at least 3 reads of support (53). TE insertions supported by at least 3 reads in G 0 were maintained for TE filtering in the G 1 , G 5 , and G 11 samples. The genomic distribution of TE insertions was annotated with the annotatePeaks function in Homer2 (v4.8.3) (54). Peaks within 50 bp between two samples were merged into the same insertion by BEDTools (v2.21.0) (49). The TE control data set was randomly selected from the whole genome with BEDTools (v2.21.0) (49). SPs were defined as proteins containing a signal peptide cleavage site, no transmembrane domain after the signal peptide cleavage site, and an amino acid sequence length smaller than 400 amino acids. SignalP 5.0 was used to predict signal peptides, and TMHMM 2.0 was used to predict transmembrane domains (55,56).
Statistical analysis. All statistical analyses were performed using Student's t test function in R (57). Data availability. All genomic sequencing data were deposited in the NCBI Sequence BioProject database under accession number PRJNA577277.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. FIG S1, PDF file, 0.4 MB.

ACKNOWLEDGMENTS
We thank the members of the Z.W. laboratory for useful discussions. This work was supported by grants from the Natural Science Foundations of China (U1805232, 31770156) and the National Key R&D Program of China (2016YFD0300700) to Z.W.
Z.W. and Z.Z. conceived the work, designed the experiments, and wrote the manuscript. H.Z. and J.N. contributed to manuscript writing. M.C. and D.L. provided experiment assistance. L.L. and R.C. provided support for bioinformatics analyses. All authors read and approved the manuscript.