Genomic adaptation to polyphagy and insecticides in a major East Asian noctuid pest

The tobacco cutworm, Spodoptera litura, is among the most widespread and destructive agricultural pests, feeding on over 100 crops throughout tropical and subtropical Asia. By genome sequencing, physical mapping and transcriptome analysis, we found that the gene families encoding receptors for bitter or toxic substances and detoxification enzymes, such as cytochrome P450, carboxylesterase and glutathione-S-transferase, were massively expanded in this polyphagous species, enabling its extraordinary ability to detect and detoxify many plant secondary compounds. Larval exposure to insecticidal toxins induced expression of detoxification genes, and knockdown of representative genes using short interfering RNA (siRNA) reduced larval survival, consistent with their contribution to the insect’s natural pesticide tolerance. A population genetics study indicated that this species expanded throughout southeast Asia by migrating along a South India–South China–Japan axis, adapting to wide-ranging ecological conditions with diverse host plants and insecticides, surviving and adapting with the aid of its expanded detoxification systems. The findings of this study will enable the development of new pest management strategies for the control of major agricultural pests such as S. litura. Genome of the tobacco cutworm, Spodoptera litura, which is one of the most widespread and destructive agricultural pests in tropical and subtropical Asia.

about 104.7 Myr ago (Ma), and diverged approximately 147 Ma from the more basal Plutella xylostella, whereas Lepidoptera as a whole separated from Diptera about 258 Ma, consistent with reported divergence time estimates 10 (Supplementary Fig. 1b). To construct a linkage map, a heterozygous male F1 backcross (BC1) population was established between Japanese and Indian inbred strains. The resulting genetic analysis used 6088 RAD-tags as markers to anchor 639 scaffolds covering 380. 89 Mb onto 31 chromosomes, which corresponded to 87% of the genome (Supplementary Section 2). Genomic syntenies from S. litura to B. mori and to Heliconius melpomene revealed two modes of chromosomal fusion (Supplementary Tables 10 and 11 and Supplementary Fig. 2). In one, six S. litura chromosomes (haploid chromosome number N = 31) were fused to form three B. mori chromosomes (N = 28). In the other, six sets of S. litura chromosomes were fused, corresponding to six H. melpomene chromosomes (N = 21) 11 , and another eight S. litura chromosomes were fused, corresponding to four other H. melpomene chromosomes. These changes were consistent with previous reports on chromosome evolution among butterflies including Melitaea cinxia 12 and the moth Manduca sexta 13

(Supplementary Section 2).
Massive expansion of bitter gustatory receptor and detoxification-related gene families associated with polyphagy of Noctuidae. To elucidate key genome changes associated with host plant specialization and adaptation in Lepidoptera, we compared chemosensory and detoxification-related gene families between the extremely polyphagous lepidopteran pest S. litura and the almost monophagous lepidopteran model organism B. mori. We found large expansions of the gustatory receptor (GR), cytochrome P450 (P450), carboxylesterase (COE) and glutathione-S-transferase (GST) gene families in S. litura (Table 1). Chemosensory genes play an essential role in host plant recognition of herbivores. GRs, especially, are highly variable among species, which could be a major factor for host plant adaptation. GRs are categorized into three classes-CO 2 receptors, sugar receptors and bitter receptors-among which bitter receptors are most variable, while CO 2 and sugar receptors are conserved [14][15][16][17][18] . Manual annotation identified 237 GR genes in the S. litura genome ( Table 2, Fig. 1a and  Supplementary Table 13), whereas in the other lepidopteran species investigated to date, most of which are mono-and oligophagous, only about 45-80 GRs are reported 8,11,14,16,19,20 . Since large expansions of GR genes were also reported recently in S. frugiperda 5 and in another polyphagous noctuid, Helicoverpa armigera 21 , the expansion of GRs may be a unique adaptation mechanism for polyphagous Noctuidae to feed on a wide variety of host plants ( Table 2). Phylogenetic analysis including GR genes of B. mori, M. sexta, H. melpomene and S. frugiperda showed clearly that greatly expanded bitter GR clades were composed of SlituGRs and SfruGRs exclusively ( Supplementary Fig. 3), supporting a strong association of a major expansion of bitter receptor genes with the appearance of polyphagy in the Noctuidae. GR expansions mainly occurred by duplications, as many structurally similar GR genes are located in clusters on the same scaffold/chromosome (for example, Chr 12, 14 and 25; Fig. 1a-c). Interestingly, while many H. armigera GR genes have been identified as intronless 21 , especially in the bitter GR clade, here we found that almost all S. litura GR genes possessed introns. This suggests that different mechanisms led to GR expansion in these two species.
Transcriptome and phylogenetic analyses of expanded bitter GR genes in S. litura. Transcriptome analysis revealed that at least 109 of the predicted bitter GR genes were expressed, mostly in larval palps and adult proboscis, but a large number were also expressed in other chemoreception organs such as antennae, legs and the pheromone gland (Fig. 1d). These observations are similar to GR expression patterns reported in adult tissues of H. melpomene 14 and in diverse developmental stages and tissues in H. armigera 21 . Intriguingly, four bitter GR genes on Chr 25 and 14 bitter GR genes on Chr 14 were mainly expressed in moth proboscis (Fig. 1d), which S. litura uses to suck flower nectar to obtain energy for flying. Comparison with the silkmoth, which does not feed, showed that the expansion of these gene clusters could represent an adaptation to detect toxic plant secondary metabolites present in flower nectar (Fig. 1b,c). From our phylogenetic analysis ( Supplementary  Fig. 3), expansion of the biggest cluster of bitter GR genes on Chr 12 was Spodoptera-specific. These genes were mainly expressed in larval maxilla, consistent with the idea that a large expansion of bitter

Nature ecology & evolutioN
GR genes supports the polyphagy of Spodoptera and an ability to detect a large number of toxic metabolites in host plants (Fig. 1d). The mechanisms by which perception of bitter substances result in specific behaviours are complex, and those underlying bitter receptor function in Lepidoptera have not yet been elucidated.
Association of major expansions of SlituP450 genes with intensified detoxification. Detoxification of xenobiotics is crucial for ecological adaptation of highly polyphagous pest species to different host plants. This process usually involves several distinct detoxification pathways, from active metabolism of toxins 22 to enhanced excretion activity by ABC transporters 23,24 . We annotated 138 P450 genes in the S. litura genome, among which P450 clans 3 and 4 showed large expansions (Fig. 2a, Supplementary  Fig. 4 and Supplementary Table 14). CYP9a especially was greatly expanded on S. litura Chr 29 compared to the corresponding chromosome of B. mori (Fig. 2a, upper panel). Transcriptome analysis showed that some of the expanded S. litura CYP9a genes were inducible by treatment with xanthotoxin, imidacloprid or ricin (P450-100, 103 and 105; Fig. 2a, middle panel). CYP9a is reported to be inducible by xanthotoxin in S. litura 25 and S. exigua 26 . Other P450 clan 3 expansions (CYP337a1 and a2, CYP6ae9 and CYP6b29, and CYP321b1) were also induced by the toxin treatments ( Supplementary Fig. 5a), suggesting a link between P450 clan 3 expansions and an increase of tolerance to toxin in this pest. To test this hypothesis, we selected P450- 74,88,92 and 98 as members of P450 clan 3 for knockdown experiments. We injected each siRNA of the corresponding P450 into fifth-instar larvae. After feeding with an artificial diet containing imidacloprid, we observed an increase in sensitivity to the insecticide in the treated larvae compared to controls (Supplementary Fig. 7a-d). Recently, the role of SlituCYP321b1 in insecticide resistance was confirmed by showing that it is overexpressed in the midgut after induction by several pesticides, and that RNAimediated silencing of SlituCYP321b1 significantly increased mortality of S. litura larvae exposed to the same pesticides 27 .  CO2 receptor  3  3  5  3  3  3  3   Sugar receptor  8  9  11  13  10  10  11   Bitter receptor  65  33  53  57  184  213  223   Total GRs  76  45  69  73  197  232

Nature ecology & evolutioN
Major expansions of SlituGST genes enhance insecticide tolerance of this pest. Expansions of SlituGST genes were derived from epsilon classes on Chr 9 and Chr 14; the expression of these genes was also induced by toxin treatment (Fig. 3a-c and Supplementary Table 16). We chose SlituGST07 and SlituGST20 as representatives of the expanded clusters on Chr 14 and Chr 9, respectively, for knockdown and imidacloprid pesticide binding assays. We injected the siRNAs into fifth-instar larvae, then fed them an artificial diet containing imidacloprid (50 µ g g −1 ). This treatment resulted in lethality in siRNA-injected larvae, while controls remained alive (Fig. 3d,e), consistent with the idea that expansion of the GSTε class conferred an increase in detoxification ability. Figure 3f,g shows the inhibitory effects of imidacloprid on SlituGST07 and SlituGST20 in a competitive binding assay (Supplementary Section 6). These observations confirmed that expansion of GSTε contributes to the detoxification ability of this pest.

Nature ecology & evolutioN
Associating large expansions of SlituCOE genes with intensified detoxification. COE genes, which play an important role in the metabolism of a wide range of xenobiotics associated with plants and insecticides 22,[28][29][30] , also showed large expansions of lepidopteran and α classes ( Table 1, Supplementary Fig. 6a and Supplementary  Table 15). RNA-Seq analysis showed that the expanded COE genes were inducible with toxin treatment, suggesting again that their expansion is linked to an increase in detoxification ability (Fig. 2b, lower panel). These results supported knockdown experiments for COE-57 and COE-58 whereby injected larvae fed with an artificial diet containing imidacloprid showed a 60-80% increase in sensitivity compared to controls ( Supplementary Fig. 7e,f). Taken together with our knockdown experiments, transcript induction by imidacloprid indicates that expansion of the P450, GST and COE families is linked to tolerance of this insecticide.
Roles of non-expanded detoxification gene families. Although the APN and ABC gene families did not exhibit significant expansion, they were highly induced by ricin treatment (Supplementary Figs. 8  and 9 and Supplementary Tables 17 and 18). APN 31 , ABCC2 32 and ABCA2 33 have been shown to function as Cry protein receptors 32,33 (see Supplementary Sections 7 and 8). Thus, APN and ABC transport proteins may be involved in the response to different classes of xenobiotics. Altogether our results suggest that S. litura probably achieves its impressive polyphagy by adopting a strategy of large expansions of diverse sensory and detoxification-related genes, with probable cross-talk in their regulation, to adapt to a great variety of host plants.
Genetic population structure reveals extensive long-distance migration of this pest. We analysed the genetic diversity and gene flow of S. litura sampled from 3 locations in India, 11 locations in China and 2 locations in Japan (Supplementary Table 21). This yielded a clear geographical map of the genetic diversity of the surveyed local populations and genetic population structure in these countries. We observed extremely high genetic similarity between S li t u G S T 0 4 G S T d 3 _ G e n e 0 0 3 6 6 7 9 9 S lit u G S T 0 3 GS Td 2_G en e0 03 66   S lit u G S T 12 8 1 Sli tuG ST 13 GSTe4_Gen e004044 64   Imid-fb   Fig. 2a,b. d,e, Increased sensitivity to the pesticide imidacloprid caused by knockdown of GST07 and GST20. d, Knockdown of fifth-instar larvae, performed by siRNA injection and confirmed by RT-qPCR. At 24 hr after siRNA injection, larvae were fed an artificial diet containing imidacloprid (50 µ g g −1 ). The percentage of larvae affected by imidacloprid (dead and almost dead; see Methods) is shown. Ten larvae were used per experiment in three independent replicates and the results are presented with the standard deviation (SD). e, Knockdown reduction rates of GST07 and GST20 (31% and 57%, respectively). Control larvae were injected with siGFP (see Methods).
The relative expression is shown as mean + SD of three independent replicates of 10 larvae each, using a Student's t-test, *P< 0.05, **P< 0.01. f,g, Binding assay of SlituGST07 (f) and SlituGST20 (g) with imidacloprid. The inhibitory effect of imidacloprid on SlituGST07 and SlituGST20 was determined using CDNB and GSH as substrates (see Methods). Enzymatic activity of SlituGST07 and SlituGST20 was measured in the presence of various concentrations of imidacloprid. The value from the assay with 1 × 10 −4 mM of imidacloprid was set to 100%. Error bars denote SEM from 3 independent experiments (10 larvae per treatment).

Nature ecology & evolutioN
Hyderabad (central India), Fujian (the southeast coast of mainland China) and Okinawa/Tsukuba (Japan) (F ST < 0.01, Fig. 4a and Supplementary Table 23). The model-based structure analysis 34 provided a predicted population structure consistent with an F STbased cluster analysis ( Fig. 4b and Supplementary Fig. 10a,b). By incorporating the estimated allele frequency divergence between the ancestral populations, we obtained a very stable picture of population structure relative to the assumed number of ancestral populations (K). Here, again, we observed extremely high genetic similarity between central India (Hyderabad and Matsyapuri), the southeast coast of mainland China (Zhejiang, Guangzhou and Fujian) and Japan (Okinawa and Tsukuba). The assignment of individual genomes to the ancestral populations provided a detailed picture of the gene flow (Fig. 4b). These results are consistent with the study of DNA sequence variation among populations of S. litura in China and Korea 35 . An additional factor affecting population dispersal is oversea migration from southern China to western Japan driven by typhoons 36,37 . Geographical data on the Asian monsoon in July-August 38 may support our results, enabling S. litura to undertake a trip of even longer distance from southern India to China and Japan.
To understand the global pattern of migration routes, we analysed the joint allele frequency spectrums (Fig. 4c) by ∂a∂i (diffusion approximation for demographic inference) 39 . ∂a∂i fits the solution of the Fokker-Planck-Kolmogorov equation to the data of the joint allele frequency spectrum, and the estimated values of the coefficients provide direct information on the population histories and migration rates. Based on the F ST -based population structure and the model-based assignment of the individual genomes, we constructed six population groups: two groups in India (India_local and India_migrate), three in China (China_isolate, China_local, and China_migrate), and one in Japan. By applying the isolation with migration model 40 to each of the pairs of population groups, we identified a global route from the Indian migrating population through the Chinese local population, which ranges from the south at Hainan to the north at Hubei (Fig. 4d). This Chinese local population has a large number of migrants to and from the Chinese migrating population. We observed moderate numbers of migrants from China to Japan and from China to India. ∂a∂i also implied that the local populations in India and China have been shrinking significantly for the past 2000-3000 years. In contrast, the Japanese population has been expanding for the past 5000 years (Supplementary  Figs. 11 and 12 and Supplementary Table 24). It would be of interest to investigate the extent to which these local populations are also pests and have insecticide resistance.

Conclusion
This study provides strong evidence on how this polyphagous insect has evolved to become a deleterious and powerful global   Supplementary Fig. 10) suggest that one of the individuals from the Hunan2 sample belongs to a migration population (Hunan2-4), while the other 3 individuals belong to a local population (Hunan2-1, Hunan2-2 and Hunan2-3). Here, we treated the Hunan2 samples as a mixed population with both migration and local populations. b, Assignment of the individual genomes in the samples to the ancestral populations predicted by structure. We obtained the predicted allele frequency divergences between individual genomes by the predicted allele frequency divergence between the ancestral populations and the membership coefficients of the individual genomes (see Methods). c, Two-dimensional allele frequency spectra in the paired population groups. d, Global picture of the migration route predicted by ∂ a∂ i. The inset shows the number of migrating chromosomes per generation. The four closed ropes represent the migrating population in India, local populations in China, migrating populations in China, and the populations in Japan. The size of the circles represents the genetic diversity (π ).

Nature ecology & evolutioN
pest through adaptative changes and subsequent selection of gene expansions. It also provides an explanation for the genetic basis for its high tolerance to pesticides, which involves mechanisms similar to plant allelochemical detoxification. The population genetic analysis revealed the extensive migratory ability of S. litura. Such a deeper understanding through genomics and transcriptomics will enable us to develop novel pest management strategies for the control of major agricultural pests like S. litura and its near relatives, and to design new classes of insecticide molecules.

Methods
Genome sequencing and assembly. An inbred strain of S. litura (the Ishihara strain) was developed by successive single-pair sib matings for 24 generations and reared on an artificial diet at 25 °C. Male moths were used to extract genomic DNA for sequencing. Shotgun libraries with insert sizes of 170, 300, 500 and 800 bp (short insert sizes) and 2, 5 and 10 kb (large insert sizes) were constructed by following the manufacturer's protocol (http://www.illumina.com). After quality control of DNA libraries, ssDNA fragments were hybridized and amplified to form clusters on flow cells. Paired-end sequencing was performed following the standard Illumina protocol.
The S. litura genome was assembled using the software program ALLPATHS-LG build 47758 41 . The assembly used default parameters with the exception of using a ploidy setting of 2 (PLOIDY = 2), as recommended for a diploid organism, in the data preparation stage, and a minimum contig size set to 200 bp (MIN_CONTIG = 200) in the running stage (running the RunAllPathsLG command). Gaps within the scaffolds were filled based on the short insert size libraries, using the GapCloser in the SOAPdenovo package 42 . Assembled scaffolds were assigned to chromosomes by the order and orientation of a linkage map combined with a synteny analysis between S. litura and B. mori. The sequencing depth and GC content distribution of the assembled genome sequence were evaluated by mapping the short insert size reads back to the scaffolds using SOAP2 43 .
Genome annotation. Three methods were used for S. litura gene prediction including ab initio, homology-based and transcript-based methods; the GLEAN program 6 was used to derive consensus gene predictions. For ab initio prediction, AUGUSTUS 44 and SNAP 45 were used to predict protein-coding genes. For homology-based prediction, proteins from five insect genomes (Anopheles gambiae, Drosophila melanogaster, B. mori, Acyrthosiphon pisum and D. plexippus) were first mapped to the S. litura genome using TBLASTN (E-value ≤ 0.00001), and then accurate splicing patterns were built with GeneWise (version 2.0) 46 . In the transcript-based method, the assembled transcriptome results were mapped onto the genome by BLAT with identity ≥ 99% and coverage ≥ 95%. We used TopHat to identify exon-intron splice junctions and refine the alignment of the RNA-Seq reads to the genome 47 , and Cufflinks (version 1.2.0 release) to define a final set of predicted genes 48 . Finally, we integrated the three kinds of gene predictions to produce a comprehensive and non-redundant reference gene set using GLEAN. Gene function information was assigned based on the best hits derived from the alignments to proteins annotated in the SwissProt, TrEMBL 49 and KEGG 50 databases using BLASTP 51 . Motifs and domains of proteins were annotated using InterPro 52 by searching public databases, including Pfam, PRINTS, PROSITE, ProDom and SMART. We also described gene functions using Gene Ontology (GO) 53 .
Repeats and transposable element families in the S. litura genome were first detected by the RepeatModeler (version open-1.0.7) pipeline, with rmblast-2.2.28 as a search engine. With the assistance of RECON 54 and RepeatScout 55 , the pipeline employs complementary computational methods to build and classify consensus models of putative repeats. tRNAs were annotated by tRNAscan-SE with default parameters. rRNAs were annotated by RNAmmer prediction and homology-based search of published rRNA sequences in insects (deposited in the Rfam database). snRNAs and miRNAs were sought using a two-step method: after aligning with BLAST, INFERNAL was used to search for putative sequences in the Rfam database (release 9.1).
Gene family clustering and phylogenetic tree construction. Protein sequences longer than 30 amino acids were collected from nine sequenced arthropod species (B. mori, P. xylostella, D. plexippus, D. melanogaster, A. darlingi, Apis mellifera, Harpegnathos saltator, Tribolium castaneum and Tetranychus urticae) and S. litura for gene family clustering using Treefam 56 . We aligned all-to-all using BLASTP with an E-value cut-off of 0.0000001, and assigned a connection (edge) between two nodes (genes) if more than a third of a region was aligned in both genes. An H-score ranging from 0 to 100 was used to weigh the similarity (edge). For two genes, G 1 and G 2 , the H-score was defined as score (G 1 G 2 )/max (score(G 1 G 1 ), score(G 2 G 2 )), where 'score' is the raw BLAST score. The average distance was used for the hierarchical clustering algorithm, requiring the minimum edge weight (H-score) to be larger than 10 and the minimum edge density (total number of edges/theoretical number of edges) to be larger than 1/3. 386 single-copy genes from the 10 species were aligned by MUSCLE 57 . We used MODELTEST 58 to select the best substitution model (GTR) and MRBAYES 59 to construct the phylogenetic tree. Then we estimated divergence time and neutral substitution rate per year (branch/divergence time) among species. The PAML mcmctree 60 used to estimate the species divergence time referred to two fossil calibrations, including the divergence time of D. melanogaster and Culicidae (238.5-295.4 million years ago) and the divergence time of D. melanogaster and Hymenoptera (238.5-307.2 million years ago) 61,62 . T. urticae (Arachnida) was used as an outgroup, and a bootstrap value was set as 1000. In addition, the evolutionary changes in the protein family size (expansion or contraction) were analysed using the CAFÉ program 63 , which assesses the protein family expansion or contraction based on the topology of the phylogenetic tree.
Linkage map. Two genetically contrasting strains of S. litura, one developed at the University of Delhi, India (called the India strain) and another available at the National Institute of Agrobiological Sciences, Japan (the Ishihara strain), were employed to generate a mapping population. F1 offspring were obtained by crossing an India male and an Ishihara female. An F1 male was crossed with an Ishihara female as back cross (BC1), and these BC1 offspring were used to develop a RAD library 64 . Genomic DNA was isolated from 116 BC1 individuals, Ishihara male, India female and F1 male, and RAD sequencing libraries were constructed following a standard protocol. Sequencing was carried out using an Illumina HiSeq2000 platform. RAD-seq reads were aligned to the reference genome sequence using Short Oligonucleotide Analysis Package 2 (SOAP2) 43 to analyse the genotypes of each individual at every genomic site. Polymorphic loci relative to the reference sequence were selected and then filtered. SNP markers were recorded if they were supported by at least 5 reads with quality value greater than 20, and ambiguous SNPs (SNP = N) were eliminated. Only SNP markers that were homozygous and polymorphic between parents, heterozygous in the F1 and followed a Mendelian segregation pattern were selected for linkage map construction. This resulted in the identification of a total of 87,120 RAD markers. Further filtering was done by selecting only SNP markers with a missing rate of < 0.09 that were separated by at least 2000 bp. After such stringent filtering, a total of 6088 SNP markers were obtained and subsequently used to develop a linkage map using JoinMap 4.1 65 . The limit of detection (LOD) score = Z = log(probability of sequence with linkage/probability of sequence with no linkage) for the occurrence of linkage was set to 4-20 (start-end). By applying the indicated parameters, we narrowed down the map to 31 linkage groups (Supplementary Fig. 2b).

Syntenic comparison.
We obtained peptides and genome sequences for B. mori 66 , Papilio xuthus 67 and H. melpomene 11 . If a gene had more than one transcript, only the first transcript in the annotation was used. To search for homology, protein-coding genes of S. litura were compared to those of B. mori, P. xuthus and H. melpomene using BLASTP 51 . For a protein sequence, the best five non-self hits in each target genome that met an E-value threshold of 0.00001 were reported. Whole-genome BLASTP results and the genome annotation file were used to compute collinear blocks for all possible pairs of chromosomes using MCScan software 68 . A region with at least 5 syntenic genes and no more than 15 gapped genes was called a syntenic block.
Annotation of the gustatory receptor (GR) gene family. A set of described Lepidoptera gustatory receptors (GRs) was used to search the S. litura genome by TBLASTN. Additionally, a combination approach of HMMER 69 and Genewise 46 was used to identify additional GR sequences. Scaffolds that were found to contain candidate GR genes were aligned to protein sequences to define intron/exon boundaries using Scipio 70 and Exonerate 71 . The GR classification and the integrity of the deduced proteins were verified using BLASTP against the non-redundant GenBank database. When genes were split in different scaffolds, the protein sequences were merged for further analyses.
Annotation and phylogenetic study of the cytochrome P450 (CYP) gene family. Identity between two CYP proteins can be as low as 25% but the conserved motifs distributed along the sequence allow clear identification of CYP sequences. Conserved CYP protein structure is featured by a four-helix bundle (D, E, I and L), helices J and K, two sets of β sheets and a coil called the 'meander' . The conserved motifs include WXXXR in the C helix, the conserved Thr of helix I, EXXR of helix K and the PERF motif followed by a haeme-binding region FXXGXXXCXG around the axial Cys ligand 72 . All the scaffolds containing candidate CYPs were manually annotated to identify intron/exon boundaries. Protein CYP sequences were compared by phylogenetic studies to the S. frugiperda CYPome 73 for name attribution.

Annotation of carboxylesterase (COE), glutathione-S-transferase (GST), aminopeptidase N (APN) and ATP-binding cassette (ABC) transporter gene families.
Sets of lepidopteran amino acid sequences for each gene family were collected from KAIKObase (http://sgp.dna.affrc.go.jp/KAIKObase/) and the NCBI Reference Sequence database. Each gene family was then searched in the S. litura genome assembly and predicted gene set by TBLASTN and BLASTP using each set of lepidopteran amino acid sequences. Identified genes were further examined by HMMER3 search (cutoff E-value = 0.001) using the Pfam database to confirm conserved domains in each gene family. In addition, the classification of each gene family was performed with BLASTP in the nonredundant GenBank database.
Toxin treatment of S. litura larvae for transcriptome analysis. Fifth-instar larvae of the inbred strain were each fed with 1 g of artificial diet supplemented with 1 mg g −1 xanthotoxin. Control larvae were fed an artificial diet without xanthotoxin. For the ricin and imidacloprid treatments, the artificial diet was supplemented with either ground Ricinus communis seeds at a concentration of 50 mg g −1 or imidacloprid at a concentration of 50 µ g g −1 , respectively. Ten individuals were used for each treatment and three independent replicates were performed. Whole larvae were used for RNA extraction at 48 h post toxin treatment. Fat body, midgut and malpighian tubule were dissected from the toxin-treated larvae for RNA preparation. Total RNA was extracted from the tissues using Trizol reagent according to the manufacturer's instructions (Invitrogen, USA) and contaminating DNA was digested with RNase-free DNase I (Takara, China). The integrity and quality of the mRNA samples were confirmed using an Agilent Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA).
GR transcriptome analysis. Larval antenna, thoracic legs, ephipharynx, maxilla and midgut were dissected from sixth-instar larvae, while antenna, legs, pheromone glands and proboscis were from moths. Due to very low GR expression levels, we used 100 larvae for RNA preparation. For expression profiling, we recorded all GR genes with expression levels higher than 0.1 FPKM in any tissue ( Fig. 1d; red).
Quantitative PCR with reverse transcription (RT-qPCR). Total RNA was subjected to reverse transcription using a PrimeScript™ RT Master Mix (Perfect Real Time) (TaKaRa) in 50 μ l reaction volumes (2500 ng total RNA) and then diluted 5-fold. 1 μ l cDNA was used per 10 μ l PCR reaction volume. PCR was carried out with the following program: 94 °C for 2 min followed by 30 cycles of 94 °C for 10 sec, 50 °C for 15 sec, and 72 °C for 30 sec with rTaq DNA polymerase (TaKaRa) using pairs of gene-specific primers (Supplementary Table 19). RT-qPCR of each gene was repeated at least three times in two independent samples. BmActin3 was used as a control for each set of RT-qPCR reactions and for gel loading.
siRNA injection for knockdown of SlituGST, SlituP450 and SlituCOE genes. 4 µ l of siRNA (100 pm µ l −1 ) were injected into the haemolymph of each fifthinstar larva, while injection of the same amount (4 µ l) of GFP siRNA was used for controls. After 24 h post injection, larvae were reared on an artificial diet supplemented with imidacloprid at 50 µ g g −1 until bioassay. siRNA sequences are listed in Supplementary Table 20.
To determine the effect of imidacloprid ingestion, larval condition was scored at 2, 6, 12, 18, 24, 36 and 48 post feeding. ' Affected' means that larvae rounded up and did not move after a couple of hours when touched, as if dead (suspended animation). However, several hours later, many affected larvae recovered from their suspended state, probably due to detoxification of ingested imidacloprid. The GST knockdown experiment used 3 replicates of 10 larvae. Post feeding replicates were scored independently for SlituGST-7 and -20; the remaining knockdowns (SlituP450-0740, -088, -092 and -098, and SlituCOE-057 and -058) were conducted as preliminary trials without replicates using 30 larvae per gene.
Overexpression and purification of recombinant SlituGST07 and SlituGST20 proteins. Competent Escherichia coli Rosetta (DE3) pLysS cells (Novagen; EMD Millipore) were transformed with expression vectors harbouring SlituGST07 cDNA (pET32.M3) or SlituGST20 cDNA (pCold_SUMO) and grown at 37 °C on Luria-Bertani (LB) medium containing 100 µ g ml −1 ampicillin. After cells transformed with SlituGST07 cDNA reached a density of 0.7 OD 600 , isopropyl 1-thio-ß-D-galactoside (IPTG) was added to a final concentration of 1 mM to induce the production of recombinant protein and cultured overnight at 30 °C. Cells were then harvested by centrifugation, homogenized in 20 mM Tris-HCl buffer (pH 8.0) containing 0.5 M NaCl, 4 mg ml −1 of lysozyme, and disrupted by sonication. Cells transformed with SlituGST20 cDNA were grown to a density of 0.5 OD 600 , and stored on ice for 30 min before addition of IPTG to a final concentration of 1 mM, followed by a further incubation overnight at 18 °C before harvesting and disruption. Unless otherwise noted, all of the operations described below were conducted at 4 °C. The supernatant was clarified by centrifugation at 10,000g for 15 min and subjected to Ni 2+ -affinity chromatography equilibrated with 20 mM Tris-HCl buffer (pH 8.0) containing 0.2 M NaCl. After washing with the same buffer, the samples were eluted with a linear gradient of 0-0.5 M imidazole. The enzyme-containing fractions, assayed as described below, were pooled, concentrated using a centrifugal filter (Millipore, Billerica, MA, USA), and applied to a Superdex 200 column (GE Healthcare Bio-Sciences, Buckinghamshire, UK) equilibrated with the same buffer plus 0.2 M NaCl. Each fraction was assayed and analysed by SDS-PAGE using a 15% polyacrylamide slab gel containing 0.1% SDS, according to the method of Laemmli 82 . Protein bands were visualized by Coomassie Brilliant Blue R250 staining.
Measurement of GST enzyme activity. GST activity was measured spectrophotometrically using 1-chloro-2,4-dinitrobenzene (CDNB) and glutathione (GSH) as standard substrates 83 . Briefly, 1 µ l of a test solution was added to 0.1 ml of a citrate-phosphate-borate buffer (pH 7.0) containing 5 mM CDNB and 5 mM GSH. Increase in absorbance at 340 nm min −1 was monitored at 30 °C and expressed as moles of CDNB conjugated with GSH per min per mg of protein using the molar extinction coefficient of the resultant 2,4-dinitrophenylglutathione: ε 340 = 9600 M −1 cm −1 .
Sampling and sequencing for population genetics study. S. litura was sampled from three locations in India (Delhi, Hyderabad and Matsyapuri), 11 locations in China, including Fujian, Guanxi, 2 locations in Guangzhou (Guangzhou and South China Normal University), Hainan, Hubei, Shanxi, Zhejiang, 3 locations in Hunan (Hunan1, Hunan2 and Hunan3), and 2 locations in Japan (Tsukuba and Okinawa). Four individuals were sampled from each location, except for Hunan1 (3 individuals). A total of 63 individuals were used in this study.
Mapping and SNP calling. First, mapping of reads of each individual to the reference genome was conducted. The proper mapping rate was about 70% for 56 individuals except for 7 individuals (Supplementary Table 21). Since the proper mapping rates for four individuals from the Shanxi population and three individuals from Fujian were extremely low, they were excluded from the population genomics analysis. SNP calling was conducted by comparing 56 genomes with the reference genome. Finally, a multiple VCF file was generated including 56 individuals. Sites with missing values or quality values below 20 were screened by VCFtools software 84 . In total, 46,595,432 SNPs were identified and included in this analysis.
Genetic diversity, population structure and balancing selection. The nucleotide diversity (π ) of 14 local populations and pairwise F ST values were calculated using VCFtools software with window size 5000 bp, step 2500 bp. The genomic nucleotide diversity was obtained by averaging over the values of windows. The weighted F ST was calculated using the Weir and Cockerham estimator 85 . Based on the pairwise F ST , hierarchical cluster analysis was conducted using R software. Because of the small sample size in each sampling location, interpretation of population genomic analysis needs careful evaluation of the precision. The precision of π and F ST values were evaluated by parametric bootstrap with coalescent simulation 86 . Haplotypes of windows were generated using the population-specific π values multiplied by 5000 and 4 Nms calculated as 1/F ST − 1. Two haplotypes were generated for each window. A thousand sets of haplotypes were generated independently and concatenated to make a bootstrap sample. For each of 100 bootstrap samples, the π values and pairwise F ST were calculated to estimate the standard errors. The adopted number of sets was less than the number of the scaffolds. Because the genome size of S. litura was about 4 × 10 8 bp, we mimicked the subsampling of windows that were separated by bp on average so that we could estimate approximate independence between the sub-sampled windows.
To confirm the observed population structure, we conducted a model-based structure analysis 34,87 . Based on the allele frequency divergence among the ancestral populations (P) and the membership coefficients that assign the populations to the ancestral populations (Q), we calculated the predicted allele frequency divergence between the population (QPQ t ). We also analysed individual-level membership coefficients and the allele frequency divergence.
We further estimated the global pattern of migration by analysing the joint allele frequency spectrums in terms of the population histories and the migration patterns by ∂a∂i (diffusion approximation for demographic inference) 39 . To avoid the complex effect of selection, we analysed SNPs in introns. Out of 20 million intronic SNPs, we randomly sampled 2 million SNPs. Based on the multi-dimensional scaling of F ST and the assignment of the individual genomes by structure, we constructed six population groups: the Indian local population (with the sample from Delhi), Indian migratory population (with the samples from Hyderabad and Matsyapuri), Chinese isolated population (with the samples from Guangzhou2 and Hunan1), Chinese local population (with the samples from Hunan3, Guangxi, Hainan, three individuals of Hunan2 and Hainan), Chinese migratory population (with the samples from Fujian, and one individual each of Hunan2, Hunan3, Hunan4, Zhejiang and Guangzhou1), and Japanese migrating population (with the samples from Okinawa and Tsukuba). To each pair of population groups we applied the IM (isolation with migration) model 40 with population expansion/shrinkage. The estimated migration rates represent the number of migrating chromosomes per generation. To obtain the population sizes and the time of population splitting from the estimated relative values, we followed a previous study 88 that assumes the generation time of 0.3 year and uses the standard mutation rate of 8.4 × 10 −9 (per site per generation) from Drosophila 89 . The standard errors were obtained by parametric bootstrap of coalescent simulation 86 . Assuming the estimated scenarios of population history, we generated 100 bootstrap samples of 2 million SNPs. To reflect the correlation structure between SNP loci, we assumed that they were evenly distributed on 28 chromosomes. SNPs on different chromosomes are independent. Noting that the mean distance between the neighbouring SNP loci (in bp) was we set the recombination rate to be ρ = 2.3 × 10 −5 . We also tested two alternative values, ρ = 0 and ρ = 0.01, and obtained similar standard errors.