Edinburgh Research Explorer Differential gene expression according to race and host plant in the pea aphid

Host-race formation in phytophagous insects is thought to provide the opportunity for local adaptation and subsequent ecological speciation. Studying gene expression differences amongst host races may help to identify phenotypes under (or resulting from) divergent selection and their genetic, molecular and physiological bases. The pea aphid ( Acyrthosiphon pisum ) comprises host races specializing on numerous plants in the Fabaceae and provides a unique system for examining the early stages of diversiﬁ-cation along a gradient of genetic and associated adaptive divergence. In this study, we examine transcriptome-wide gene expression both in response to environment and across pea aphid races selected to cover the range of genetic divergence reported in this species complex. We identify changes in expression in response to host plant, indicating the importance of gene expression in aphid – plant interactions. Races can be distinguished on the basis of gene expression, and higher numbers of differentially expressed genes are apparent between more divergent races; these expression differences between host races may result from genetic drift and reproductive isolation and possibly divergent selection. Expression differences related to plant adaptation include a subset of chemosensory and salivary genes. Genes showing expression changes in response to host plant do not make up a large portion of between-race expression differences, providing conﬁrmation of previous studies’ ﬁndings that genes involved in expression differences between diverging populations or species are not necessarily those showing initial plasticity in the face of environmental change.


Introduction
Understanding how natural selection acts on genetic variation to facilitate adaptation to different environments is a central question in evolutionary biology. Host-race formation in insects provides many useful examples in which to study local adaptation and has long been a focus of speciation research (Dr es & Mallet 2002;Bush & Butlin 2004;Forister et al. 2011). The huge species richness of many insect groups is associated with specialization by individual species on very limited ranges of host taxa (Farrell 1998) implying that a combination of cospeciation with hosts and speciation via hostswitching (Weiblen & Bush 2002) are major drivers of diversity. Host races, genetically distinct populations adapted to different host species but still exchanging genes, provide excellent models for understanding the selection pressures and genetic changes involved in adaptation to new hosts and the consequent evolution of reproductive isolation.
Measures of gene expression can provide an important bridge between genotype and phenotype (Huestis & Marshall 2009), and expression profiles provide many more phenotypes than can easily be documented through morphological or behavioural analysis (Pavey et al. 2010). In recent years, the study of gene expression has been greatly facilitated by high-throughput sequencing-based methods such as RNA-Seq (Mortazavi et al. 2008), and the analysis of gene expression now has the potential to contribute to the understanding of the genetics of both local adaptation and speciation.
Comparative gene expression studies enable the identification of biological functions involved in the adaptation of organisms to their surrounding environments. Gene expression can also provide a novel source of information on the extent and nature of divergence between species (Khaitovich et al. 2004) or between populations that experience partial reproductive isolation (Wolf et al. 2010). Gene expression differences may result from drift, but unusually strong differentiation in expression could indicate divergence under selection, analogous to genome scans based on allele frequencies (Roberge et al. 2007), and there may be more opportunity to associate expression outliers with adaptive traits than for the anonymous markers used in many genome scans. In a few cases, loci of major effect that operate via control of expression have been identified (Chan et al. 2010). Where expression patterns are environmentdependent, divergence may be especially informative about ecological speciation (Pavey et al. 2010).
The clearly defined yet often spatially intermingled habitats represented by host plants provide examples of divergent selection that illuminate the process of local adaptation particularly clearly (Dr es & Mallet 2002). Genomic analyses in some systems have begun to provide insights into the genetic architecture of divergence. For example, in the apple maggot fly, Rhagoletis pomonella, targets of selection during adaptation to the novel apple host appear to be genomically widespread (Michel et al. 2010), perhaps because divergence in multiple traits is needed. Many loci also appear to be under selection in the walking stick, Timema cristinae, some associated with habitat components other than host plant Soria-Carrasco et al. 2014). A role for gene expression changes during local adaptation has been highlighted by Ragland et al. (2015), who found both a plastic response and genetically based adaptations enhancing host-associated fitness differences.
The pea aphid, Acyrthosiphon pisum, was the first aphid species to have its genome sequenced (The International Aphid Genomics Consortium 2010). In Europe, at least 15 genetically distinct populations (races) are reported, each associated with one or a few species of the Fabaceae. Several of these races are found in sympatry from Europe to Japan, and some have also been introduced in South and North America (Via 1991;Peccoud et al. 2008Peccoud et al. , 2009a. These 15 races form a continuum of levels of isolation from those producing around 10% F1 hybrids up to highly genetically differentiated host races (F ST > 0.8 in sympatry) that probably experience no current gene flow (Peccoud et al. 2009a(Peccoud et al. , 2015. There is evidence that these races have diverged recently, possibly at around the time of the Neolithic expansion of farming (Peccoud et al. 2009b). Despite overlapping host plant ranges (Peccoud et al. 2009a), this results in assortative mating since host races feed, multiply and reproduce sexually on their specific plants, and offers opportunities for the evolution of reproductive isolation.
Aphid recognition of host plant species and establishment of phloem feeding have several stages, described in Powell et al. (2006) and Simon et al. (2015), with roles for olfaction, gustation and the interaction between aphid saliva and the plant. Functional analyses and genome scan studies have highlighted the potential involvement of chemosensory and salivary genes in the plant specialization of pea aphid races. Genome scans in European races using microsatellites (Jaqui ery et al. 2012) found four outliers close to chemosensory receptor and salivary protein encoding genes. Smadja et al. (2012) analysed the whole chemosensory gene repertoire through targeted resequencing and found a small number of odorant and gustatory receptor genes as outlier loci. In insects, volatile and nonvolatile compounds are recognized by chemoreceptors, including odorant receptors (OR), ionotropic receptors (IR) and gustatory receptors (GR; Hallem et al. 2006;Croset et al. 2010), through binding proteins (odorant binding proteins and chemosensory proteins) that are involved in the solubilization and transport of odorants (Leal 2005). Other classes of protein, such as sensory neuron membrane proteins (SNMPs), are also considered important in insect chemoreception (Jin et al. 2008;Vogt et al. 2009). These genes belong to very large multigene families in most insect genomes (S anchez-Gracia et al. 2009). Several lines of evidence suggest a key role of chemosensory genes in host selection [e.g. Anopheles gambiae: Schymura et al. (2010)] and in particular in host plant specialization in phytophagous insects (Visser 1986;Whiteman & Pierce 2008). Their mode of evolution under a birth-and-death model and evidence for positive selection in some branches of these multigene families suggest rapid and adaptive evolution in specialized lineages of insects including aphids (Matsuo 2008;Smadja et al. 2009;Schymura et al. 2010;Zhou et al. 2010;Briscoe et al. 2013). In the pea aphid, populationbased studies also revealed the potential role of chemosensory genes in host plant specialization and the ability of these genes to evolve quite rapidly at smaller evolutionary scales, by means of divergent selection (Smadja et al. 2012) and copy number variation amongst specialized races (Duvaux et al. 2015). However, the rapid evolution of chemotactic behaviours involved in host plant specialization and/or mate choice in the pea aphid could also be driven by regulatory changes. Indeed, some studies identify a role for regulatory changes at OBPs and ORs in host plant choice in Drosophila sechellia specialized on Morinda fruit, showing some downregulated or upregulated genes in this specialized species (Kopp et al. 2008;Dworkin & Jones 2009), and that a 4-bp insertion in the regulatory region of Obp57e may be involved in host plant specialization (Matsuo et al. 2007). These two classes of genes, chemosensory genes and genes for salivary proteins, are important candidates because of their potential roles in plant-aphid interactions . While several gene families have potential for influencing host plant recognition and speciation in the pea aphid system, here we focused on these two functional categories.
While puncturing plant cells with their stylets, aphids are thought to sample plant cell contents and also secrete saliva containing various proteins (Miles 1999;Tjallingii 2006;Carolan et al. 2009). As salivary proteins come into direct contact with plant cells, they have been hypothesized to function like virulence proteins of microbial pathogens (effectors), suppressing host plant defence mechanisms to facilitate aphid feeding (Kaloshian & Walling 2005;Dogimont et al. 2010;Hogenhout & Bos 2011;Elzinga et al. 2014). It is also hypothesized that some aphid saliva proteins might elicit plant defence reactions in specific plant species; indeed, several studies have shown that aphid saliva or saliva proteins promote or reduce aphid fitness Mutti et al. 2008;De Vos & Jander 2009;Bos et al. 2010;Atamian et al. 2012). Furthermore, some salivary proteins have been shown to promote aphid colonization in a plant-specific manner, and the genes encoding these aphid saliva proteins are under positive selection (Pitino & Hogenhout 2013), suggesting that they may play a role in adaptation to host plants.
To be able to detect differences in gene expression at lowly expressed genes such as chemoreceptor genes (Shiao et al. 2013) while still gaining insight into other possible genes implicated in host acceptance in the pea aphid, we have deeply sequenced the entire transcriptome of pea aphid heads using RNA-seq (Wang et al. 2009). Using multiple aphid clones from each of six host races along the continuum of divergence, we have examined gene expression both on their collection host and on a 'universal' host, Vicia faba used as a common garden. This large-scale sequencing approach has allowed us to test for expression variation between races when all aphids are reared on the same host plant species (universal host), the changes in gene expression when a clone is reared on its collection host compared to the universal host, and the interaction between these two, that is the differences between races in the way they respond to the change of host plant. As 'home' environment is different in each race, we aim to identify expression changes underlying the ability of each race to cope with their unique host environment, and as the difference in environment between Vicia faba and 'home' plant is not constant across races, we would expect to identify a strong interaction effect. In addition to analysing overall patterns of expression, we have specifically examined expression of salivary and chemosensory genes.

Aphid collection and rearing
Pea aphids reproduce asexually from spring to autumn in temperate zones, so it is possible to obtain natural clones from individual aphids. For each aphid race, several genotypes (clones) were derived from single asexual aphids collected in the field in Southern England (May-July 2003 or May and August 2010). The aphids were collected from Medicago sativa L., Lotus pedunculatus Cav., Lotus corniculatus L., Pisum sativum L., Ononis spinosa L. or O. repens L. and Lathyrus pratensis L. (Table S1, Supporting information). Races associated with these hosts are situated along a continuum from least to most genetic divergence as described by Peccoud et al. (2009a). Aphids from the same plant species were collected at least 30 m apart to avoid collecting the same genotype twice. Clones collected from a particular host plant do not necessarily belong to the race associated with that host because there is some movement of aphids between hosts and the possibility of clones of hybrid origin. We used assignments from Duvaux et al. (2015) based on SNP and microsatellite data, and retained aphid clones whose genotypes were correctly assigned to their respective race-associated genetic cluster (Table S1, Supporting information).
Aphid clonal cultures were established in the laboratory on Vicia faba L. var. the Sutton (broad bean). They were reared at 15°C, 60% r.h. and a 16-h light: 8-h dark cycle. Most pea aphids perform well on V. faba: aphids' acceptance, survival and fecundity are uniformly higher on their home plant and on V. faba than on nonhome plants (Ferrari et al. 2008), and as such, it is considered as 'universal host' (Sandstr€ om & Pettersson 1994;Ferrari et al., 2006;Ferrari et al., 2008).
For expression experiments, each aphid clone was reared on the plant species that it was collected from ('home', for Ononis we used O. spinosa) or on V. faba ('Vicia') ( Fig. 1). The plants had been grown in a greenhouse for 6 weeks for most plant species, except for the P. sativum plants, which were three weeks old, and the V. faba plants, which were two weeks old. Wingless adult females were taken from the culture on V. faba and transferred to Petri dishes that contained a leaf of the test plant species in 2% agar. After 24 h, up to 15 offspring were transferred to a potted test plant and kept at 20°C, 60% r.h. and a 16-h light: 8-h dark cycle. At 10-11 days old, aphids were collected from the plant, immediately flash-frozen and stored at À80°C until dissection. This procedure was repeated two to three times per clone/test plant combination, and these repeats were separated by several weeks to reduce the probability that differences between aphid clones were due to environmental variation. All samples per clone/ test plant species combination were pooled in the RNA extractions.

Dissection and RNA extraction
We analysed gene expression in aphid heads as we were chiefly interested in salivary and chemosensory genes. Dissections were conducted on ice to prevent thawing. Heads were dissected from all frozen samples by cutting behind the first pair of legs and then removing the legs. This ensured that the salivary glands were included in the sample. Typically, the RNA of 20 heads from wingless adult aphids per clone/test plant combination was extracted and pooled for RNA-seq analysis. RNA extractions were performed using the Macherey-Nagel NucleoSpin RNA II kit (Macherey-Nagel, D€ uren, Germany), following the manufacturer's instructions. The quality of the RNA samples was checked using an Agilent Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA).

Sequencing
Barcoded RNAseq libraries were prepared using Illumina TruSeq RNA Sample Prep Kit version 2 (nonstrand-specific) using 1 lg of total RNA input and 10 PCR cycles, as per the manufacturer's recommendations. Final libraries were quantified by qPCR and combined into 6-plex pools prior to sequencing. Each pool was run across one HiSeq 2000 lane, using 75-bp paired-end reads (version 3 chemistry). Library preparation and sequencing were carried out at Edinburgh Genomics (University of Edinburgh, UK). Libraries that did not pass Edinburgh Genomics' quality threshold were resequenced (in two additional sequencing runs, Table S1, Supporting information).

Read mapping
Reads were quality-trimmed using the programs SICKLE and SCYTHE (https://github.com/najoshi/sickle, https:// github.com/vsbuffalo/scythe), using a quality cut-off of 20 and retaining sequences longer than 50 bp. Sequences containing Ns were discarded. Reads were mapped to the pea aphid reference genome (M. sativaassociated strain; IAGC, 2010). The reads from each library were mapped separately using version 2.08 of TopHat2 (Kim et al. 2013), a mapper that handles spliced-read alignments, that is reads mapping over exon/intron junctions, with default parameters except for the following options (-g 1, -no_mixed, -no-discordant). The number of reads mapped to each gene from the Official Gene Set annotation from AphidBase (Legeai et al. 2010) was calculated by summing each read that overlaps at least one exon of any particular gene (script available on Dryad upon publication). Multimapped reads (reads mapping to multiple locations in the genome) can sometimes be an issue when measuring expression and can particularly affect genes belonging to multigene families such as our candidate genes (Robert & Watson 2015). However, in our case, changing TopHat2 parameters to allow reads to map to multiple locations did not affect downstream results, so we therefore kept our original TopHat2 settings for subsequent analyses. There was no strong bias in mapping for nonreference races, in fact counts from mapping marginally increased in samples more distantly related to the reference genome (Spearman's rho = 0.056, P < 0.05).

Sample selection, filtering steps and expression patterns amongst samples
Libraries were retained for differential expression analyses when sequencing results were available for clones reared in both 'home' and 'Vicia' conditions and when at least four clones (i.e. genotypes) per host race were available. This left 52 sequenced libraries allowing for six different host races, two rearing conditions, and either four or five clones within each host race (biological replicates) to be analysed ( Fig. 1, Table S1, Supporting information). Predicted genes from the Official Gene Set corresponding to rRNA sequences (589) were excluded from further analysis, leaving 36 401 genes.

Normalization of expression data
Differential expression analyses were performed using the DESEQ2 package (version 1.2.10; Love et al. 2014) implemented in the R statistical software (version 3.0.2; R Development Core Team 2013). There are numerous analytical methods available for detecting differential expression in RNA-seq data (e.g. Smyth 2005; Hardcastle & Kelly 2010; Robinson et al. 2010;Tarazona et al. 2011), and there is little consensus on which is most robust (high true positive rate, low false positive rate and low false negative rate). Seyednasrollah et al. (2015), Schurch et al. (2016) and Soneson & Delorenzi (2013) all conclude that the best method is highly dependent on experimental design, and DESeq2 (Love et al. 2014) and other methods based on the negative binomial are often found to be an appropriate choice with <5 replicates per condition. Rocke et al. (2015) found an inflated false positive rate using negative binomial methods, especially for genes with high expression and/or dispersion, but as we were most interested in candidate genes with typically very low expression levels we considered DESeq2 an appropriate choice of analytical method here.
To ensure that no outlier samples (sequencing libraries with experimental irregularities rendering them unhelpful in detecting differentially expressed genes) were included in the final analyses, sample quality was assessed by clustering based on similarity of expression. Sample-to-sample distances were visualized using a heatmap with hierarchical clustering (Fig. 2). A principal components analysis (PCA) was used to confirm the absence of sequencing batch effects (Fig. S1, Supporting information), and guided PCA using the GPCA package in R (GPCA version 1.0) found no evidence for a statistically significant effect of sequencing batch (delta = 0.98, P = 0.385). Count data were normalized using DESeq2 default settings (Love et al. 2014). Shared information across genes was used to calculate dispersion estimates (within-group variability) as described in Love et al. (2014). Normalization size factors are recorded in Table S1 (Supporting information), and count data before and after normalization are shown in Fig. S2 (Supporting information). Details of DESeq2 methods and settings can be found in supplementary file 1.

Differential expression analyses
Differentially expressed genes were called by implementing generalized linear models in DESeq2. Independent filtering was employed using the genefilter package (Gentleman et al. 2011) in DESeq2, which removes very low expression genes on the basis of mean normalized counts, optimizing the number of genes with an adjusted P-value < 0.1.
To identify genes differentially expressed between different races of A. pisum, we used only aphids grown on V. faba as a common garden. We compared gene expression in samples from the M. sativa-associated race to expression in each other race in turn (Table 1); contrasts were evaluated using Wald tests, and P values were adjusted for multiple testing using the Benjamini-Hochberg procedure (FDR P < 0.05) (Benjamini & Hochberg 1995). The M. sativa-associated race makes a suitable baseline for contrasts as the reference genome used here for mapping and counting sequencing reads was of this race (IAGC 2010).
To identify genes differentially expressed in response to plant type, gene expression was compared between aphids grown on their home plants and those grown on V. faba, for each race in turn (Table 1). Plant response was not examined across all races at once because, although the V. faba condition was consistent across races, home plant is by definition different in each race and it may not be appropriate to consider them as equivalent. Within each race, a likelihood ratio test was used to compare the full model (~clone + plant) to a reduced model with plant removed (~clone), and P values were adjusted for multiple testing using the Benjamini-Hochberg procedure (FDR P < 0.05).
Genes that respond differently to plant conditions in different races represent a particularly interesting set since they show a lineage-specific response to environment (comparing Vicia faba with their home plant). To identify genes differentially expressed between plant conditions in a way that was dependent on race, all 52 samples were analysed, and a likelihood ratio test was used to compare a model containing the interaction term plant: race (~clone + plant + race + plant:race) to a reduced model (~clone + plant + race) without the interaction term, and P values were adjusted for multiple testing using the Benjamini-Hochberg procedure (FDR P < 0.05). Genes differentially expressed between plant conditions in a way that differed in each individual race were identified by fitting the model clone + plant + race + plant:race and then using Wald tests to determine whether the log2-fold change for 'home plant' over 'Vicia' was different between races; P values were adjusted for multiple testing using the Benjamini-Hochberg procedure (FDR P < 0.05).

Functional annotation and gene ontology (GO) category enrichment
Gene ontology (GO) annotations from the A. pisum genome were retrieved from AphidBase (http:// Fig. 2 Heatmap of sample-to-sample Euclidean distances calculated from regularized log-transformed count data (top 2500 genes ranked by variance). The rLog-transformation accounts for differences in sequencing depth. Dendrograms show hierarchical clustering of samples. Individuals reared on V. faba were labelled in black; individuals reared on home plant were labelled in colour.
www.aphidbase.com) and matched to the expressed genes. In total, 11 412 of the 36 401 expressed genes received GO annotations (Table S5, Supporting information). Blast2GO (Conesa et al. 2005) was used to implement Fisher's exact tests for enrichment of GO categories in each set of differentially expressed genes, using default settings. REVIGO (Supek et al. 2011) was used to summarize enriched GO categories for plant-effect and race-effect differentially expressed genes, using default parameters.

Analyses of salivary and chemosensory gene families
A list of 307 candidate salivary genes (Table S2, Supporting information) was compiled by taking all salivary genes identified in seven publications (Harmel et al. 2008;Carolan et al. 2009Carolan et al. , 2011Bos et al. 2010;Atamian et al. 2012;Jaqui ery et al. 2012;Elzinga et al. 2014) and identifying their corresponding sequences (or their orthologues' sequences) in the 36 401 genes present in our RNAseq libraries. Carolan et al. (2011) used ACYPI identification numbers (see www.aphidbase.com) from the v1 annotation of the pea aphid genome; we identified corresponding ACYPI numbers in the v2 annotation of the genome using BLASTP (Altschul et al. 1990), retaining the best hit with an e-value cut-off of 1eÀ20. It was not possible to assign v2 ACYPI numbers to 11 of the genes identified in the seven publications. The TMH-MM algorithm version 2.0c (Krogh et al. 2001) was used to predict transmembrane domains in the salivary gene catalog created by Carolan et al. (2011), and 65 proteins with predicted transmembrane domains in addition to their signal peptides were not included in the final gene set as they are most likely not secreted in saliva.
A list of 113 candidate chemosensory genes (Table S2, Supporting information) was produced by identifying genes in our expression data set corresponding to odorant receptor (OR) and gustatory receptor (GR) genes listed in Smadja et al. (2009), odorant binding protein (OBP) and chemosensory protein (CSP) genes listed in Zhou et al. (2010) and ionotropic glutamate receptor (IR) and sensory neuron membrane protein (SNMP) genes listed in Duvaux et al. (2015).
The twelve sets of differentially expressed genes identified using DESeq2 [between each race and the M. sativa reference race (5), between hosts within race (6) and those with significant plant:race interaction (1)] were used to perform the following tests for the relationship between the salivary and chemosensory candidate gene sets and differential expression.
Categorical test: was there a significant over-representation of differentially expressed genes in candidate gene categories?. As it is easier to identify highly expressed genes as significantly differentially expressed, we expected a bias towards detection of differentially expressed genes in categories with an over-representation of highly expressed genes (Oshlack & Wakefield 2009;Young et al. 2010). Although mean normalized expression of chemosensory genes across races did not differ significantly from that of nonchemosensory genes (Mann-Whitney U-test, W = 2 098 210, P value = 0.667), it did differ significantly between salivary and non salivary genes (Mann-Whitney U-test, W = 1 288 540, P value < 2.2eÀ16), so it was necessary to take expression bias into account.
The bioconductor package GOSEQ (Young et al. 2010) was used to incorporate an expression bias correction. GOSEQ was implemented in R (version 3.1.0) (R Development Core Team 2005) using a standard protocol; chemosensory and salivary categories were defined as described above, the means of log 2 -normalized sequence counts were used for bias data, differentially expressed genes were those with an adjusted P value ≤ 0.05, and the Wallenius distribution was used to approximate the null expectation of identifying differentially expressed genes in candidate categories.
Continuous test: was the magnitude of log-fold change in expression significantly greater in candidate genes than in noncandidate genes?. Genes with low counts have exaggerated fold changes, which causes bias in continuous tests comparing magnitude of expression differences between categories of genes (Oshlack & Wakefield 2009). If the candidate gene set has a bias to low expression transcripts, we would expect a higher mean fold change in candidate genes by virtue of this, which would lead to an overestimation of fold change bias in candidates unless accounted for.
To account for expression differences between categories, count data were transformed using the regularized log-transformation in DESeq2 (Love et al. 2014). The regularized log-transformation log-transforms the average across samples of each gene's normalized count and then shrinks the log-normalized counts towards the log averages, applying greater shrinkage to genes with weaker expression (Love et al. 2014). This compensates for the relatively higher variability expected in low expression genes.
Mann-Whitney U-tests were used to assess whether candidate genes had significantly higher log-fold changes in expression in comparison with noncandidate genes. As magnitude of expression differences showed a tendency to differ in only up-or downregulated genes rather than both (i.e. magnitude of log2-fold change showed a skewed distribution), genes were split into upand downregulated for each test (in the plant-effect comparison in relation to expression in aphids grown on V. faba and in comparisons of races in relation to expression in the M. sativa race).

Overall patterns of expression
Of the total 36 401 genes with mapped reads, a mean of 25 196 genes were expressed per RNA sample (69.2%), while 15 676 expressed genes were common to all 52 samples (43.1%) and 1282 genes were expressed in samples from one race only. A mean of 85.5 genes from the chemosensory gene set (AE0.87 SE) were expressed in each RNA sample, and 98 chemosensory genes were expressed in every race (86.7% of chemosensory genes). A mean of 298.0 genes from the salivary gene set (AE0.23 SE) were expressed in each RNA sample, and 299 salivary genes were expressed in every race (97.7% of salivary genes). No salivary or chemosensory genes were expressed only in a single race.
Aphid samples showed more similarities on the basis of race than they did in terms of the plant on which they were grown (Fig. 2), and home and 'Vicia' conditions of the same clone mostly clustered together, illustrating this strong clonal effect. However, some expression differences between individuals reared on different plants were also evident (Fig. S3, Supporting information).

Race effect
In the subset of aphids reared on V. faba, we compared expression in each race in turn to that in the M. sativaassociated race, aiming to reveal genes related to differences between host-associated races independent of differences related directly to the plant they were reared on. Relative to the M. sativa-associated race, between 1 406 and 4 322 genes per race were differentially expressed (adjusted P value ≤ 0.05; Fig. 3a). The number of differentially expressed genes increased with increasing genetic distance between races (Peccoud et al. 2009a). If sequence divergence interfered with read mapping, we might expect a bias towards apparently underexpressed genes in the more distant comparisons. However, no such trend was observed (Fig. 3a).

Plant effect
Genes differentially expressed between aphids grown on their home plant and on V. faba were identified in each race in turn (Table 2). These genes relate to the plastic response of aphids to the plant that they are reared on, a response that may differ genetically between races. The number of genes differentially expressed between aphids reared on the home plant and those reared on V. faba ranged from 164 to 554 (adjusted P value ≤ 0.05), with no consistent tendency towards up or downregulation on V. faba compared to the home plant.
Interaction effect-genes for which the response to host plant varies between races The expression of eight genes was better described by the inclusion of a plant:race interaction term in the model~clone + plant + race + plant:race in comparison with the reduced model~clone + plant + race (adjusted P value ≤ 0.05). These eight genes show significant variation in responses to plant amongst the six aphid races.
Looking for genes that showed a race-specific response to plant in individual races revealed more genes (Fig. 3b); between 3 and 142 genes per race were identified whose log2-fold expression change for aphids reared on their home plant in comparison with aphids reared on V. faba differed significantly from the average plant effect across all races. As observed when comparing race-effect genes, aphids from the cluster of more closely related races (i.e. M. sativa, L. pedunculatus and L. corniculatus) had fewer race-specific plantresponse genes, while races with increasing genetic distance from this cluster (in order of increasing distance: P. sativum, O. spinosa and L. pratensis) had an increasing number of race-specific plant-response genes.
Details of all differentially expressed genes (plant, race and interaction) can be found in Table S3 (Supporting information).

Enrichment of functional gene categories
GO terms associated with differentially expressed genes are shown in supplementary Fig. S4 (Supporting information). Fisher's exact test was used to test for enrichment of GO categories in each of the eleven race-and plant-associated lists of differentially expressed genes (FDR P < 0.05). GO term enrichment analysis identified 41 over-represented GO terms amongst the differentially expressed gene sets. Enriched GO terms for plant-effect and race-effect differentially expressed genes are displayed according to 'biological process' and 'molecular function' in supplementary Figs S5 and S6 (Supporting information), respectively. ACYPI20394, the only one of the eight genes showing significant interaction between plant and race to have a BLASTP hit in the GENBANK nr database, is similar to an A. pisum peroxidasin homolog (XP_003243661.1, BLASTP e-value = 4eÀ97).

Differential expression of candidate genes
The majority of salivary and chemosensory genes differentially expressed in the pairwise comparison of Fig. 3 (a) Number of differentially expressed genes over-or underexpressed in each race relative to the M. sativa-associated race (p adj <=0.05). Races are ordered according to genetic distance from the M. sativa-associated race. (b) Number of genes in each race whose expression response to host plant differed significantly from mean expression response across all races. each race with the M. sativa-associated race were only identified in a single pairwise race comparison (Table S4, Supporting information). Except for the P. sativum associated race, which had more differentially expressed salivary genes (40) than either the L. pratensis or the O. spinosa aphid races, the number of differentially expressed candidate genes relative to the M. sativa-associated race increased with increasing divergence between the aphid races. The majority of salivary and chemosensory genes differentially expressed in the pairwise comparison between 'home' and 'Vicia' were differentially expressed in a single race (Table S4, Supporting information), and no salivary or chemosensory genes were differentially expressed between 'home' and 'Vicia' conditions in all six races.

Candidate gene enrichment
Are candidate genes significantly over-represented amongst differentially expressed genes?. Genes differentially expressed between different races were significantly enriched for the set of 307 salivary candidate genes (Table 3A) only in the P. sativum associated race (40/ 2046, adjusted P = 0.015). Genes differentially expressed between aphids reared on their home plant in comparison with aphids reared on V. faba showed no significant enrichment for salivary genes (Table 3A). Neither genes differentially expressed between different races, nor those differentially expressed between aphids reared in 'home' and 'Vicia' conditions, were significantly enriched for the set of 113 chemosensory candidate genes (Table 3A and B). None of the eight genes identified as differing in expression between home and V. faba in a race-dependent manner was annotated as a salivary or a chemosensory gene.
Is the magnitude of change in expression significantly greater in candidate genes than in noncandidate genes?. The magnitude of expression changes between aphids associated with M. sativa, and all five other races were significantly higher in salivary genes than in nonsalivary genes (Table 4A), for genes both over-and underexpressed in these races in comparison with the M. sativaassociated race. The magnitude of expression differences between aphids reared on their home plant in comparison with aphids reared on V. faba was also significantly higher in salivary candidates than in nonsalivary genes in all six races (Table 4A).
Magnitude of expression change in the chemosensory candidates was not significantly higher than nonchemosensory genes when comparing expression between races (Table 4B). The magnitude of expression differences between aphids reared on their home plant in comparison with aphids reared on V. faba was significantly higher in chemosensory candidates than in nonchemosensory genes in three races: L. corniculatus, M. sativa and P. sativum (Table 4B).
There was only a very weak correlation between log2-fold change and within-group variance in expression (mean Spearman's rho = 0.049, P < 0.0005), so any difference observed in magnitude of log2-fold change should be independent of the general variability of those genes.

Discussion
Gene expression patterns provide new information regarding the divergence between pea aphid races; by examining these patterns both across pea aphid races and in response to environment, we have been able to examine gene expression as a phenotype, allowing the identification of genes with potential roles in plant specialization.
To understand how expression differences can provide raw material for evolution and to test the relative importance of drift and natural selection in gene expression differences between populations or species, we need the ability to study these processes in recently diverged or currently diverging species where ecological differences are known (Whitehead & Crawford 2006a, b;Fay & Wittkopp 2007). The pea aphid complex shows gene expression differences both in response to the environment and in relation to race, providing an appropriate model system in which to examine these questions. To understand whether gene expression differences are playing a role in the adaptation of pea aphid races to their host plants and in the divergence of host races, we must be able to show that there are observable gene expression differences between the races, that these differences are heritable and that they are associated with adaptive divergence and/or reproductive isolation.
Our experimental design, where each race was reared on Vicia faba and on the plant with which it is associated in the field, has allowed us to examine gene expression differences between races in a common garden, gene expression differences between aphids raised on their home plant in comparison with the universal host plant and the differences between races in the way they respond to different host plants. The confounding of race with home plant was necessary as pea aphids are highly stressed, if they survive at all, when reared on the home plants of other races. As a consequence, it is the case that differences between races in gene expression response to the shift from V. faba to the home plant could be due to race-specific changes in gene expression to the specific home plant (i.e. race specialization on the home plant), but they could also be the consequence of a general (nonhost-specific) change in gene expression in response to a host shift. However, our findings are still informative; genes that change expression in each shift are candidates for involvement in host adaptation and those that show race * host interactions are of particular interest because they are either host or race specific in their response.
Are there differences in gene expression between hostassociated races?
Between 1406 and 4322 genes (3.9-11.9% of the 36 401 genes examined) were significantly differentially expressed in each of the five remaining host races in comparison with the M. sativa-associated race (Fig. 3a).
Direct comparison with other studies of the extent of expression divergence between populations is complicated given the influence of demographic factors, as well as wide variation in the extent of expression divergence observed between different tissue types (Khaitovich et al. 2005). For comparison, Hoang et al. (2015) found that 2.7% of genes were differentially expressed between highly differentiated allopatric populations of Drosophila mettleri [F ST = 0.63-0.81 in pairwise comparisons with other populations (Hurtado et al. 2004)], while 20-30% of genes were differentially expressed between two closely related Rhagoletis species (Ragland et al. 2015). Bryk et al. found that between 8.4% and 19.3% of genes were differentially expressed between two mouse populations that had been diverging for around 3000 years, depending on method and tissue used (Bryk et al. 2013). The percentage of differentially expressed genes between pea aphid races thus seems to be within the realm of other findings.

Are gene expression differences heritable?
Numerous studies have demonstrated differences in gene expression between diverging populations (e.g. Steiner et al. 2007;Whiteley et al. 2008;Gagnaire et al. 2013), but differences in expression between populations do not necessarily reflect heritable genetic variation, they could result from a plastic response to differences between the environments of populations. It is possible to uncouple gene expression from environmental variation using common garden experiments (e.g. Lai et al. 2008;Hoang et al. 2015). Here, we were able to show that observed expression patterns were related to lineage at the host-race level; despite being reared on the same host, aphids showed host-race-specific patterns of gene expression (Fig. 2). Hierarchical clustering of samples on the basis of expression demonstrated that gene expression in the pea aphid is related to both race and clone and that expression similarities between aphids of the same race and clone persist whether aphids have been reared on their home plant or on the universal host plant V. faba. Maintenance of gene expression similarities on the basis of race even on the universal host suggests genetic control of gene expression in the pea aphid, a finding in agreement with observations in other taxa of expression variation across individuals, populations and species (e.g. Jin et al. 2001;Enard et al. 2002;Cui et al. 2006), and with eQTL studies, which have confirmed our understanding of gene expression differences as both extensive and heritable (Dixon et al. 2007;Emilsson et al. 2008;Gilad et al. 2008).
In pairwise comparisons between each host-associated race and the M. sativa-associated race, the more genetically the divergent a race was from the M. sativaassociated race, the more the genes were differentially expressed between the two races. In the absence of gene flow between taxa, under a neutral model of gene  (Broadley et al. 2008) as well as on a geneby-gene basis (Oleksiak et al. 2002;Khaitovich et al. 2004). Our observations are consistent with this model, and drift may be an important driving force in gene expression evolution in this system. However, as gene flow is ongoing between many of the races examined here (Peccoud et al. 2009a), some of the observed differences in overall expression patterns, which may have originally been driven by drift, may have been maintained in the face of gene flow by other factors such as selection.
Is differential expression associated with ecological differences between, and adaptive divergence of, hostassociated races?
The observed within-race changes in expression dependent on the plant that aphids were reared on (Table 2) may reflect a plastic response to host plant, a process which potentially performs an important role early in speciation by enabling persistence in novel plant environments (Price et al. 2003 p. 20;Pavey et al. 2010). Some of these plant-dependent genes could relate directly to adaptation of races to their host plant environments; 284 genes (16.9% of all unique genes differentially expressed on different plants) were also expressed differently between the race in which they showed a plant response and the M. sativa-associated race. As plant-response genes that also show race-specific expression in a common garden, they may contribute to the adaptive divergence of races. In expression comparisons between aphids reared on their home plant and those reared on V. faba, the universal host plant, we expected to see differences in genes related to numerous processes in the progressive stages of feeding, from sensing the plant, through metabolism to responses to plant defence. The GO category of odorant binding was over-represented amongst this gene set, and the three odorant binding genes responsible (OBP1, OBP4 and OBP7) were all upregulated in aphids reared on V. faba, possibly in response to unfamiliar odour molecules in their nonhome environment. A few other chemosensory genes were differentially expressed between home and 'Vicia' conditions, two sensory neuron membrane protein genes (ApisSNMP3 and ApisSNMP9) were also more highly expressed in the 'Vicia' condition, while two chemosensory protein genes (CSP2 and CSP9) were expressed more highly in aphids reared on their home plants. More chemosensory gene expression changes might be detectable if gene expression was compared between home and all nonhome host plants, rather than between the home plant and V. faba. Magnitudes of chemosensory gene expression changes between home and 'Vicia' conditions were also significantly higher than those of nonchemosensory genes in three races.
Differences in available metabolites between plants (Sandstr€ om & Pettersson 1994;Karley et al. 2002) might require aphids to express different digestive or metabolic enzymes, as well as to manipulate the metabolites and toxins that their host produces (Girousse et al. 2005). GO terms enriched in genes differentially expressed dependent on host plant included a number of metabolic categories, including catalytic activity, fatty acid metabolism and serine-and cysteine-type endopeptidase activity. Of particular interest are serine endopeptidases, which have been identified in other studies of insect feeding (De la Paz Celorio-Mancera et al. 2013;Hoang et al. 2015). GO categories including cysteine proteases, serine carboxypeptidases and genes with oxido-reductase activity were also enriched amongst genes differentially expressed between host plants and could relate to detoxification of reactive oxygen species (ROS) or other plant-produced toxic compounds.
Another set of genes important to plant-aphid interactions are the salivary genes, which are thought to be involved in the manipulation of plant defence (Dogimont et al. 2010;Hogenhout & Bos 2011). Note that there is no GO category assigned for salivary genes, so we do not expect to identify them in GO analysis. Nonetheless, the magnitudes of plant-induced expression changes in salivary genes were significantly greater than in nonsalivary genes. Furthermore, 58 individual salivary genes showed differential expression between aphids reared on their home host and those reared on V. faba, nearly 75% of which were upregulated on V. faba. These proteins may interfere with numerous facets of plant biology (e.g. cell signalling, secondary metabolite production and detoxification), and their differential expression could be the consequence of adaptation to specific host plants, where different quantities of salivary proteins may be required for specific compatible interactions. Alternatively, some salivary proteins may trigger undesired plant reactions in specific host plants, and it is possible that certain races suppress the expression of such genes to avoid triggering these responses in their host plant.
From the perspective of ecological speciation, the eight genes showing a significant interaction between plant and host race are potentially the most interesting category of differentially expressed genes, as they relate to race-specific host response. These eight genes tended to be downregulated on V. faba and may represent an interesting set of genes upregulated in response to certain host races. The inability to detect many genes differing between races in their response to host plant may arise from the fact that differences tend to occur in single races; when each race was examined in isolation, larger numbers of genes per race were identified (Fig. 3b, Table S3, Supporting information). However, the detection of so few race-specific plant-responsive genes is striking considering the experimental design; that the difference in environment between Vicia faba and 'home' plant is not constant across races argues for a strong interaction effect, and biological explanations for the dearth of race-specific plant-responsive genes must also be considered. Failure to identify these genes could be due to the overriding importance of constitutive changes in expression, as demonstrated by strong race differences irrespective of host plant. In their study of expression differences between two host-specialized populations of D. mettleri, Hoang et al. (2015) found that the majority of expression differences between populations were independent of genes responding to host plant and suggested that predictability of larval environment might mean that constitutive expression differences without plasticity were sufficient for larval success. It is possible that successful initial recognition of host plants by pea aphids removes the need to maintain flexibility of expression of genes involved in host plant adaptation.
The minimal association between plant-responsive genes and those differing in expression between races (chi-squared = 0.569, d.f. = 1, P = 0.45) may imply that much of the difference in gene expression between races is unrelated to host plant. GO terms associated with differentially expressed genes were very similar across races (Fig. S4, Supporting information), but while 41 GO categories were over-represented in differentially expressed subsets, there was little overlap in these over-represented categories between races. Combined with the steady, clock-like accumulation of differentially expressed genes with increasing genetic divergence between aphid races, these observations suggest that the majority of expression differences observed between races result from neutral processes. Alternatively, recent studies examining the contribution of plasticity to evolved differences have found mixed or minimal evidence for plasticity facilitating adaptive divergence (Dayan et al. 2015;Hoang et al. 2015;Ragland et al. 2015;Wybouw et al. 2015). If this is the case, betweenrace adaptive differences in expression could still exist, but the adaptive genes might be expected to be different from those showing a plastic response to host plant. As the divergence of pea aphid races relates directly to host plant shifts, it is still worthwhile considering whether any of the over-represented GO categories in genes differentially expressed between races might relate to evolved adaptive differences.
One interesting over-represented set of GO terms in genes differentially expressed between races is the group relating to chitin binding and glucosamine metabolism. Chitinases are commonly produced in plants in response to phloem feeding insects (Moran et al. 2002), and chitin expression in insects changes in response to insecticides (Puinean et al. 2010) and to diet (De la Paz Celorio-Mancera et al. 2013;Hoang et al. 2015). Differentially expressed chitin-related genes might alternatively reflect differences in development between races. As in genes responding to host plant, GO terms relating to fatty acid metabolism were also enriched in genes differing between races. Dworkin & Jones (2009) found that genes involved in fatty acid metabolism were expressed more highly in Drosophila sechellia than in Drosophila simulans, and related this difference to diet; D. sechellia feeds exclusively on morinda fruit, whose main toxins are fatty acids. Another gene potentially related to detoxification of plant defence compounds is ACYPI20394, a gene showing a race-specific response to host plant, which shows homology to peroxidase. Peroxidase enzymes have been commonly reported in aphid salivary secretions (Giordanengo et al. 2010;Carolan et al. 2011) and are thought to be involved in detoxification of plant defence compounds. However, ACYPI20394 is not identified as a salivary gene, and peroxidasins can also have a role in development.
Although there was no specific enrichment for chemosensory genes amongst differentially expressed subsets, 26 chemosensory genes (19 chemoreceptors, 2 SNMPs, 3 OBPs and 2 CSPs) were differentially expressed between races. Four chemosensory genes that were differentially expressed between races (Or15, Or18, Or20 and Or21) were also amongst the outlier genes identified in Smadja et al. (2012). Salivary genes had significantly higher expression differences than nonsalivary genes in pairwise race comparisons and were also over-represented amongst genes differentially expressed between M. sativa and P. sativum associated races. Interestingly, ACYPI008617, also known as C002, which was shown to be essential for aphid feeding on plants (Mutti et al. 2008), was highly expressed in the P. sativum associated race in comparison with expression in the M. sativa-associated race. C002 was also induced in the M. sativa-associated race when reared on V. faba in comparison with the same clones grown on their home plant (M. sativa).
As pea aphids tend to feed and reproduce on the same plant, host fidelity plays an important role in premating isolation (Caillaud & Via 2000). Genes involved in aphid-plant interactions therefore provide a potential link between adaptive divergence and reproductive isolation between races. Although we found almost no evidence for enrichment of salivary and chemosensory genes amongst all categories of differentially expressed genes, we did detect evidence for elevated magnitudes of expression changes in chemosensory genes in relation to plant and in salivary genes across all conditions. We also identified large numbers of genes from both categories with significant differences in expression across conditions, and the differential expression of these genes could have implications for aphid-plant interactions and reproductive isolation between races.
The virtual absence of enrichment for differential expression in candidate gene classes could result from host plant choice in pea aphids manifesting at a different life cycle stage (Gu et al. 2013); the aphids used here were wingless, but it is winged aphids that select their host plant when they disperse. Our choice of wingless aphids, which do not leave the plant they are born on, will be reflected in the kinds of genes observed as differentially expressed in this study. Expression differences relating to nutrition and feeding may be more relevant in wingless forms than those related to host plant choice. Examining winged instead of wingless aphids may reveal changes in chemosensory gene expression undetected here. Alternatively, expression differences related to plant adaptation could be confined to a small portion of candidate genes rather than a large group working together, or class level differences might have been masked by the divergence of expression in multiple traits, or by copy number variation in chemosensory genes between races as observed by Duvaux et al. (2015).
We conclude that heritable differences in gene expression exist between races of pea aphid and that races also differ in their transcriptomic response to the plant on which they are reared. Genes differentially expressed between races or environments include a number of candidate chemosensory and salivary genes, and genes relating to fatty acid metabolism are overrepresented amongst differentially expressed genes. Genes showing expression changes in response to host plant did not make up a large portion of between-race expression differences, providing confirmation of previous studies' findings that genes involved in expression divergence between populations or species are not necessarily those showing initial plasticity in the face of environmental change. Further exploration of gene expression in different conditions (e.g. tissues, morphs or environments) will be needed, in combination with studies of differentiation in allele frequency, to fully understand host-race formation and the progression towards speciation in this fascinating system.

Supporting information
Additional supporting information may be found in the online version of this article.

Fig. S1
Principal components analysis data on the 2500 genes showing the highest variance between samples. A regularized log transformation was applied on count data. 16.03% of variance is explained by PC1 and 12.68% by PC2. Black outline = reared on 'home' plant, no outline = reared on V. faba. + = sequencing batch 2, x = sequencing batch 3.

Fig. S2
Boxplots of count data (expressed as log2(counts + 1)) for all samples, before (top) and after (bottom) applying normalization factors. Whiskers denote the lowest and highest values within 1.5 9 IQR from the first and third quartiles, respectively, data beyond the end of the whiskers are outliers and are plotted as points.

Fig. S5
REVIGO gene ontology "tree maps" summarizing overrepresented biological process GO categories in genes differentially expressed (a) with respect to plant and (b) with respect to race. Each rectangle contains a single cluster representative, joined into "superclusters" of loosely related terms represented by different colours. Larger rectangles reflect more significant p-values (blast2GO Fisher's exact test results for GO category enrichment, FDR p < 0.05). Numbers in brackets denote number of unique sequences per category showing differential expression.

Fig. S6
REVIGO gene ontology "tree maps" summarizing overrepresented molecular function GO categories in genes differentially expressed (a) with respect to plant and (b) with respect to race. Each rectangle contains a single cluster representative, joined into "superclusters" of loosely related terms represented by different colours. Larger rectangles reflect more significant p-values (blast2GO Fisher's exact test results for GO category enrichment, FDR p < 0.05).

Table S1
Final 52 samples used for differential expression analysis Table S2 list of candidate salivary and chemosensory genes, with gene ID, gene category, and genome location (assembly v2.1) Table S3 significantly differentially expressed genes (adjusted P-value <= 0.05) for each contrast between conditions Table S4 Chemosensory genes differentially expressed (DE) between each race and the M. sativa-associated race Table S5 annotations and names for 11 412 genes receiving GO annotation (46 298 terms annotated) Appendix S1 Detection of differential expression using DESeq2