Host-associated differentiation in a highly polyphagous, sexually reproducing insect herbivore

Insect herbivores may undergo genetic divergence on their host plants through host-associated differentiation (HAD). Much of what we know about HAD involves insect species with narrow host ranges (i.e., specialists) that spend part or all of their life cycle inside their hosts and/or reproduce asexually (e.g., parthenogenetic insects), traits that are all thought to facilitate HAD. However, sexually reproducing polyphagous insects can also exhibit HAD. Few sexually reproducing insects have been tested for HAD, and those tests have included insects from only a handful of potential host-plant populations, making it difficult to predict how common HAD is across a species' entire host range. This question is particularly relevant for insect pests, because host-associated populations may differ in traits relevant to their control. Here, we tested for HAD in a cotton (Gossypium hirsutum) pest, the cotton fleahopper (CFH) (Pseudatomoscelis seriatus), a sexually reproducing, highly polyphagous hemipteran insect. A previous study detected one instance of HAD among three of its host plants. We used amplified fragment length polymorphism (AFLP) markers to assess HAD in CFH collected from an expanded array of 13 host-plant species belonging to seven families. Overall, four genetically distinct populations were found. One genetically distinct genotype was exclusively associated with one host-plant species, whereas the other three were observed on more than one host-plant species. The relatively low degree of HAD in CFH compared with the pea aphid, another hemipteran insect, suggests that sexual recombination may be an important factor reducing the likelihood of HAD.


Introduction
Host-plants play an important role in the diversification of insect populations (Ehrlich and Raven 1964). When associated with different host-plant species, insect populations can experience different selection pressures that may create ecological barriers to gene flow (Pashley 1986; Feder et al. 1993; Nosil and Crespi 2006). Divergent selection on different host-plant species may produce adaptive traits responsible for reproductive isolation among host-associated subpopulations. If reproductive isolation is maintained, this process may result in the formation of genetically distinct host-associated lineages, or host races (Diehl and Bush 1984; Bernays 1991; Carroll and Boyd 1992; Pappers et al. 2001; Dres and Mallet 2002). This phenomenon is commonly referred to as host-associated genetic differentiation (HAD) (Bush 1969; Abrahamson et al. 2001).
In recent years, there has been growing interest in HAD, and several studies have investigated the phenomenon in a variety of insect species, including specialists (Funk et al. 2002; Althoff et al. 2007; Hernadez-Vera et al. 2010; Heard 2012; Medina et al. 2012) and generalists (Dopman et al. 2002; Funk et al. 2002; Sword et al. 2005; Barman et al. 2012). Some of the best-studied cases of insect HAD involve apple maggot flies (Rhagoletis pomonella) on apples and hawthorns (Bush 1969; Feder et al. 1993; Forbes et al. 2010), species associated with goldenrods (Abrahamson et al. 2001; Eubanks et al. 2003; Stireman et al. 2005), pea aphids (Acyrthosiphon pisum) associated with plants in the Fabaceae family (Via 1999; Frantz et al. 2006; Peccoud et al. 2009), and stick insects (Timema cristinae) on redheart and chamise (Nosil et al. 2002; Soria-Carrasco et al. 2014). In all these insect species, genetically distinct lineages have been found on different host-plant species. In fact, the remarkable diversity of insects we see today could be the result of HAD (Ehrlich 1964; Farrell 1998; Abrahamson et al. 2001; Dres and Mallet 2002), making the study of HAD an important component of our understanding of the role of host-plant species in ecological speciation.
© 2015 The Authors. Ecology and Evolution published by John Wiley & Sons Ltd. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
The level of intimacy with the host (i.e., whether an insect lives and feeds within plant tissues or externally) and the mode of reproduction (i.e., sexual or asexual) are factors thought to influence the propensity of insects to exhibit HAD (Medina 2012). Much of what we know about HAD involves insect species with narrow host ranges (i.e., specialists) that spend part or all of their life cycle inside their hosts and/or reproduce asexually (e.g., parthenogenetically) (Pashley 1986; Van Zandt and Mopper 1998; Brunner et al. 2004; Medina 2010, 2012; Cook et al. 2012; Darwell et al. 2014; Marques et al. 2014). Pea aphids, for example, are parthenogenetic Fabaceae specialists composed of genetically distinct host-associated lineages on clover and alfalfa (Via 1999). Even though pea aphids are associated with multiple plant species (Via 1999; Simon et al. 2003; Ferrari et al. 2006; Frantz et al. 2006; Via 2009), it was not until Peccoud et al. (2009) sampled insects from an extensive number of host-plant populations that HAD in pea aphids was found to be more extensive than previously thought. This raises the question of whether HAD is truly uncommon in sexually reproducing generalists or has simply been overlooked because of limited sampling.
Evidence of HAD in sexually reproducing generalist species is accumulating. For example, grasshoppers and green mirids are polyphagous, feeding on multiple hosts from different families, yet they exhibit HAD (Sword and Dopman 1999; Dopman et al. 2002; Sword et al. 2005; Apple et al. 2010; Hereward et al. 2013). For agriculturally important pests, genetically distinct lineages on different host-plants may differ in their susceptibility to particular control methods, so knowing which pest species show HAD is important. For example, conservation biological control may fail in a particular crop if natural enemies co-evolve with their insect hosts on one host-plant species and become reproductively isolated on alternative host plants (Eubanks et al. 2003; Heard et al. 2006; Forbes et al. 2010). Similarly, using alternative host-plant species as refuges in transgenic crop plantings may not work if host-associated populations of polyphagous pests are reproductively isolated on different host-plant species (Calcagno et al. 2007). Although some sexually reproducing generalist pests (e.g., fall armyworm, browntail moth, green mirid) have been shown to exhibit HAD (Pashley 1986; Hereward et al. 2013; Marques et al. 2014), we currently do not know how widespread HAD is across the agroecosystems in which these pest species occur.
The cotton fleahopper (CFH), Pseudatomoscelis seriatus Reuter (Hemiptera: Miridae), offers a good model for testing HAD in a sexually reproducing generalist insect pest of a highly managed monoculture. CFH feeds on at least 160 host-plant species belonging to 35 families, including both managed crops and unmanaged wild plants (Snodgrass et al. 1984; Esquivel and Esquivel 2009). Using its piercing-sucking mouthparts, it feeds on the anthers and young flower buds of developing plants. Cotton (Gossypium hirsutum), an agriculturally important crop, is most vulnerable to CFH attack during the first three weeks of early flower bud (referred to as "squares") development (Sansone et al. 2009). Recently, Barman et al. (2012) tested for HAD in CFH associated with three of its most abundant host-plants in Texas, USA: horsemint, Monarda punctata L.; woolly croton, Croton capitatus; and cotton. CFH on horsemint showed strong HAD in areas where annual precipitation was below 26 inches (approximately 660 mm). Given that CFH is highly polyphagous, we predicted that HAD would also occur on other host-plant species. To test this hypothesis, we used AFLP markers and Bayesian analyses to test for HAD among CFH collected from 13 host-plant species belonging to seven plant families.

Cotton fleahopper sampling and host-plant identification
We sampled CFH from 13 host-plant species belonging to seven families: one annual crop (cotton) and 12 perennial plants (Table 1). The plant families sampled were Asteraceae, Euphorbiaceae, Lamiaceae, Malvaceae, Onagraceae, Solanaceae, and Verbenaceae. CFHs were collected from 14 locations in Texas, spanning 13 counties distributed across multiple ecological regions from the Piney Woods in the east to the Edwards Plateau in the west (Fig. 1). In addition to collecting CFH individuals, we collected samples of the plants from which the insects were taken as voucher specimens. Plants were individually pressed using standard plant press protocols (Queensland Herbarium 2013) and identified to species by Dr. Dale Kruse (S. M. Tracy Herbarium, Department of Rangeland Ecology and Management, Texas A&M University, College Station, Texas). Cotton fleahopper sampling took place during the spring and summer of 2013 and 2014, when herbaceous plants had green foliage and some were blooming. On cotton, sampling coincided with the development of flower buds ("squares"), when CFH numbers were typically high. Insects were sampled with hand-held sweep nets and aspirators from cotton fields, patches of wild vegetation surrounding cotton fields, open fields within natural forest stands, and along roadsides and highways. We initially planned to sample only CFH nymphs from each host plant; however, because nymph numbers were low on several of the host plants sampled, we also included adults. In total, 240 individuals were analyzed, with 8 to 20 individuals per host-plant species. Individuals were stored in 80% ethanol prior to DNA extraction.

DNA extraction and AFLPs
Genomic DNA was extracted from whole insects using the DNeasy® tissue extraction kit (QIAGEN, Valencia, CA) following the manufacturer's protocol and stored in AE buffer at −20°C. DNA concentration and quality were assessed using a NanoDrop spectrophotometer (NanoDrop Technologies, Wilmington, DE). On average, DNA concentration and quality from individual CFH extractions were 100 ng/µL and 2.00, respectively. Amplified fragment length polymorphism (AFLP) reactions were performed following the protocol of Vos et al. (1995) with minor modifications by Barman et al. (2012). Briefly, DNA aliquots from individuals were randomly assigned to wells of a 96-well plate, with one control individual repeated three times on each plate to assess reproducibility. A negative control (blank) was included on every plate to assess potential cross-contamination. A restriction-ligation reaction was performed by combining 5.5 µL DNA with 5.5 µL of a master mix containing 0.03 µL T4 DNA ligase (New England Biolabs (NEB), Ipswich, MA), 1.1 µL 10x T4 DNA ligase buffer, 1.1 µL 0.5 mol/L NaCl, 0.55 µL diluted BSA, 0.05 µL MseI (NEB), 0.05 µL EcoRI (NEB), 1 µL each of the MseI and EcoRI adapter pairs (Life Technologies, Carlsbad, California, USA), and 0.61 µL sterile distilled water.
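As a quick consistency check on the restriction-ligation recipe above, the component volumes can be summed in a few lines. This is an illustrative sketch only: the dictionary keys and the function names (`reaction_totals`, `scale_for_plate`) are hypothetical, and the 10% pipetting overage used when scaling a master mix to many reactions is an assumed convention, not part of the published protocol.

```python
# Per-reaction master-mix component volumes (µL), as listed in the text
MASTER_MIX_UL = {
    "T4 DNA ligase": 0.03,
    "10x T4 DNA ligase buffer": 1.1,
    "0.5 mol/L NaCl": 1.1,
    "diluted BSA": 0.55,
    "MseI": 0.05,
    "EcoRI": 0.05,
    "MseI adapter pair": 1.0,
    "EcoRI adapter pair": 1.0,
    "sterile distilled water": 0.61,
}
DNA_UL = 5.5  # template DNA added to each reaction


def reaction_totals(mix: dict[str, float], dna_ul: float) -> tuple[float, float]:
    """Return (master-mix volume, total reaction volume) in µL, rounded to 0.01 µL."""
    mix_total = round(sum(mix.values()), 2)
    return mix_total, round(mix_total + dna_ul, 2)


def scale_for_plate(mix: dict[str, float], n_reactions: int,
                    overage: float = 0.1) -> dict[str, float]:
    """Scale per-reaction volumes to n reactions plus a pipetting overage.

    The 10% default overage is a hypothetical convention, not from the protocol.
    """
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2) for name, vol in mix.items()}


if __name__ == "__main__":
    mix_total, total = reaction_totals(MASTER_MIX_UL, DNA_UL)
    print(f"master mix: {mix_total} µL, reaction: {total} µL")
    # prints: master mix: 5.49 µL, reaction: 10.99 µL
```

With the listed volumes, the master mix sums to 5.49 µL, which matches the stated 5.5 µL after rounding and gives roughly 11 µL per restriction-ligation reaction.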
Reactions were incubated overnight and then diluted 17-fold with 189 µL TE thin buffer. This was followed by a preselective PCR in a 20-µL total volume consisting of 4 µL diluted DNA, 15 µL AFLP core mix (Life Technologies), and 1 µL AFLP amplification primers (Life Technologies). Selective PCR amplifications were performed in a 21-µL volume containing 15 µL Platinum supermix (Life Technologies), 4 µL of a 19-fold diluted preamplification product, and one primer combination consisting of 1 µL MseI-CAT (Life Technologies) and 1 µL EcoRI-ACT (Life Technologies). All PCR amplifications were carried out in an ABI GeneAmp thermocycler (Life Technologies) using protocols from Barman et al. (2012). Reactions were prepared in a laminar flow hood, and DNA and PCR reagents were added using filter tips to minimize the risk of cross-contamination. For electrophoretic analysis of selective PCR fragments, a 10.5-µL total volume consisting of 1 µL selective amplification PCR product, 9 µL Hi-Di formamide, and 0.5 µL ROX 400 size standard (Life Technologies) was used. Samples were analyzed on an ABI 3730xl 96-capillary genetic analyzer (Applied Biosystems, Foster City, CA).
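The dilution and reaction volumes in this paragraph can be cross-checked the same way. The sketch below assumes the overnight restriction-ligation reaction is 11 µL (5.5 µL DNA plus 5.5 µL master mix). Under that assumption, "17-fold" matches the diluent-to-sample ratio (189/11 ≈ 17.2), whereas the final-to-initial volume ratio is about 18.2; the text does not say which convention is meant, so both are shown. All names in the code are hypothetical.

```python
import math

# Volumes (µL) taken from the protocol text; the 11 µL reaction size is an assumption
RESTRICTION_LIGATION_UL = 5.5 + 5.5  # DNA + master mix
TE_DILUENT_UL = 189.0

# Each mix: (component volumes in µL, stated total volume in µL)
MIXES = {
    "preselective PCR": ({"diluted DNA": 4, "AFLP core mix": 15, "amplification primers": 1}, 20),
    "selective PCR": ({"Platinum supermix": 15, "diluted preamp product": 4,
                       "MseI-CAT": 1, "EcoRI-ACT": 1}, 21),
    "capillary load": ({"selective PCR product": 1, "Hi-Di formamide": 9,
                        "ROX 400 size standard": 0.5}, 10.5),
}


def fold_dilution(sample_ul: float, diluent_ul: float) -> float:
    """Final volume divided by sample volume (the usual meaning of 'n-fold')."""
    return (sample_ul + diluent_ul) / sample_ul


def diluent_ratio(sample_ul: float, diluent_ul: float) -> float:
    """Diluent volume divided by sample volume (an alternative reading of 'n-fold')."""
    return diluent_ul / sample_ul


def mix_matches(components: dict[str, float], stated_total: float) -> bool:
    """True if the listed component volumes add up to the stated reaction volume."""
    return math.isclose(sum(components.values()), stated_total)


if __name__ == "__main__":
    print(f"final/sample: {fold_dilution(RESTRICTION_LIGATION_UL, TE_DILUENT_UL):.1f}-fold")
    # prints: final/sample: 18.2-fold
    print(f"diluent/sample: {diluent_ratio(RESTRICTION_LIGATION_UL, TE_DILUENT_UL):.1f}")
    # prints: diluent/sample: 17.2
    for name, (parts, total) in MIXES.items():
        print(name, "OK" if mix_matches(parts, total) else "mismatch")
```

The preselective (20 µL), selective (21 µL), and capillary-loading (10.5 µL) mixes all sum exactly to their stated totals.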

Genetic analysis
Amplified fragment length polymorphism fragments were analyzed with GeneMarker v.2.6.3 (SoftGenetics, State College, Pennsylvania, USA). Only loci with fragment sizes of 50–400 bp and fluorescence intensities of 100 relative fluorescence units or more were included in our analyses. Results from GeneMarker were converted into a binary matrix of presence (1) or absence (0) for each locus. Loci whose number of markers was less than 5% of the average number of markers per locus were removed from the dataset. Fragment amplification failed in 30 individuals, which were therefore removed from the dataset. To ascertain whether the numbers of individuals and markers used in the study were sufficient to accurately describe the genetic structure of CFH, we used the SESim statistic (Medina et al. 2006). A SESim value lower than 0.05 indicates that the numbers of loci and individuals in a dataset are sufficient and that additional markers or individuals are unlikely to alter the population clustering pattern in the area under study (Medina et al. 2006). Allele frequencies at AFLP loci were estimated using the Bayesian method implemented in AFLP-SURV v.1.0 (Vekemans et al. 2002) with the nonuniform prior distribution of allele frequencies option. This Bayesian method produces statistically unbiased estimates of genetic diversity and genetic distances (Zhivotovsky 1999). Allele frequencies were used to estimate overall FST among host populations and pairwise FST between host-plants using 100,000 permutations in ARLEQUIN (Excoffier and Lischer 2010). Significance of FST was estimated with 10,000 permutations. Additionally, Nei's genetic distances between pairs of populations were estimated in AFLP-SURV using 10,000 permutations. Genetic diversity for each population was measured by estimating the percentage of polymorphic loci and Nei's gene diversity.
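The scoring thresholds and diversity summaries described above can be sketched for dominant (presence/absence) data as follows. This is a simplified stand-in, not the AFLP-SURV Bayesian estimator used in the study: it infers the band-absent allele frequency with the square-root method, q = sqrt(1 − band frequency), which assumes Hardy–Weinberg equilibrium, and computes FST as (HT − HW)/HT without the sample-size corrections of Lynch and Milligan (1994). The peak format, locus bins, example data, and function names are hypothetical; a locus is counted as polymorphic here if the band is present in some but not all individuals of a population.

```python
import math
from statistics import mean

# Scoring thresholds stated in the text
MIN_BP, MAX_BP, MIN_RFU = 50, 400, 100


def score_individual(peaks: list[tuple[int, float]], loci: list[int]) -> list[int]:
    """Presence (1) / absence (0) at each locus bin.

    `peaks` holds (binned fragment size in bp, peak height in RFU) pairs; a band
    is called only if it lies within 50-400 bp and reaches at least 100 RFU.
    """
    called = {bp for bp, rfu in peaks if MIN_BP <= bp <= MAX_BP and rfu >= MIN_RFU}
    return [1 if bp in called else 0 for bp in loci]


def band_freqs(matrix: list[list[int]]) -> list[float]:
    """Fraction of individuals (rows) showing the band at each locus (column)."""
    return [sum(col) / len(matrix) for col in zip(*matrix)]


def percent_polymorphic(matrix: list[list[int]]) -> float:
    """Percentage of loci where the band is present in some, but not all, individuals."""
    freqs = band_freqs(matrix)
    return 100 * sum(0 < f < 1 for f in freqs) / len(freqs)


def null_allele_freqs(matrix: list[list[int]]) -> list[float]:
    """Square-root estimate of the band-absent allele frequency, q = sqrt(1 - f)."""
    return [math.sqrt(1.0 - f) for f in band_freqs(matrix)]


def gene_diversity(qs: list[float]) -> float:
    """Nei's gene diversity for biallelic loci, averaged over loci: mean of 2pq."""
    return mean(2 * q * (1 - q) for q in qs)


def fst(populations: list[list[list[int]]]) -> float:
    """(HT - HW) / HT: HW is the mean within-population diversity and HT the
    diversity computed from allele frequencies averaged across populations."""
    q_by_pop = [null_allele_freqs(m) for m in populations]
    h_w = mean(gene_diversity(qs) for qs in q_by_pop)
    h_t = gene_diversity([mean(col) for col in zip(*q_by_pop)])
    return (h_t - h_w) / h_t if h_t > 0 else 0.0


if __name__ == "__main__":
    loci = [75, 120, 210, 388]  # hypothetical locus bins (bp)
    print(score_individual([(75, 250.0), (120, 80.0), (210, 150.0), (450, 500.0)], loci))
    # prints [1, 0, 1, 0]: 120 bp is below 100 RFU and 450 bp is outside 50-400 bp
    pop_a = [[1, 1, 0], [1, 0, 0], [1, 1, 1]]  # hypothetical individuals x loci
    pop_b = [[0, 1, 1], [0, 0, 1], [0, 1, 1]]
    print(round(percent_polymorphic(pop_a), 1), round(fst([pop_a, pop_b]), 3))
```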
Genetic distances between pairs of populations were used for principal coordinate analysis (PCoA) in GenAlEx 6.5 (Peakall and Smouse 2012). An analysis of molecular variance (AMOVA) implemented in GenAlEx was also used to estimate hierarchical genetic structure within and among populations, using host-plants and geographic locations as source populations. For the geographic analysis, sampling locations were grouped by region, that is, east versus west Texas (Fig. 1), to reflect the potential effect of precipitation on genetic differentiation as outlined by Barman et al. (2012). The AMOVAs partitioned genetic variation (1) among host-plants, (2) within host-plants, (3) among sampling regions, (4) among sampling locations within regions, and (5) within sampling locations. AMOVA calculates ΦPT, an analogue of FST, from a matrix of squared Euclidean distances between individuals' AFLP profiles. ΦPT is a band-based approach recommended for AFLP data because it does not rely on estimating allele frequencies from dominant markers, which requires assumptions that can underestimate genetic variability (Lynch and Milligan 1994; Yan et al. 1999). Genetic structure was further assessed with the Bayesian model-based clustering algorithm implemented in STRUCTURE v.2.3.4 (Pritchard et al. 2000; Falush et al. 2007). Clustering was based on the admixture model with independent allele frequencies, and sampling source (i.e., host-plant) was used as prior information to assist the clustering method.
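The two-level core of the AMOVA (among vs. within host-plant populations) can be written directly from the squared Euclidean distance matrix. The sketch below uses the standard AMOVA sums of squares and variance components; it is not GenAlEx's code, it omits the region and location levels of the hierarchical analysis, it truncates a negative among-population component at zero, and its label-permutation test reports (hits + 1)/(permutations + 1) as the p-value. Function names and example profiles are hypothetical.

```python
import random
from itertools import combinations


def sq_euclid(a: list[int], b: list[int]) -> int:
    """Squared Euclidean distance between two 0/1 band profiles
    (for binary data this equals the number of mismatched bands)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def phi_pt(profiles: list[list[int]], labels: list[str]) -> float:
    """PhiPT = V_among / (V_among + V_within) from a two-level AMOVA.

    Requires at least two populations and more individuals than populations.
    """
    n = len(profiles)
    groups: dict[str, list[int]] = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    p = len(groups)
    if p < 2:
        raise ValueError("need at least two populations")

    def ss(indices: list[int]) -> float:
        # Sum of pairwise squared distances divided by the number of individuals
        pairs = combinations(indices, 2)
        return sum(sq_euclid(profiles[i], profiles[j]) for i, j in pairs) / len(indices)

    ss_total = ss(list(range(n)))
    ss_within = sum(ss(idx) for idx in groups.values())
    ms_among = (ss_total - ss_within) / (p - 1)
    ms_within = ss_within / (n - p)
    # n0: sample-size coefficient that accounts for unequal population sizes
    n0 = (n - sum(len(idx) ** 2 for idx in groups.values()) / n) / (p - 1)
    v_among = max((ms_among - ms_within) / n0, 0.0)
    v_total = v_among + ms_within
    return v_among / v_total if v_total > 0 else 0.0


def permutation_test(profiles, labels, n_perm=999, seed=1):
    """Observed PhiPT and the share of label permutations giving a value >= it."""
    rng = random.Random(seed)
    observed = phi_pt(profiles, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        hits += phi_pt(profiles, shuffled) >= observed
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    profiles = [[1, 1, 0, 0]] * 3 + [[0, 0, 1, 1]] * 3  # hypothetical profiles
    labels = ["SM"] * 3 + ["CT"] * 3
    print(permutation_test(profiles, labels, n_perm=199))
```

Two hosts whose insects share no bands and show no within-host variation give ΦPT = 1; hosts with identical band composition give ΦPT = 0.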
For each number of clusters (K) from 1 to 14, 20 independent runs were performed, each with a burn-in period of 10,000 Markov chain Monte Carlo (MCMC) iterations followed by a run length of 10,000 iterations. ΔK (Evanno et al. 2005) was used to identify the value of K that best describes population structure in the dataset. Because populations with stronger structure may mask structure in other populations (Evanno et al. 2005), STRUCTURE was first run on the whole dataset and then rerun after successively removing populations forming distinct clusters (i.e., scurvy mallow and horsemint) (Figs. S1–S3).
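Evanno's ΔK compares how sharply the log-likelihood curve bends at each K with how variable that likelihood is across replicate runs: ΔK = |mean L(K+1) − 2 mean L(K) + mean L(K−1)| / sd[L(K)]. The sketch below computes the second difference from mean log-likelihoods, a common simplification. It assumes the ln Pr(X|K) values from the replicate STRUCTURE runs have already been collected into a dictionary; the input layout, function name, and numbers in the example are hypothetical.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp: dict[int, list[float]]) -> dict[int, float]:
    """Delta K for every K that has both K-1 and K+1 in `lnp`.

    `lnp` maps K -> ln Pr(X|K) from each replicate STRUCTURE run (>= 2 runs per K).
    """
    delta = {}
    for k in sorted(lnp):
        if k - 1 not in lnp or k + 1 not in lnp:
            continue  # second difference undefined at the ends of the K range
        curvature = abs(mean(lnp[k + 1]) - 2 * mean(lnp[k]) + mean(lnp[k - 1]))
        sd = stdev(lnp[k])
        delta[k] = curvature / sd if sd > 0 else float("inf")
    return delta


if __name__ == "__main__":
    runs = {  # hypothetical log-likelihoods, three runs per K
        1: [-3000, -3001, -2999],
        2: [-2600, -2602, -2598],
        3: [-2500, -2505, -2495],
        4: [-2450, -2440, -2460],
        5: [-2440, -2420, -2460],
    }
    delta = evanno_delta_k(runs)
    print(delta)                      # {2: 150.0, 3: 10.0, 4: 4.0}
    print(max(delta, key=delta.get))  # 2
```

Because ΔK needs neighbouring K values on both sides, it cannot be computed for the smallest and largest K tested (here K = 1 and K = 14), so ΔK alone cannot support a single-cluster solution.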

Results
The primer pair used in this study yielded a total of 62 AFLP loci. A SESim statistic of 0.011 indicated that the numbers of loci and individuals in our dataset were sufficient to describe the population clustering pattern of CFH in our study (Medina et al. 2006). The percentage of polymorphic loci per host-plant ranged from 45% on scurvy mallow (SM) to 79% on both cotton (CT) and primrose (EP) (Table 2). Estimates of Nei's gene diversity were similar across host-plants, with an average of 0.06 (SE = 0.003). Overall, genetic differentiation among host-plants based on FST was low but significant (FST = 0.07; P = 0.01). Pairwise FST values among host-plants indicated that genetic differentiation was either absent or low among most hosts (Table 3). CFHs on scurvy mallow (SM), however, were genetically distinct from those on all other hosts, with pairwise FST estimates ranging from 0.29 to 0.32. Likewise, CFHs on horsemint (HM) were significantly differentiated from those on other hosts, with pairwise FST ranging from 0.06 to 0.16 (Table 3).
When host-plants from all locations were grouped together in the AMOVAs, differences among host plants explained a low but significant proportion of the genetic variation in CFH (7%), whereas most of the variation (93%) was found within host-plants. When sampling locations were grouped by region (i.e., east vs. west Texas; Fig. 1) to reflect the potential effects of precipitation on genetic differentiation (Barman et al. 2012), AMOVA attributed 96% of the variation to differences within locations, whereas variation among regions and among locations within regions accounted for only 0% and 3%, respectively (Table 4).
Genetic structure was further investigated with the Bayesian clustering algorithm in STRUCTURE. Using the complete dataset, with individuals from all host plants across sampling locations, ΔK (Evanno et al. 2005) indicated four genetically distinct clusters (Fig. 2). In agreement with our other analyses, the scurvy mallow cluster was genetically distinct from individuals collected from all other hosts, with a high probability (>99%) of individual assignment. Individuals from the other host-plants displayed admixed genotypes whose relative composition varied among host-plant species (Fig. 2). A separate STRUCTURE analysis using only nymphs (from the 8 host-plants on which nymphs were sampled) also differentiated CFH on scurvy mallow from those collected from other hosts. However, ΔK for nymphs indicated only three clusters (Fig. 3). Interestingly, nymphs from purple prairie (PP) belonged to only one genetically distinct population, whereas adults on this plant belonged to three populations. Finally, the first two principal coordinates (PCo 1 and PCo 2) together explained 82.6% of the genetic variation of CFH (Fig. 4). PCo 1 separated CFHs from scurvy mallow (SM) and horsemint (HM) from CFHs collected from every other host, whereas PCo 2 separated only CFHs from horsemint.
[Figure caption (STRUCTURE bar plot), partially recovered: host-plant abbreviations (see Table 1) are indicated below and separated by black bars. Each colored bar represents an individual CFH, with the proportion of color corresponding to the probability that the individual is a member of a particular cluster.]

Discussion
Given that HAD is known to occur in CFH (Barman et al. 2012) and that CFH has an extensive host range of over 160 plant species, we predicted that expanded sampling would reveal additional instances of HAD. Our results provided limited support for this hypothesis. CFH exhibited slight but significant genetic structuring across multiple host-plant species. Although estimates of genetic differentiation were low, host-plants explained a higher proportion of genetic variation in CFH than geographic location did (Table 4). The first study of HAD in CFH, by Barman et al. (2012), included only three host-plant species. In this study, we expanded the assessment of HAD by analyzing CFHs collected from 13 host-plant species. We identified four genetically distinct populations of CFH associated with 13 host-plant species (Fig. 2), of which two were distinctly associated with a specific host-plant. CFHs from scurvy mallow were genetically distinct from CFHs from any other host (Figs. 2-4; Table 3). CFHs collected from horsemint were also differentiated from CFHs collected from other host-plants (Fig. 4 and Table 3). However, CFH genotypes on horsemint, although differentiated, were not unique to horsemint (Fig. 2). Barman et al. (2012) found that CFH associated with horsemint showed strong HAD, but the pattern of differentiation was typical of a "geographic mosaic of HAD": horsemint populations in west Texas displayed a strong pattern of HAD, whereas in east Texas HAD was absent. Barman et al. (2012) speculated that the patchy distribution of horsemint in west Texas, relative to the plant's almost continuous distribution in east Texas, could explain the differential presence of HAD in these two regions. Our study examined CFH in west Texas not only on horsemint and cotton (as in the Barman et al. 2012 study) but also on other uncultivated plants (e.g., silverleaf nightshade, purple prairie, old plainsman, and common horehound), revealing that CFH genotypes found on horsemint were not uniquely associated with this plant species. That is, the horsemint genotype characterized by Barman et al. (2012) was also present in other uncultivated plant species. Both Barman et al. (2012) and this study examined adults on horsemint. However, future studies should genetically characterize nymphs on horsemint (see below for further discussion of nymphs) and compare them with nymphs from uncultivated vegetation. Comparing the genetic population structure of nymphs versus adults from different host-plant species will increase our understanding of CFH host-plant fidelity, mating, and dispersal behavior.
Populations on scurvy mallow were genetically distinct from those on the other hosts tested in this study. The scurvy mallow plants we sampled were in close proximity to old plainsman, silverleaf nightshade, common horehound, and purple prairie. For host-related selection to cause divergence among populations in such close proximity, gene flow among populations associated with these different host-plants must be sufficiently reduced (Geiselhardt et al. 2012). If divergent selection on CFH associated with scurvy mallow is linked to mating and/or oviposition preference, then selection may have favored assortative mating on scurvy mallow, thereby facilitating HAD.
In pea aphids, HAD was first reported in populations associated with alfalfa and red clover (Via 1999; Leonardo and Muiru 2003). Later, another distinct pea aphid lineage was found in populations associated with pea and faba bean (Carre and Bournoville 2003; Simon et al. 2003; Frantz et al. 2006). After testing for HAD on 19 widely distributed plants, Peccoud et al. (2009) found 11 distinct host-associated lineages of pea aphids in Western Europe. In the case of the highly polyphagous CFH, Barman et al. (2012) tested for HAD on three host-plants and detected one host-associated lineage. In our study, increasing the number of host plants sampled did not dramatically increase the incidence of HAD, suggesting that HAD is rather uncommon in CFH compared with pea aphids. The scarcity of HAD in CFH relative to pea aphids may be at least partly explained by differences in their mode of reproduction.
It has been proposed that parthenogenesis may increase the incidence of HAD (Medina 2012). In fact, several HAD case studies involve parthenogenetic organisms such as pea aphids (Via 1999), grain aphids (Simon et al. 1999; Vialatte et al. 2005), yellow pecan aphids (Dickey and Medina 2010), western flower thrips (Brunner et al. 2004; Brunner and Frey 2010), and eriophyid mites (Evans et al. 2013). However, HAD also occurs in sexually reproducing insects such as grasshoppers (Dopman et al. 2002; Sword et al. 2005; Apple et al. 2010), green mirids (Hereward et al. 2013), fall armyworms (Pashley 1986), and browntail moths (Marques et al. 2014). Unfortunately, in all these cases HAD was tested across only a handful of host-plant species, making it impossible to know whether HAD extends beyond the sampled plants. We predict that HAD in sexually reproducing insect herbivores will parallel the pattern we found in CFH; that is, HAD will be present on a rather small proportion of host-plants. In contrast, HAD in parthenogenetic herbivores is expected to be present on several of their host-plants, as has already been reported for pea and cotton aphids (Vanlerberghe-Masutti and Chavigny 1998; Ferrari et al. 2006; Peccoud et al. 2009).
To test whether dispersing adults were confounding the population structure found in this study, we used only nymphs (because of their relatively low dispersal ability) in a separate STRUCTURE analysis. Although the analysis of nymphs did not dramatically change the overall pattern of HAD in CFH (Fig. 3), it made the pattern less "noisy". Interestingly, nymphs on purple prairie harbored only one genotype (Fig. 2), whereas adult populations harbored three (Fig. 2). All other plants except scurvy mallow supported two genotypes when only nymphs were considered (Fig. 3). These differences between nymph and adult genetic population structure could be explained by adult dispersal among host-plant species.
Host-associated differentiation of CFH populations may have practical implications for pest control. The fact that genotypes found on cotton also occur on nearby uncultivated vegetation suggests that several native host-plants act as sources of CFH in cotton fields. However, some host-plant species, such as scurvy mallow and horsemint, harbor CFH genotypes that are genetically distinct and may not contribute to building up pestiferous populations in cotton. A similar phenomenon has been observed in wheat, where populations of the cereal aphid Sitobion avenae associated with wild vegetation do not contribute to the buildup of pestiferous populations in wheat (Vialatte et al. 2005). Thus, plants such as scurvy mallow and horsemint could be considered for use in conservation biological control programs to enhance local populations of CFH natural enemies. Notably, CFH populations on horsemint have been found to be genetically distinct only in west Texas; in east Texas, CFH populations on horsemint are not genetically differentiated from those on cotton (Barman et al. 2012). Geographic variation in the pattern of HAD stresses the need to study pests' population structure across their entire geographic distribution and host range. Knowledge of the genetic population structure of pest species may inform locally adapted control strategies in area-wide integrated pest management (IPM) programs.

Supporting Information
Additional Supporting Information may be found in the online version of this article: Figure S1. STRUCTURE output when individuals from scurvy mallow (SM) were removed from the analysis. Figure S2. STRUCTURE output when individuals from horsemint (HM) were removed from the analysis. Figure S3. STRUCTURE output when individuals from horsemint (HM) and scurvy mallow (SM) were removed from the analysis. Figure S4. Cotton fleahopper feeding on one of its wild host-plants.