Intragenomic polymorphisms among high-copy loci: a genus-wide study of nuclear ribosomal DNA in Asclepias (Apocynaceae)

Despite knowledge that concerted evolution of high-copy loci is often imperfect, studies that investigate the extent of intragenomic polymorphisms and comparisons across a large number of species are rarely made. We present a bioinformatic pipeline for characterizing polymorphisms within an individual among copies of a high-copy locus. Results are presented for nuclear ribosomal DNA (nrDNA) across the milkweed genus, Asclepias. The 18S-26S portion of the nrDNA cistron of Asclepias syriaca served as a reference for assembly of the region from 124 samples representing 90 species of Asclepias. Reads were mapped back to each individual’s consensus and at each position reads differing from the consensus were tallied using a custom perl script. Low frequency polymorphisms existed in all individuals (mean = 5.8%). Most nrDNA positions (91%) were polymorphic in at least one individual, with polymorphic sites being less frequent in subunit regions and loops. Highly polymorphic sites existed in each individual, with highest abundance in the “noncoding” ITS regions. Phylogenetic signal was present in the distribution of intragenomic polymorphisms across the genus. Intragenomic polymorphisms in nrDNA are common in Asclepias, being found at higher frequency than any other study to date. The high and variable frequency of polymorphisms across species highlights concerns that phylogenetic applications of nrDNA may be error-prone. The new analytical approach provided here is applicable to other taxa and other high-copy regions characterized by low coverage genome sequencing (genome skimming).


INTRODUCTION
With the advent of DNA sequencing technology to infer phylogenetic relationships, investigators began searching for genetic loci that were both phylogenetically informative and readily sequenced in most organisms. The use of nuclear ribosomal DNA (nrDNA) soon became a popular choice for phylogenetic inference (Hamby & Zimmer, 1988;Hillis & Dixon, 1991;Baldwin, 1992;Baldwin et al., 1995;Álvarez & Wendel, 2003). Nuclear ribosomal DNA offered several advantages over other loci: the combination of highly conserved and variable regions allowed phylogenetic inference across a broad range of evolutionary time scales; conserved regions allowed the use of "universal" PCR primers applicable to a wide range of taxa; the high copy number of nrDNA repeats allowed reliable amplification from lower quality DNA extractions; and the process of concerted evolution ensured that these copies were similar within individuals (Baldwin et al., 1995). The use of nrDNA, particularly the variable internal transcribed spacer (ITS) regions, became widespread, to the extent that many studies were based exclusively on ITS data (Álvarez & Wendel, 2003).
However, nrDNA loci have been shown to harbor limitations in their phylogenetic utility. Nuclear ribosomal DNA copies are assembled as tandem repeats at one or more loci in the genome, with each locus being known as an array. The number of repeats present within an array is labile, as is the number and location of arrays (Álvarez & Wendel, 2003). The process of nrDNA copy homogenization from homologous recombination or unequal crossing over is thought to occur much more frequently within than among arrays (Schlötterer & Tautz, 1994). Thus, differing nrDNA alleles may become fixed in different arrays within a genome, creating paralogy that, if unrecognized, may confound phylogenetic inference (Álvarez & Wendel, 2003;Song et al., 2012). Moreover, these events can create pseudogenes which, freed from selective pressures, may evolve through processes quite different from the functional loci and provide misleading evidence for between-individual genetic divergences if compared to functional copies (Buckler, Ippolito & Holtsford, 1997). These events may occur at a greater rate than inter-array homogenization via concerted evolution (Karvonen & Savolainen, 1993;Gernandt & Liston, 1999).
Due to the technical difficulty of systematically sequencing individual nrDNA loci because of their high copy number, studies characterizing the abundance and patterns of intragenomic nrDNA polymorphisms have been rare. Recently, studies utilizing wholegenome shotgun sequencing have begun to reveal levels of intragenomic polymorphism in Drosophila (Stage & Eickbush, 2007), nematodes (Bik et al., 2013), and fungi (Ganley & Kobayashi, 2007). However, these studies included a small number of species (12, 6, and 5, respectively) and did not attempt to place patterns of polymorphism in a phylogenetic context. Song et al. (2012) examined the ITS2 region of 178 plant species via pyrosequencing, finding nearly ubiquitous intragenomic variation, with most ITS2 copies within a genome represented by a few major variants. Other studies have used intragenomic nrDNA polymorphisms to identify populations of Arabidopsis (Simon et al., 2012) and infer intraspecific phylogenies of Saccharomyces (West et al., 2014). Studies of intragenomic nrDNA polymorphism patterns across many species within the same genus have not been performed in plants (but see Straub et al., 2012).
This study utilizes high throughput technology to survey many species and individuals in the angiosperm genus Asclepias (Apocynaceae) in order to characterize levels of intragenomic nrDNA polymorphism and place these within a phylogenetic context. The methods presented here are expanded from those we have previously developed as part of the Milkweed Genome Project (Straub et al., 2011;Straub et al., 2012), and generalized for use with a large number of taxa and any high-copy locus, such as those that may be obtained from a genome-skimming or Hyb-Seq study (Straub et al., 2012;Weitemier et al., 2014).

Sampling and sequencing
One hundred twenty-five individuals representing 90 Asclepias species and subspecies were sampled (Table 1) and sequencing libraries were produced as described in Straub et al. (2012). Two individuals of putatively hybrid origin were included: A. albicans × subulata and A. speciosa × syriaca. These individuals were collected from wild populations and identified as hybrids through expression of intermediate morphological characteristics (M Fishbein, 1996(M Fishbein, , 1998see also Fishbein et al., 2011). Samples were multiplexed in approximately equimolar ratios, with up to 21 individuals per lane, and sequenced with 80 bp single-end reads on an Illumina GAIIx instrument (Illumina, San Diego California, USA). Asclepias subverticillata was multiplexed in a lane with 32 samples and sequenced with 101 bp paired-end reads on an Illumina HiSeq 2000 instrument, with reads analyzed as though they were single-end. One individual of A. syriaca was sequenced at higher coverage: this individual was sequenced in a single lane on an Illumina GAIIx with 40 bp single-end reads (Straub et al., 2011). To allow more efficient assembly downstream, read pools were filtered to remove plastid reads (using the custom script sort fastq v1.pl modified to retain Ns; Knaus, 2010).
An A. syriaca haploid genome size estimate of 420 Mbp and a nrDNA copy number estimate of 960 were used for estimates of sequencing depth and for comparisons with other organisms. These estimates are modified from Straub et al. (2011), where an incorrect estimate of the average A. syriaca 2C value led to a haploid genome size estimate of 820 Mbp and a nrDNA copy number estimate of 1,845. The current values are based on 2C estimates from Bai et al. (2012) and Bainard et al. (2012).

Polymorphism quantification
The method for determining polymorphisms present among nrDNA copies within an individual while retaining information about position homology across a group of distantly related individuals included four general steps, detailed below: (1) A sequence was selected to serve as a reference for the whole group; (2) A consensus sequence was obtained for each individual taxon and aligned against the group reference, allowing tailored read-mapping for each individual while associating positions along the individual consensus with their homologous positions in the group reference; (3) Reads for each individual were mapped onto that individual's consensus sequence; (4) At each position the reads differing from the individual consensus were tallied.

Group reference
The nrDNA cistron of the high-coverage A. syriaca individual was previously assembled (Straub et al., 2011;GenBank JF312046). The nontranscribed spacer and external transcribed spacers from each end were removed due to the presence of internal repeats,

Individual consensus sequences
Read pools were examined prior to consensus assembly and reads that were exact duplicates were reduced to a single representative, retaining the highest average quality score (using the custom script fastq collapse.py, available at https://github.com/listonlab). Sequences for each individual were constructed via reference-guided assembly, with A. syriaca as the reference, using Alignreads ver. 2.25 (Straub et al., 2011). Alignreads is a pipeline that includes the short-read assembler YASRA (Ratan, 2009), utilities from the MUMmer ver. 3.0 suite (Delcher et al., 2002;Kurtz et al., 2004), and custom scripts. Parameters were selected to ensure high identity of reads mapping to the nrDNA reference (95%), but allow the reconstructed sequence to differ from the reference. In addition to assembling the individual consensus sequence, Alignreads outputs a file associating each position in the group reference with those in the individual consensus.

Read mapping
Prior to mapping reads from an individual onto its consensus, reads with an average quality (Phred) score below 20 were removed, and bases in the remaining reads with a score below 20 were converted to Ns using FASTX-Toolkit ver. 0.0.13 (Gordon, 2008). Read mapping was performed with the program BWA ver. 0.5.7 , and output files processed with the SAMtools ver. 0.1.13 utilities . Reads were mapped onto the consensus sequence using the default mapping parameters in BWA. These allow up to 3 mismatches against the consensus in an 80 bp read and 4 mismatches in a 100 bp read, with long insertions or deletions excluded. In order to test the effect of relaxed mapping parameters on the abundance of polymorphisms detected, reads were mapped allowing 4 or 5 mismatches in an 80 or 100 bp read, respectively (the -n flag in bwa aln set to 0.015). The abundance of intragenomic indels was found by mapping reads using the default mismatch parameters, but allowing indels up to 5 bp long (bwa aln -e 4).

Polymorphism counting
A perl script was developed to tally the number of reads differing from the consensus at each position (polymorphic read counter bwaPileup.pl ver. 3.03b, available at https:// github.com/listonlab). For example, a base covered by 10 reads might have 7 reads with a G in that position and 3 with a C. In this case the consensus would have called a G at that position, with 30% of the reads differing. See File S1 for exact parameters and a pipeline of commands used.
Positions with 2% or more of reads differing from the consensus base were considered polymorphic. This cutoff is the same used by Straub et al. (2011) and comparable to that used by Nguyen et al. (2011) under a similar quality-filtering scheme. The control PhiX lane of the higher coverage A. syriaca individual was examined by Straub et al. (2011, Additional file 1 from that study) and found to have an error rate much less than 2%, indicating that the cutoff used here may be somewhat conservative. In addition to counting positions that were polymorphic, positions were recorded as "highly polymorphic" if 10% or more of the reads differed from the consensus.
In order to keep homologous bases aligned across individuals, only those positions that were present in the A. syriaca (group) reference were kept in the analysis (i.e., insertions relative to the reference were discarded). Note that deletions (relative to the reference) fixed within an individual are also not considered because zero reads called a base at that position.

RNA structure determination
The secondary structure of each subunit and spacer region was predicted for the A. syriaca reference from the minimum free energy structure found by the program RNAfold for the 18S, ITS1, and ITS2 regions and RNAcofold for the 5.8S + 26S regions (Lorenz et al., 2011). Program default model parameters were used (37 • C, unpaired bases can participate in up to one dangling end). Positions along the cistron were then categorized as either paired (stems) or unpaired (loops). Predicted structures are provided in File S2.

Polymorphisms within the cistron
The effects of cistron position (subunit or spacer) and secondary structure position (stem or loop) on the likelihood of a position being polymorphic or highly polymorphic were assessed in two ways. Differences in the likelihood that at least one individual was polymorphic at a position (e.g., positions coded as either polymorphic or invariant) were assessed via a two factor multiple logistic regression, as implemented in R ver. 3.1.0 using the MASS ver. 7.3.33 package (Venables & Ripley, 2002;R Core Team, 2014). Differences in the abundance of polymorphic individuals at a position were assessed using square-root transformed data with a two-way ANOVA and type III sum of squares for unbalanced design, as implemented using the car ver. 2.0.20 package (Fox & Weisberg, 2011). The ANOVA analysis is not reported for the highly polymorphic individuals because the data diverge substantially from assumptions of a normal distribution.

Phylogenetic context
A maximum likelihood estimate of phylogenetic relationships within Asclepias was produced by Fishbein et al. (2011, Fig . 2, TreeBase #27576). This tree was pruned to match the sampling in this study, and counts of polymorphic positions were recorded for each taxon. Taxa sampled in this study, but not present in Fishbein et al. (2011), were omitted from further analyses. Counts were averaged for taxa with multiple individuals in this study, but sampled only once in Fishbein et al. (2011). Ancestral states for the number of polymorphic positions in the rDNA cistron were reconstructed using squared-change parsimony in the Mesquite phylogenetic suite ver. 2.75 + build 573 Maddison, Maddison & Midford, 2011).
The phylogenetic signal in the distribution of the number polymorphic positions was tested in two ways. In the first method, the total length of the tree (parsimony steps) in terms of changes in number of polymorphic positions was compared to a distribution of tree lengths created by 10,000 permutations of polymorphic positions across tips. In the second method, a likelihood ratio test (LRT) was performed between models where character evolution followed a Brownian motion model across the tree. In the first model, the parameter lambda (describing how well the phylogeny correctly predicts the covariance among taxa for a trait) was found that maximized the model's likelihood (Pagel, 1999). This was compared to the likelihood found when lambda was held at zero, representing phylogenetic independence among species for that trait. Parsimony permutations were performed in Mesquite Maddison, Maddison & Midford, 2011), and likelihood ratio tests were performed in R with the phytools ver. 0.4.05 and ape ver. 3.1.2 packages (Paradis, Claude & Strimmer, 2004;Revell, 2012;R Core Team, 2014).
To determine if any clades held intragenomic polymorphism frequencies that were significantly high or low relative to the rest of the phylogeny, polymorphism rates were simulated along the tree, and the true polymorphism counts compared to the distribution of simulated counts (Garland et al., 1993). Two tips in the tree separated by zero branch length (A. asperula ssp. asperula and ssp. capricornu) were collapsed, and the average of the polymorphism counts for the four sampled individuals of the species used for the new tip. A model of trait evolution under Brownian motion, using the lambda parameter estimated from the LRT above, was found from 10,000 random starting points using the fitContinuous function from the geiger ver. 2.0.3 R package (Harmon et al., 2008). This model was used to simulate polymorphism counts across the phylogeny 10,000 times, with lower and upper bounds of 0 and infinity, respectively, using the fastBM function in phytools ver. 0.4.31 (Revell, 2012). For each node, the ancestral state was estimated for the true data and the simulated data using squared change parsimony as implemented in Mesquite Maddison, Maddison & Midford, 2011).

RESULTS
Average coverage of the nrDNA region was ∼97× for all individuals (median = 88×; Sequence Read Archive PRJNA261980). Despite high overall coverage of the nrDNA region, not all positions of the cistron were assembled for all individuals; therefore results for polymorphic positions are presented both as counts and as percentages of sequenced bases. The A. syriaca reference had total genome coverage of ∼0.8× (Sequence Read Archive SRP005621).
The length of the reference A. syriaca nrDNA cistron was 5,839 bp. The consensus sequences for the nrDNA cistrons of other samples ranged from 5,815 to 5,865 bp, with over half of samples having lengths between 5,836 and 5,842 bp. Relative to A. syriaca, 87 samples include at least one inserted position in their consensus sequence, and 117 samples have at least one deleted position. However, because in general more positions were inserted than deleted, 71 samples have lengths greater than 5,839, and 41 samples have shorter lengths (Fig. S1).

Intragenomic polymorphism
All individuals were polymorphic at several positions homologous with the A. syriaca reference ( Table 1) , and as either paired (stems) or unpaired (loops). More individuals are likely to be polymorphic at sites that are in spacer regions over subunit regions, and that are paired over unpaired. Bold values indicate categories that significantly affect polymorphism abundance (P < 0.05). (5.77%, Fig. 1). A very high percentage (91%) of positions in the A. syriaca reference were polymorphic in at least one individual ( Fig. 2A).

Source of variation
Positions were significantly more likely to be polymorphic if they were in a spacer region (ITS1, ITS2; Fig. 2A) or stem (Fig. 3A). This is true both when considering the number of individuals polymorphic at a position (Table 2), and whether any sample was polymorphic at that position (Table 3A).

Relaxed read mapping
Allowing more mismatches when mapping reads to their consensus nrDNA sequence increased the polymorphic sites counted within individuals by an average of 15.5% (Fig. S2). Two samples had no change in their polymorphism abundance, and two had a decrease (i.e., newly mapped reads at a previously polymorphic position matched the consensus, thereby dropping the polymorphic reads below 2%). The increase in polymorphism abundance under relaxed read mapping may indicate that the standard read mapping parameters are too conservative and that some truly polymorphic sites are excluded. However, because standard read mapping is more likely to exclude reads Probability that at least one individual is (A) polymorphic or (B) highly polymorphic at a position that is either within a spacer (ITS1, ITS2) or subunit region (18S, 5.8S, 26S), and either paired (stems) or unpaired (loops). Error bars indicate 95% confidence intervals. Values derived from two-factor multiple logistic regressions (Table 3). Table 3 Multiple logistic regression of the likelihood of nrDNA position polymorphism, by position type. Two-factor multiple logistic regression of the likelihood of nrDNA positions being (A) polymorphic or (B) highly polymorphic in at least one individual. Positions are categorized as either within a subunit (18S, 5.8S, 26S) or spacer region (ITS1, ITS2), and as either paired (stem) or unpaired (loop). Odds ratios indicate whether a category decreases (<1) or increases (>1) the likelihood a position is polymorphic or highly polymorphic. The intercept represents paired, spacer positions. Categories that significantly affect polymorphism likelihood are indicated by italics (P < 0.1) or boldface (P < 0.05). Polymorphism probabilities for each category are presented in Fig. 3.

Source
Odds ratio 95% CI Coefficient estimate Std. error Z-value P

Notes.
Parsimony permutations, the proportion of permutations with a shorter tree length than the true data; Lambda, the maximum likelihood estimate of lambda; logL, the log-likelihood ratio of the unconstrained model including the estimated lambda over the constrained model with lambda = 0; P, the probability of obtaining a likelihood ratio this small or smaller by chance alone.
containing sequencing errors, and because there is a strong linear correlation (R 2 = 0.97) between polymorphism abundance under the two mapping schemes, remaining analyses will only consider results from the standard read mapping.
Samples contained a mean of 7.6 and a median of 5 polymorphic indels when mapped reads were allowed to contain insertions or deletions of up to 5 bp relative to the individual consensus sequence. Eight individuals exhibited no polymorphic indels, including a sample of A. tuberosa ssp. rolfsii (Lynch 12526 [OKLA]). However, a sample of A. tuberosa ssp. interior (Fishbein 2816 [OKLA]) contained the most polymorphic indels at 51. Intragenomic indel abundance was positively correlated with SNP abundance (R 2 = 0.35, Fig. S3). Due to the generally low level of intragenomic indel polymorphisms (78% of samples contained 10 or fewer), remaining analyses only consider results from intragenomic polymorphic SNPs. Polymorphic indels and SNP counts under relaxed mapping for each sample are provided in File S3.

Phylogenetic signal
The number of polymorphic base pair positions exhibited strong phylogenetic signal across Asclepias under both the permutation test (P < 0.0012) and the likelihood ratio test (estimated lambda = 0.51, P = 0.0067; Table 4; Fig. 4). This signal remained even after the ITS regions were removed from the dataset (permutation test P < 0.0200, lambda = 0.45, LRT P = 0.0112). Highly polymorphic base pair abundance, however, was not significantly influenced by phylogenetic history (Table 4; Fig. 5), even when considering only the subunit regions.
Of the 53 resolved clades in the phylogeny, none showed polymorphism values more extreme than expected under a Brownian motion model after correcting for multiple comparisons. The most extreme clade was that formed by A. hypoleuca and A. otarioides, which had an ancestral number of polymorphic positions exceeded by 79 of the 10,000 simulations (when excluding the spacer regions this node was exceeded by 60 simulations). The following most extreme nodes were those ancestral to A. rosea and A. lemmonii with more polymorphic positions than all but 130 (79) simulations, and ancestral to The tree topology is that pruned from Fig. 2 of Fishbein et al. (2011) with clades indicated by letters, following that study. A. boliviensis and A. mellodora with more polymorphic positions than all but 246 (150) simulations. Under a Bonferroni correction a clade would require 9 or fewer simulations more extreme than the observed value (α = 0.05) to reject a hypothesis of no divergence from Brownian motion.
Despite the strong phylogenetic signal of polymorphism abundance across Asclepias, counts among samples within species (for those species with multiple samples) exhibited variability. Some samples of the same species had very similar polymorphism counts (e.g., the two A. jaliscana individuals contained 140 and 165 intragenomic polymorphisms), while others differed dramatically (e.g., A. macrosperma individuals contained 76 and 391).

Identification of mixed ancestry
The number of polymorphic sites for hybrid individuals, 299 for A. albicans × subulata and 161 for A. speciosa × syriaca, are less than the mean number of 333 polymorphic sites; and the number of highly polymorphic sites, 12 and 37, are less than or greater than the mean of 28. Of those positions that are highly po lymorphic, 4 of 37 in A. speciosa × syriaca have a minor allele frequency of 0.3 or higher, while none of the positions in A. albicans × subulata have a minor allele frequency above 0.2.

DISCUSSION
Absolute counts of intragenomic polymorphisms among the copies of nrDNA in Asclepias (mean = 333 positions) were found to be much higher than levels reported for nematodes (<250; Bik et al., 2013), fungi (3-37; Ganley & Kobayashi, 2007), and Drosophila (3-18; Stage & Eickbush, 2007) when including all polymorphic positions. When considering only positions that are highly polymorphic, Asclepias exhibits slightly higher rates (mean = 28.4 positions) than fungi and Drosophila, but much lower rates than nematodes. However, these comparisons may be misleading: First, the number of nrDNA copies varies greatly between these taxa, estimated to range from about 50-180 in the fungal species, 200-250 in Drosophila melanogaster, 56-323 in the nematodes, and about 960 in Asclepias (Ganley & Kobayashi, 2007;Stage & Eickbush, 2007;Straub et al., 2011;Bik et al., 2013). Second, polymorphic base pair counts are confounded by differing criteria for scoring polymorphism (i.e., methods for excluding sequencing errors). The levels listed for the fungal species include both "high-confidence" and "low-confidence" polymorphisms, based primarily on sequence quality (Ganley & Kobayashi, 2007). The levels listed for Drosophila are polymorphisms present in ≥3% of loci (Stage & Eickbush, 2007). Bik et al. (2013) tallied read counts using a method similar to the method presented here, but called positions polymorphic when the count of differing reads exceeded what would be expected for a single copy locus. Third, sequencing depths of the fungal and Drosophila studies were much lower than those used here and with the nematodes (Ganley & Kobayashi, 2007;Stage & Eickbush, 2007;Bik et al., 2013). Nevertheless, given that Asclepias has absolute counts of polymorphic positions at least 33% higher than the other organisms studied, and that the sequencing depth was nearly two orders of magnitude greater in the nematodes than in Asclepias (6.3-10× per nrDNA copy in nematodes, vs. ∼0.1× in Asclepias; Bik et al., 2013), it is likely that Asclepias harbors greater rates of intragenomic polymorphism within the nrDNA cistron than the organisms studied to date.

Polymorphism patterns across the nrDNA cistron
Spacer regions (ITS1, ITS2) had higher frequencies of polymorphic positions than subunit regions (18S, 5.8S, 26S; Fig. 2). However, positions with low polymorphism frequencies are distributed much more evenly across the nrDNA cistron ( Fig. 2A) than highly polymorphic positions, which show strong differentiation between the subunit and spacer regions (Fig. 2B). The lower frequency of highly polymorphic positions within the subunit regions suggests that these regions are under selection to remain homogenous within individual genomes. The lower difference in low polymorphism frequency between subunit and spacer regions suggests that this selection pressure is positively correlated with the proportion of nrDNA copies that differ from the majority. These findings contrast with those reported for nematodes (Bik et al., 2013), where the subunit regions had much higher levels of polymorphism abundance than the spacer regions.
Positions in stem regions were more likely to be polymorphic than loop positions (Fig. 3). This was strongly significant for all polymorphic positions (Tables 2 and 3A) and moderately significant for highly polymorphic positions (Table 3B). This would seem to contradict the hypothesis that stem sites in general should be more highly conserved in order to maintain a functional RNA secondary structure. Indeed, this finding agrees with those from Rzhetsky (1995), who not only found that trees estimated from stem regions contained longer branch lengths than those from loop regions, but that those stem sites least likely to affect secondary structure tended to be less conserved. Loop sites, on the other hand, contain a large proportion of the sites critical to ribosomal function (Rzhetsky, 1995), and may be under stronger stabilizing selection than stem sites.
Among the subunit regions, the 26S region harbored the highest frequency of highly polymorphic positions. This agrees with Stage & Eickbush (2007) who showed the Drosophila 28S region to have a higher mutation rate than the other subunit regions. However, in that study the 28S region also had a lower frequency of polymorphic base pairs. Stage & Eickbush (2007) hypothesize that this is due to the action of two retrotransposable elements found in many Drosophila 28S copies, with instances of aborted insertions causing the cell to repair the region using a nearby template and thereby homogenize the copies. They predict that levels of polymorphism will be higher in the 28S region of organisms lacking the retrotransposable elements, as seen here in the homologous angiosperm 26S region.
Within the subunit regions, highly polymorphic positions may be more common within expansion regions of the RNA gene (less conserved regions that tend to incorporate sequence insertions without affecting functionality; Clark et al., 1984). This is strongly implied by the recovery of clusters of highly polymorphic sites near A. syriaca positions 4,440 and 5,020, which are directly within the 26S expansion regions seven and eight, respectively (Kolosha & Fodor, 1990; Fig. 1). This agrees with results found in Drosophila is monomorphic in the hybrid, while the positions with minor allele frequencies >0.3 are in positions not shown to differ between A. syriaca and A. speciosa. This likely indicates ancestry from an unsampled nrDNA haplotype.
Individuals of the same species with dramatically different polymorphism profiles may indicate the presence of cryptic diversity. Differing demographic histories between populations within a species may lead to individuals from those populations possessing different levels of intragenomic polymorphisms. As with the interspecific hybrids discussed above, early generation hybrids between populations within a species could also possess inflated levels of intragenomic polymorphisms.

CONCLUSIONS
Nuclear ribosomal DNA copies within individuals of Asclepias are not identical, with intragenomic polymorphisms present at a higher rate than reported for other organisms. Polymorphism frequencies across the genus vary by more than an order of magnitude and demonstrate strong phylogenetic signal. Stem positions of ribosomal subunits are more likely to be polymorphic than loop positions. Distribution of polymorphic sites across the nrDNA cistron are consistent with strong selection on nrDNA subunits, with polymorphic sites being more frequent in the spacer regions, and this difference being amplified for sites that are highly polymorphic. These results reinforce the need for caution when using nrDNA for phylogenetic inference, especially when using the spacer regions or for applications requiring the precise estimates of branch lengths or divergence times.