The Aquilegia genome: adaptive radiation and an extraordinarily polymorphic chromosome with a unique history

The columbine genus Aquilegia is a classic example of an adaptive radiation, involving a wide variety of pollinators and habitats. Here we present the genome assembly of A. coerulea ‘Goldsmith’, complemented by high-coverage sequencing data from 10 wild species covering the world-wide distribution. Our analyses reveal extensive allele sharing among species and demonstrate that introgression and selection played a role in the Aquilegia radiation. We also present the remarkable discovery that the evolutionary history of an entire chromosome differs from that of the rest of the genome – a phenomenon which we do not fully understand, but which highlights the need to consider chromosomes in an evolutionary context.


28
Understanding adaptive radiation is a longstanding goal of evolutionary biology (Schluter, 2000). As 29 a classic example of adaptive radiation, the Aquilegia genus has outstanding potential as a subject  (Figure 1). Distributions of many Aquilegia species overlap or adjoin one another, sometimes 33 forming notable hybrid zones (Grant, 1952; Hodges and Arnold, 1994b; Li et al., 2014). Additionally, 34 species tend to be widely interfertile, especially within geographic regions (Taylor, 1967). 35 Phylogenetic studies have de ned two concurrent, yet contrasting, adaptive radiations in Aquile-   are very closely related. 44 Genomic data are beginning to uncover the extent to which interspeci c variant sharing re ects 45 a lack of strictly bifurcating species relationships, particularly in the case of adaptive radiation. 46 Discordance between gene and species trees has been widely observed (Novikova et  . 52 The importance of admixture as a source of adaptive genetic variation has also become more being a problem to overcome in phylogenetic analysis, non-bifurcating species relationships could 55 actually describe evolutionary processes that are fundamental to understanding speciation itself. 56 Here we generate an Aquilegia reference genome based on the horticultural cultivar Aquilegia

61
Genome assembly and annotation 62 We sequenced an inbred horticultural cultivar (A. coerulea 'Goldsmith') using a whole genome 63 shotgun sequencing strategy. A total of 171,589 Sanger sequencing reads from seven genomic Hodges, 2010), and protein homology support, yielding 30,023 loci and 13,527 alternate transcripts. 75 The A. coerulea v3.1 genome release is available on Phytozome (https://phytozome.jgi.doe.gov/). 76 For a detailed description of assembly and annotation, see Materials and Methods.

77
Polymorphism and divergence 78 We deeply resequenced one individual from each of ten Aquilegia species (Figure 1 and Figure   79 1-gure supplement 1). Sequences were aligned to the A. coerulea v3.1 reference using bwa-mem variants could be reliably called across all ten species (see Materials and Methods for alignment, 83 SNP calling, and genome ltration details). The resulting callable portion of the genome was heavily 84 biased towards genes and included 57% of annotated coding regions (48% of gene models), but 85 only 21% of the reference genome as a whole. 86 Using these callable sites, we calculated nucleotide diversity as the percentage of pairwise  98 We next considered nucleotide diversity between individuals as a measure of species divergence. 99 Species divergence within a geographic region (0.38-0.86%) was often only slightly higher than Discordance between gene and species trees 120 To assess discordance between gene and species (genome) trees, we constructed a cloudogram 121 of trees drawn from 100kb windows across the genome (Figure 3a).     In all species examined, the proportion of deeply shared variants was higher on chromosome 155 four (Figure 4d), largely due to a reduction in private variants, although sharing at other depths 156 was also reduced in some species. Variant sharing on chromosome four within Asia was higher in 157 both A. oxysepala and A. japonica (Figure 4b), primarily re ecting higher variant sharing between 158 these species (Figure 6a). Asian (without A. oxysepala) species as H1 and H2 (Figure 5a-c). As predicted, the North American 175 split was closest to resembling speciation with strict reproductive isolation, with little asymmetry 176 in allele sharing between North American and Asian species and low, but signi cant, asymmetry 177 between North American and European species (Figure 5b). Next, we considered allele sharing 178 between European and Asian (without A. oxysepala) species (Figure 5 d,

245
Gene models on the chromosome were also more likely to contain variants that could disrupt 246 protein function ( Table 2). Taken together, these observations suggest less purifying selection on 247 chromosome four. 248 Reduced purifying selection could also explain the putatively higher gene ow between A. oxy-  adjusted R-squared=3.373 ù 10 *5 ), but nucleotide diversity is consistently much higher for chro-      The 35S and 5S rDNA loci are uniquely localized to chromosome four 280 The observation that one Aquilegia chromosome is di erent from the others is not novel; previous 281 cytological work described a single nucleolar chromosome that appeared to be highly heterochro-282 matic (Linnert, 1961). Using  Pachytene chromosome spreads were probed with probes corresponding to oligoCh4 (red), 35S rDNA (yellow), 5S rDNA (green) and two (peri)centromeric tandem repeats (pink). Chromosomes were counterstained with DAPI. Scale bars = 10 µm.
14 of 31  303 the completeness and accuracy of our assembly (Supplementary File 4), as well as consistency 304 between reference and k-mer based estimates of genome size (Supplementary File 9), suggest that 305 this di erence is likely due to highly repetitive content, including the large rDNA loci on chromosome 306 four. 307 Variant sharing across the Aquilegia genus is widespread and deep, even across exceptionally 308 large geographical distances. Although much of this sharing is presumably due to stochastic 309 processes, as expected given the rapid time-scale of speciation, asymmetry of allele sharing demon-310 strates that the process of speciation has been reticulate throughout the genus, and that gene 311 ow has been a common feature. Aquilegia species diversity therefore appears to be an example  It is tempting to speculate that the distinct evolutionary history of chromosome four is con- The cDNAs that failed to align were checked against the NCBI nucleotide repository (nr), and a large 424 fraction were found to be arthropods (Acyrthosiphon pisum) and prokaryotes (Acidovorax).  (Haas et al., 2003). 444 17 of 31  (Yeh et al., 2001). 451 The nal gene set was selected from all predictions at a given locus based on evidence for EST 452 support or protein homology support according to several metrics, including Cscore, a protein 453 BLASTP score ratio to homology seed mutual best hit (MBH) BLASTP score, and protein coverage, 454 counted as the highest percentage of protein model aligned to the best of its Angiosperm homologs.   (Li and Durbin, 2009). We processed alignments with SAMtools (Li et al., 2009) 655 We also calculated the partial correlation coe cient between recombination rate and neutral 656 nucleotide diversity accounting for gene density as in (Corbett-Detig et al., 2015).  (Han et al., 2015). Brie y, oligonucleotides were multiplied in two independent ampli cation 681 steps. First, 0.2 ng DNA from the immortal library was ampli ed from originally ligated adaptors in 682 emulsion PCR using primers provided by MYcroarray together with the library. Emulsion PCR was 683 used to increase the representativeness of ampli ed products (Murgha et al., 2014). Droplets were 684 generated manually by stirring of oil phase at 1000 ◊ g at 4°C for 10 min and then the aqueous  Figure 8 were straightened using the "straighten-curved-objects" plugin in the Image J software 709 (Kocsis et al., 1991). counterstained with DAPI, uorescence signals analyzed and photographed as described above. 721 The slides were washed in 2◊ SSC (2 ◊ 5 min), dehydrated in an ethanol series (70%, 90%, and 100%, 722 2 min each), and rehybridized with 35S rDNA probe as described above.      The difference between each of the 1000 simulated and observed tree topology proportions is estimated using euclidean distance where observed and simulated proportions of sib-japon and japon-oxy topologies specify point 1 and point 2, respectively (sib-japon refers to red tree while japon-oxy refers to blue tree in Fig. 6a). Quantified as such, a difference of zero (y-axis) points at the best parameter combination that matches observed allele sharing proportions genomewide. Right: Given the "species" tree ( Fig. 6b), D-statistics (y-axis) are calculated for different combinations of m (x-axis) and N (legend; top left) which cover the parameter space of interest ( Fig. 6c and Supplementary Table 7