Fine-scale recombination landscapes between a freshwater and marine population of threespine stickleback fish

Meiotic recombination is a highly conserved process that has profound effects on genome evolution. Recombination rates can vary drastically at a fine-scale across genomes and often localize to small recombination “hotspots” with highly elevated rates surrounded by regions with little recombination. Hotspot targeting to specific genomic locations is variable across species. In some mammals, hotspots have divergent landscapes between closely related species which is directed by the binding of the rapidly evolving protein, PRDM9. In many species outside of mammals, hotspots are generally conserved and tend to localize to regions with open chromatin such as transcription start sites. It remains unclear if the location of recombination hotspots diverge in taxa outside of mammals. Threespine stickleback fish (Gasterosteus aculeatus) are an excellent model to examine the evolution of recombination over short evolutionary timescales. Using an LD-based approach, we found recombination rates varied at a fine-scale across the genome, with many regions organized into narrow hotspots. Hotspots had divergent landscapes between stickleback populations, where only ~15% were shared, though part of this divergence could be due to demographic history. Additionally, we did not detect a strong association of PRDM9 with recombination hotspots in threespine stickleback fish. Our results suggest fine-scale recombination rates may be diverging between closely related populations of threespine stickleback fish and argue for additional molecular characterization to verify the extent of the divergence.


Introduction 63
Meiotic recombination is a highly-conserved process across a broad range of taxa (de 64 Massy 2013;Petes 2001). Recombination creates new allelic combinations by breaking apart haplotypes (Coop and Przeworski 2007;Otto and Lenormand 2002), promotes the proper 66 segregation of chromosomes during meiosis in many species (Davis and Smith 2001;Fledel-67 Alon et al. 2009;Kaback et al. 1992;Mather 1936), and has a pronounced impact on the 68 evolution of genomes (Mugal et al. 2015;Webster and Hurst 2012). In many species, meiotic 69 recombination occurs in small 1-2 kb regions called recombination "hotspots" which are 70 surrounded by large genomic regions with little to no recombination (Barton et al. 2008;Baudat 71 et al. 2010;Hellsten et al. 2013;Jeffreys et al. 1998;McVean et al. 2004;Myers et al. 2005;72 Steiner et al. 2002). 73 In most species, hotspot location is highly conserved over long evolutionary timescales 74 (Kawakami et al. 2017;Lam and Keeney 2015;Singhal et al. 2015;Tsai et al. 2010). For 75 example, finches share upwards of 73% of hotspots across 3 million years of evolution (Singhal 76 et al. 2015) while species of Saccharomyces share 80% of hotspots over 15 million years of 77 evolution (Lam and Keeney 2015). Evolutionarily conserved hotspots are often localized around 78 regions of open chromatin such as transcription start sites (TSSs) and CG-rich regions (i.e. CpG 79 islands) in vertebrates (Auton et al. 2013;Kawakami et al. 2017;Lee et al. 2004;Pan et al. 2011;80 Pokholok et al. 2005;Singhal et al. 2015;Tischfield and Keeney 2012). This localization pattern 81 is thought to be due the opportunistic nature of Spo11, a meiosis specific protein which initiates 82 recombination by creating double stranded breaks at regions of open chromatin (Celerin et al. 83 2000;Ohta et al. 1994;Pan et al. 2011). 84 A notable exception to strong conservation of recombination hotspots has been 85 documented in mammals, where hotspot location evolves rapidly between closely related species 86 or even between populations (Baker et al. 2015;Brick et al. 2012;Pratto et al. 2014;Smagulova 87 et al. 2016;Stevison et al. 2015). Contrary to the pattern observed in conserved systems, rapidly 88 evolving hotspots typically form away from functional genomic elements and are localized by 89 the zinc finger histone methyltransferase protein, PRDM9 (Baker et al. 2015;Baudat et al. 2010;90 Billings et al. 2013;Brick et al. 2012;McVean et al. 2004;Myers et al. 2005;Myers et al. 2010;91 Myers et al. 2008;Parvanov et al. 2010;Powers et al. 2016;Pratto et al. 2014). PRDM9 contains 92 multiple DNA-binding zinc fingers that are under strong positive selection, leading to divergent 93 hotspot localization between closely related species (Baker et al. 2015;Billings et al. 2013;Brick 94 et al. 2012;Myers et al. 2010;Parvanov et al. 2010). Though rapidly evolving hotspots have only 95 been documented in some mammals, positive selection is acting on the zinc finger domain of 96 PRDM9 orthologs in many non-mammalian species (Baker et al. 2017;Oliver et al. 2009). This 97 raises the intriguing possibility that some species outside of mammals may also have rapidly 98 evolving hotspots. It is also possible that PRDM9 is not necessary for rapid evolution of hotspots 99 in other species and that other mechanisms could lead to the evolution of fine-scale rates of 100 recombination over short timescales. 101 Threespine stickleback fish (Gasterosteus aculeatus) are an excellent system to study the 102 evolution of fine-scale recombination rates. Multiple populations of threespine stickleback fish 103 have independently adapted to freshwater environments from marine ancestors in the last 10-15 104 thousand years (Bell and Foster 1994;Orti et al. 1994), providing the opportunity to study the 105 parallel evolution of hotspots in well-characterized populations across the Northern Hemisphere 106 (Bell and Foster 1994;Ostlund-Nilsson et al. 2007;Wootton 1976). Broad-scale recombination 107 rates have been examined in threespine stickleback using genetic crosses (Glazer et al. 2015;108 Peichel et al. 2001;Roesti et al. 2013;Sardell et al. 2018), but fine-scale recombination rates 109 have not been estimated due to low marker density. 110 Fine-scale recombination rates can be estimated through a variety of approaches. 111 Recombination rates can be directly measured through genetic linkage maps (Broman et al. 112 1998;Campbell et al. 2016;Drouaud et al. 2006;Marand et al. 2017) or though sperm 113 genotyping (Baudat and de Massy 2007;Guillon and de Massy 2002;Jeffreys et al. 2001). Both 114 methods require a large number of progeny or sperm and a high density of genetic markers to 115 capture a sufficient number of crossovers. Recombination rates can also be indirectly measured 116 by identifying the binding sites of proteins that initiate double strand breaks (Pratto et al. 2014;117 Smagulova et al. 2011) as well as repair double strand breaks through homologous 118 recombination (Dumont and Payseur 2011;Froenicke et al. 2002). Another broadly used 119 approach estimates recombination rates from patterns of linkage disequilibrium (LD) in 120 populations, providing a historical measure of meiotic crossovers over multiple generations 121 (Chan et al. 2012;McVean et al. 2004;Myers et al. 2005;Wall and Stevison 2016). LD-based 122 methods are able to estimate rates at a fine-scale, but rate estimation can be biased by 123 demographic history (e.g. bottlenecks, population expansions, population sub-structure, etc.) 124 individuals in a population were also removed. After filtering, the Lake Washington population 187 had 5,054,729 SNPs genome-wide (11 SNPs/kb) and the Puget Sound population had 4,142,876 188 SNPs (9 SNPS/kb) genome-wide (prior to filtering Lake Washington had 11,937,220 SNPs and 189 Puget Sound had 11,070,421 SNPs). For the outgroup species, P. pungitius and G. wheatlandi, 190 low-quality SNPs were excluded by removing variants with a genotyping quality score less than 191 30 or a read depth less than two, resulting in 13,691,521 SNPs genome-wide in G. wheatlandi 192 (16,783,618 SNPs prior to filtering) and 7,791,420 in P. pungitius (26,173,287 SNPs prior to 193 filtering). 194

Haplotype phasing 196
Each chromosome was phased independently with SHAPEIT (v2.r837), a read-aware 197 phasing tool (Delaneau et al. 2013). Phase-informative reads with two heterozygous SNPs on the 198 same read were identified to assist with the estimation of haplotypes. Phase-informative reads 199 had a mapping quality score greater than 20. Convergence of the MCMC algorithm was 200 estimated by examining switch error rates between individual runs. A low switch error rate 201 would indicate that the MCMC phasing runs have converged on a similar haplotype 202 configuration. Switch error was measured using vcftools (v0.1.12b) using -diff-switch-error 203 (Danecek et al. 2011). A low switch error was achieved within a reasonable run time with the 204 following SHAPEIT parameters: --main 2000 --burn 200 --prune 210 --states 1000 (average 205 switch error between phasing runs: 0.824% for Lake Washington and 1.26% for Puget Sound). 206 All other parameters were left at the default values. 207 208

Estimation of recombination rates 209
Recombination rates were estimated with LDHelmet (v1.7) (Chan et al. 2012). LDHelmet 210 estimates historical recombination rates from population data by analyzing patterns of linkage 211 disequilibrium across phased individuals. The ancestral allele state was defined for every SNP in 212 each threespine stickleback population by comparing to the allele present in the two outgroup 213 species. An ancestral allele state could not be assigned if a polymorphism was segregating 214 among the outgroup species. Therefore, SNPs were only assigned an ancestral state if P. 215 pungitius and G. wheatlandi were homozygous for the same allele. The ancestral allele was 216 assumed to be the nucleotide carried by P. pungitius and G. wheatlandi, and was assigned a prior 217 probability of 0.91. To allow for uncertainty in the ancestral allele state, the other three possible 218 nucleotides were assigned prior probabilities of 0.03. If the ancestral allele state could not be 219 inferred, the prior probability of each nucleotide being the ancestral allele was set as the overall 220 frequency of that particular nucleotide on the chromosome. Nucleotide frequencies were 221 empirically determined from all sites on a threespine stickleback chromosome where P. 222 pungitius and G. wheatlandi had read coverage that passed the filtering scheme. Mutation 223 matrices were estimated for each population separately. For every position where an ancestral 224 allele state could be inferred, the total number of each type of mutation away from the ancestral 225 allele was quantified. A normalized 4x4 mutation matrix was generated for each chromosome as 226 previously described (Chan et al. 2012). The ancestral allele state and mutation matrices were 227 generated using a custom Perl script. 228 Each LDHelmet module was run using the following parameters. Custom Python scripts 229 were used to create the SNP sequence and SNP position input files. Full FASTA sequence were 230 created using vcf2fasta from vcflib (available at https://github.com/vcflib/vcflib). Haplotype Population-scaled recombination rates were compared with recombination rates estimated 253 from a high-density genetic linkage map (Glazer et al. 2015). Recombination rates from 254 LDHelmet were converted from /bp to cM/Mb as previously described (Smukowski Heil et al. 255 2015). Briefly, the recombination rate (cM/Mb) was calculated between every pair of adjacent 256 markers in the genetic map and a chromosome-wide recombination rate was calculated as the

Identification of recombination hotspots 265
Recombination hotspots were defined using a sliding window approach. In each window, 266 the average recombination rate within a 2 kb window was compared to the average 267 recombination rate from the 40 kb regions flanking either side of the 2 kb window. Hotspots 268 were defined as the 2 kb regions that had a 5-fold or higher recombination rate relative to the 269 mean recombination rate in the flanking background regions. The 2 kb windows iterated forward 270 in 1 kb increments. If multiple hotspots were found within a 5 kb region, only the hotspot with 271 the highest rate was retained. Misassemblies in the reference genome could generate false 272 hotspots. To limit this, all hotspots that spanned a contig boundary in the reference genome were 273 removed (384 hotspots out of 4,349 total hotspots). Hotspots were considered shared between 274 populations if the midpoints of the two hotspots were within 3 kb of each other. Random 275 permutations were used to calculate the expected amount of hotspot overlap between Lake 276 Washington and Puget Sound. 10,000 random permutations were drawn from the genome 277 totaling the number of 2 kb hotspots for each population. Recombination hotspots were identified 278 and filtered using custom Perl and Python scripts. 279

Genetic variation within and between populations 281
Within population nucleotide diversity () and Tajima's D were calculated separately for 282 each chromosome. To capture rare variants, previously excluded singletons were included in the 283 analysis. Nucleotide diversity and Tajima's D were calculated using the R package, PopGenome 284 (Pfeifer et al. 2014) and a custom Python script. Nucleotide diversity was calculated 285 between populations by combining SNP variants among all individuals in each population. 286 Population structure was estimated between Lake Washington and Puget Sound using 287 FastStructure (v1.0) (Raj et al. 2014). For this analysis, SNPs from Lake Washington and Puget 288 Sound were merged using vcftools (Danecek et al. 2011) and only biallelic sites with no missing 289 data were retained. The sex chromosomes (chromosome 19) were also excluded. The final SNP 290 dataset was composed of 4,113,937 SNPs. Three trials were completed at K values of 1, 2, and 3. 291 These K values were chosen to differentiate scenarios where Lake Washington and Puget Sound 292 were one panmictic population (K=1) or Lake Washington and Puget Sound were two distinct 293 populations (K=2). A K of 3 was chosen to identify any hidden population structure within either 294 population. The model that best explained the population structure was determined using 295 chooseK.py (Raj et al. 2014) and the structure plot was visualized using distructK.py (Raj et al. 296 2014). 297 298

Estimation of demographic history 299
Demographic history can affect LD-based estimates of recombination rates (Dapper and 300 Payseur 2017;Johnston and Cutler 2012). To determine whether the demographic history of 301 threespine stickleback fish could influence the ability to detect recombination hotspots, hotspots 302 were assayed in simulated haplotypes with known recombination profiles and demographic 303 histories. Demographic histories used in the simulations were based on the estimated histories of 304 Lake Washington and Puget Sound, modeled using a Pairwise Sequentially Markovian 305 Coalescent (PSMC) process with default parameters (Li and Durbin 2011;Liu and Hansen 306 2017). PSMC was run on one female from Lake Washington and one female from Puget Sound. 307 Confidence intervals were estimated on 100 bootstrap replicates. Demographic histories were 308 visualized using psmc_plot.pl (Li and Durbin 2011). 309

Simulations using estimated demographic histories 311
Using the demographic histories estimated with PSMC, 250 kb haplotypes with four 2 kb 312 recombination hotspots were simulated using the program fin, part of the LDHat software 313 package (Auton and McVean 2007;McVean et al. 2004). The hotspots were place 50 kb apart at 314 75, 125, 175, and 225 kb. The background recombination rate was set at 0.03 /kb. Hotspots had 315 varied intensities from 2 to 20 times the background rate, set at 0.06 /kb, 0.15 /kb, 0.3 /kb, 316 and 0.6 /kb. One scenario simulated a constant effective population size, with 500 sequences, 317 40 haplotypes each, with an average Watterson's  of 0.00355, the average between Lake 318 Washington and Puget Sound (-nsamp 40 -len 250000 -theta 0.00355). For both populations, a 319 bottleneck was simulated 8,000 generations ago (Puget Sound: t = 0.029, theta = 0.0036; Lake 320 Washington: t = 0.022, theta = 0.0035). Two bottleneck strengths were simulated by setting the 321 probability that a lineage coalesces to 10% or 90% (s = 0.1, 0.9). Overall, hotspot sharing 322 between simulated Lake Washington and simulated Puget Sound populations was quantified by 323 examining all pairwise comparisons between populations and bottleneck strengths. The first 324 hotspot simulated should not be called using our method as it falls below our cutoff, but can 325 provide information about how hotspots that fall below our cutoff affect hotspot calling. The 326 number of false positive and false negative hotspots were calculated using custom Python scripts. 327 328

Location of hotspots around transcription start sites 329
Transcript annotations from Ensembl (build 90) were lifted to the revised threespine 330 stickleback genome assembly (Glazer et al. 2015) by aligning each transcript using BLAT (v36, 331 default parameters) (Kent 2002). Aligned transcripts were only retained if the entire transcript 332 aligned to the revised genome assembly. Transcript start sites (TSSs) consisted of a 2 kb region, 333 centered at the start position of the transcript. A hotspot was considered overlapping with a TSS 334 if the midpoint of the hotspot overlapped with any part of a 2 kb TSS region. Enrichment of 335 hotspots in TSSs were compared against 10,000 random permutations. 2 kb regions were 336 randomly drawn across the genome, totaling the number of hotspots identified in each 337 population. TSS annotation filtering, overlap of hotspots with TSSs, and random permutations 338 were completed using custom Python scripts. 339

GC-Biased Substitutions 341
GC to AT and AT to GC substitutions were quantified within 2 kb regions of the genome 342 that had recombination rates in the top and bottom 5% as well as within all 2 kb recombination 343 hotspots. The top 5% of recombination rates captures regions of the genome that may broadly 344 have high recombination rates and not contain recombination hotspots. The top 5% of 345 recombination rates includes 96 hotspots for Lake Washington (out of 1,627 hotspots) and 314 346 hotspots for Puget Sound (out of 2,338 hotspots). The equilibrium GC content was calculated as 347 the proportion of AT to GC substitutions out of the total pool of substitutions (AT to GC and GC 348 to AT) (Meunier and Duret 2004;Singhal et al. 2015;Sueoka 1962). To increase the total 349 number of sites available for the analysis, the ancestral allele state was inferred using only G. matched coldspots (Bailey and Eklan 1994). Each hotspot was matched to a randomly selected 2 357 kb coldspot, which was located at least 25 kb from any identified hotspot, contained a GC 358 nucleotide content that was within 2% of the hotspot after removing ancestral CpG sites (GC-359 matched), and had a mean recombination rate that was less than half the background 360 recombination rate of the population (Lake Washington: less than 0.017 /bp; Puget Sound: less 361 than 0.035 /bp). MEME ignored motif occurrences if they were present in a hotspot multiple 362 times (-mod zoops). This was to prevent the reporting of repetitive motifs. MEME was run 363 separately for each chromosome and population and was completed when 50 motifs were 364 identified (-nmotifs 50). Motif identification was conducted separately for shared hotspots and 365 population-specific hotspots. 366 The DNA-binding protein, PRDM9, is important for localizing recombination hotspots in 367 mammals (Baker et al. 2015;Baudat et al. 2010;Billings et al. 2013;Brick et al. 2012;Myers et 368 al. 2010;Myers et al. 2008;Parvanov et al. 2010;Powers et al. 2016;Pratto et al. 2014). To 369 determine if any PRDM genes had a role in localizing hotspots in threespine stickleback fish, 370 FIMO (v4.11.0, default parameters) (Grant et al. 2011) was used to scan hotspot sequences for 371 the predicted DNA binding motifs for each of the 11 annotated PRDM genes in the threespine 372 stickleback genome (Ensembl, build 90). DNA binding motifs for each PRDM protein were 373 predicted using the Cys2His2 zinc finger prediction tool, Predicting DNA-binding Specificities 374 for the Cys2His2 Zinc Finger Proteins (Persikov et al. 2009;Persikov and Singh 2014). Predicted 375 zinc finger domains were included if the HMMER bit score for the zinc fingers was 17.7 or 376 higher (Persikov et al. 2009;Persikov and Singh 2014). To determine the expected number of 377 occurrences of a motif of the same length and GC composition in hotspots, the PRDM motifs 378 were shuffled 100 separate times. FIMO was run on the shuffled motifs to create a null 379 distribution. Motifs were shuffled using a custom python script. 380 Results 382

Genetic differentiation between Lake Washington and Puget Sound 383
Freshwater populations of threespine stickleback fish frequently exhibit signs of past 384 bottlenecks, consistent with their colonization from marine ancestors ~10-15 thousand years ago 385 (Bell and Foster 1994;Ferchaud and Hansen 2016;Hohenlohe et al. 2010;Liu et al. 2016). 386 Given the recent divergence and the close geographic proximity between Lake Washington 387 (freshwater) and Puget Sound (marine), we first examined whether these two populations were 388 genetically distinct. Using FastStructure, a two population model was the most highly supported 389 Sound has a larger effective population size than Lake Washington, matching the expected 406 pattern of marine populations having larger effective population sizes than freshwater 407 populations (DeFaveri and Merila 2015; Gow et al. 2006;Makinen et al. 2006). 408 409

Fine-scale estimation of recombination rates across the genome 410
Using a dense set of SNP markers from whole-genome sequencing, we estimated 411 recombination rates across the genomes of Lake Washington and Puget Sound threespine 412 stickleback fish. The average genome-wide population recombination rate in Lake Washington 413 was half of the rate observed in Puget Sound (Lake Washington: 0.035 /bp; Puget Sound: 0.072 414 /bp; Wilcoxon Rank Test; p < 0.001, Supplemental Table 1). Despite having an overall lower 415 genome-wide recombination rate in Lake Washington, recombination rates were largely 416 conserved at broad scales between the two populations. We observed a highly significant  (Barton et al. 2008;Berner and Roesti 2017;Broman et 427 al. 1998;See et al. 2006). 428 To determine whether the broad-scale recombination rates we estimated from LD-based 429 methods are concordant with recombination rates measured from linkage mapping, we compared 430 the rates from Lake Washington and Puget Sound with the rates estimated from a genetic linkage 431 map from a freshwater female and a marine male (Glazer et al. 2015). We found a significant 432 positive correlation between recombination rates in both populations and the linkage map 433 (Spearman's Rank Correlation; Lake Washington: r = 0.830, p < 0.001; Puget Sound: r = 0.810, 434 p < 0.001; Figure 3). These data indicate that broad-scale changes are conserved across multiple 435 populations of threespine stickleback fish and confirm that the recombination rates estimated 436 from LD-based methods largely parallel the rates observed from genetic linkage maps. Although 437 broad-scale (Mb) recombination rates tend to be conserved over longer evolutionary timescales 438 (Fledel-Alon et al. 2009;Kong et al. 2002;Serre et al. 2005;Stevison et al. 2015), fine-scale (kb) 439 rates within chromosomes can rapidly evolve (Barton et al. 2008;Hellsten et al. 2013;McVean 440 et al. 2004;Myers et al. 2005). In many organisms, recombination is organized locally into 441 narrow regions of very high rates (i.e. "hotspots"), surrounded by regions of little to no 442 recombination (i.e. "coldspots") (Baudat et al. 2010;Jeffreys et al. 1998;Steiner et al. 2002).

Divergent hotspot locations between populations of threespine sticklebacks 448
Using a sliding-window approach, we identified 2,338 hotspots in Puget Sound and 1,627 449 hotspots in Lake Washington. Strikingly, only 312 of these hotspots were shared between 450 populations (13.3% of hotspots in Puget Sound and 19.2% of hotspots in Lake Washington). 451 This lack of hotspot overlap between Lake Washington and Puget Sound may, in part, be due to 452 hotspots falling just below the hotspot threshold. To investigate this, we looked for any increase 453 in recombination rate in locations where hotspots were present in one population, but absent in 454 the other. We found little evidence of a localized increase in recombination rate in these regions. 455 Recombination rates were close to the background rate in the population where hotspots were 456 deemed absent (Figures 5A and 5B). This pattern was even more apparent when shared hotspots 457 were removed from the analysis (Figures 5C and 5D). The small degree of overlap we observed 458 in hotspots between the populations was much greater than what would be expected from chance 459 alone (10,000 random permutations; p < 0.001; Supplemental Figure 5), indicating much of the 460 hotspot overlap likely represents shared ancestry. 461

Increased recombination rate in the pseudoautosomal region 463
Genetic recombination between sex chromosomes is restricted to the pseudoautosomal 464 region (PAR), where rates of recombination can be orders of magnitude above genome-wide 465 averages (Otto et al. 2011;Wright et al. 2016). In threespine stickleback, crossing over between 466 the X and Y chromosomes is restricted to a ~2.5 Mb PAR (Peichel et al. 2004;Roesti et al. 2013;467 White et al. 2015;Yoshida et al. 2014). Because of the potential for high rates of crossing over in 468 the PAR, we estimated population-scaled recombination rates for this region independently from 469 the autosomes. The average recombination rate in the PAR was 0.232 /bp for Puget Sound and 470 0.129 /bp for Lake Washington. These rates were significantly higher than the average 471 recombination rate across the autosomes (Lake Washington autosome average rate: 0.035 /bp, p 472 <0.001; Puget Sound autosome average rate: 0.072 /bp, p < 0.001; Figure 6). Although we 473 observed some fine-scale variation in recombination rates across the PAR (Figure 6), we 474 identified very few hotspots, which may be due to the increased background recombination rate 475 across the PAR. 476 477

Demographic history may not completely account for hotspot divergence 478
To explore how changes in past effective population size (Ne) may have affected our 479 ability to detect hotspots, we simulated haplotypes with known demographic histories that 480 followed the demographic histories we estimated from Lake Washington and Puget Sound, along  Table 2 Based on the demographic histories we estimated, Lake Washington experienced a less 502 intense bottleneck than Puget Sound. We therefore also used simulations to explore the expected 503 hotspot overlap if only one of the populations experienced a strong bottleneck. If Puget Sound 504 experienced a strong bottleneck and Lake Washington experienced a weak bottleneck, 36.7% of 505 hotspots were shared in the simulated Lake Washington population and 20.5% of hotspots were 506 shared in the simulated Puget Sound population (actual Lake Washington shared hotspots: 507 19.2%, actual Puget Sound shared hotspots: 13.3%). Except for a scenario where both 508 populations underwent a severe bottleneck in the past, our simulations suggest that demographic 509 history alone is not sufficient to completely explain the divergence we observed in hotspot 510 location between populations. 511 512

Hotspots are enriched around transcription start sites 513
Hotspot localization in genomes varies among taxa. In yeast, birds, and some plants, 514 where hotspots are evolutionarily conserved, hotspots tend to be enriched within transcription 515 start sites (Kawakami et al. 2017;Pan et al. 2011;Singhal et al. 2015;Tischfield and Keeney 516 2012). In mammals with rapidly evolving hotspots, hotspots are typically located away from 517 genic regions (Brick et al. 2012;Brunschwig et al. 2012;Myers et al. 2005). We investigated 518 whether threespine stickleback fish hotspots mimic either of the patterns seen in other systems. 519 We found an enrichment of hotspots around TSSs, compared to random permutations of hotspots 520 (Lake Washington: 26% of hotspots fell within 3 kb of a TSS, p < 0.034; Puget Sound: 29% of 521 hotspots fell within 3 kb of a TSS, p < 0.001; Supplemental Figure 6). This pattern also held 522 when examining only population specific hotspots (Lake Washington: p = 0.007; Puget Sound: p 523 < 0.001; Supplemental Figure 7); however, shared hotspots were not enriched in TSSs compared 524 to random permutations (Lake Washington: p = 0.370; Puget Sound: p = 0.827; Supplemental 525 Figure 6). The lack of significant enrichment of shared hotspots around TSSs is likely due to the 526 small sample size. When we randomly drew samples from the population-specific hotspots that 527 were equal in size to the shared hotspot pools, there was no longer enrichment around TSSs 528 (Lake Washington: p = 0.947; Puget Sound: p = 0.808). 529

Regions of high recombination exhibit GC-biased nucleotide substitution 531
Recombination leaves distinct signatures of nucleotide substitution across the genome 532 (Duret and Arndt 2008;Mugal et al. 2015;Webster and Hurst 2012). Over time, the repair of 533 heteroduplex DNA during meiosis favors the substitution of GC nucleotides over AT 534 nucleotides, which increases the frequency of GC nucleotides, leading to GC-biased base 535 composition (Lesecque et al. 2013;Marais 2003;Meunier and Duret 2004). Regions of the 536 genome with higher recombination rates tend to have higher GC-biased base composition 537 (Kawakami et al. 2017;Kong et al. 2002;Meunier and Duret 2004;Singhal et al. 2015). To 538 determine whether regions of higher recombination rate showed signatures of GC-biased gene 539 conversion, we calculated equilibrium GC content (Meunier and Duret 2004;Singhal et al. 2015;540 Sueoka 1962) in regions of the genome with the highest and lowest recombination rates (top and 541 bottom 5%) as well as within recombination hotspots. 542 In both Lake Washington and Puget Sound, we detected a significantly higher 543 equilibrium GC content in regions of the genome with a high recombination rate (top 5% of 544 recombination rates among 2 kb windows) compared to regions of the genome with 545 recombination rates in the bottom 5% (Table 1). Overall, these results indicate GC nucleotide 546 composition is influenced by the historical recombination landscape across the threespine 547 stickleback genome. Interestingly, although hotspots in both populations have locally elevated 548 recombination rates, there was not a parallel increase in equilibrium GC content. Equilibrium GC 549 content in population-specific hotspots and shared hotspots was not significantly different than 550 regions of the genome with the lowest recombination rates (bottom 5%) (Table 1). Our results 551 are consistent with recombination hotspots being more recently derived, where locally increased 552 recombination rates have not yet had an effect on GC-biased nucleotide substitution. 553 554 PRDM genes are weakly associated with threespine stickleback recombination hotspots 555 Hotspots in many species are targeted to specific regions of the genome by DNA binding 556 motifs (Baudat et al. 2010;Kon et al. 1997;Myers et al. 2008;Steiner et al. 2002). In species 557 where PRDM9 targets recombination hotspots to specific regions of the genome, the zinc finger 558 domain of PRDM9 is typically under strong positive selection (Baker et al. 2015;Baudat et al. 559 2010;Billings et al. 2013;Myers et al. 2010;Oliver et al. 2009;Pratto et al. 2014) and the 560 protein contains functional KRAB and SSXRD domains (Baker et al. 2017). In Teleost fish, two 561 paralogs of PRDM9 have been identified, PRDM9 which contains all the protein domains and 562 PRDM9 which lacks the KRAB and SSXRD domains (Baker et al. 2017). Threespine 563 stickleback fish appear to have lost PRDM9, but retain PRDM9 without the SSXRD and 564 KRAB domains. Consistent with a lack of function directing recombination hotspots, we did not 565 observe strong signatures of positive selection in the zinc finger domain of PRDM9. We found 566 zero fixed differences between threespine and blackspotted stickleback for the PRDM9 ortholog. 567 There was one synonymous and one nonsynonymous mutation at moderate frequency in Lake 568 Washington and two synonymous and three nonsynonymous mutations at moderate frequency in 569 Puget Sound, indicating these mutations are likely not causing the population-specific 570 localization of hotspots we observed between Lake Washington and Puget Sound. 571 We also examined whether the predicted binding sites of any of the 11 previously 572 annotated PRDM genes in threespine stickleback fish were enriched in recombination hotspots. 573 Less than 14% of hotspots contained any of the predicted PRDM zinc finger binding domain 574 motifs (Supplemental Table 3). However, six of the motifs were significantly enriched in 575 hotspots, including PRDM9, when compared to scrambled motifs of the same size and GC 576 content (Supplemental Table 3), indicating PRDM genes could have some role in localizing a 577 subset of recombination hotspots. Outside of PRDM9 in mammals, multiple DNA binding motifs 578 assist with hotspot targeting in other systems such as Schizosaccharomyces pombe (Kon et al. 579 1997;Steiner et al. 2002). To see if other DNA motifs were targeting hotspots in threespine 580 stickleback fish, we searched for motifs enriched in hotspots. The most significant motifs 581 identified were simple mono-or di-nucleotide repeats which were present only in a subset of the 582 hotspots (Supplemental Figure 8). These repeats were not specific to hotspots as they were also 583 found in GC-matched coldspots. 584 585 Discussion 586

Broad-scale recombination rates across the threespine stickleback genome 587
At a broad scale, recombination rates across the threespine stickleback genome were 588 conserved between the two populations. This broad scale conservation of recombination rates is 589 a feature observed in many taxa (Fledel-Alon et al. 2009;Kong et al. 2002;Serre et al. 2005;590 Stevison et al. 2015) and may reflect the necessity of crossing over for the proper segregation of 591 chromosomes during meiosis (Davis and Smith 2001;Fledel-Alon et al. 2009;Kaback et al. 592 1992;Mather 1936). Additionally, we observed differential rates of recombination associated 593 with broad genomic regions that have been observed in other systems. First, we observed higher 594 recombination rates towards the telomeres. In many species, the ends of chromosomes have 595 higher rates of recombination (Barton et al. 2008;Berner and Roesti 2017;Kong et al. 2002;596 Roesti et al. 2013;Sardell et al. 2018), which is thought to be driven by male-specific 597 localization of recombination (Broman et al. 1998;Moen et al. 2008;Singer et al. 2001). Our 598 LD-based method estimates sex-averaged recombination rates, which does not allow us to test 599 whether the pattern we observed around the ends of chromosomes is driven by males. However, 600 sex-specific genetic linkage maps between the Japan Sea stickleback (Gasterosteus nipponicus) 601 and the threespine stickleback (G. aculeatus) corroborate this pattern (Sardell et al. 2018). 602 Second, we observed higher recombination rates in the pseudoautosomal region compared to the 603 autosomes. Recombination rates in pseudoautosomal regions are often orders of magnitude 604 above autosome-wide averages, as an obligate crossover should occur between the X and Y 605 chromosomes in these small regions during every male meiosis (Hinch et al. 2014;Kauppi et al. 606 2012;Otto et al. 2011). 607 Overall, the genome-wide average recombination rate for Puget Sound was two-fold 608 higher than in Lake Washington. Rate variation between populations or species can be driven by 609 a number of processes. Structural variation (i.e. inversions, chromosomal rearrangements, and 610 copy number variants) can contribute to rate variation among genomes. Indeed, recombination 611 rates have been shown to vary across chromosomal regions due to segregating inversions 612 between marine and freshwater populations of threespine stickleback (Glazer et al. 2015;Jones 613 et al. 2012). Although structural variants could explain rate differences between Lake 614 Washington and Puget Sound populations at a more localized level, they cannot explain the 615 genome-wide rate differences we observed. Over longer evolutionary timescales, recombination 616 rate also can evolve neutrally (Dumont and Payseur 2008), driving genome-wide rate variation 617 between species. However, neutral divergence is likely not occurring at a pace that would alter 618 genome-wide recombination rates between recently diverged populations of threespine 619 stickleback fish. One plausible explanation for the observed rate differences is differences in 620 demographic history between the Lake Washington and Puget Sound populations. A larger 621 effective population size could increase the population-scaled recombination rate (Burt 2000;622 Charlesworth 2009). In threespine stickleback, marine populations typically have a larger Ne 623 than freshwater populations (DeFaveri and Merila 2015; Gow et al. 2006;Makinen et al. 2006), 624 consistent with our observed pattern of a higher recombination rate in Puget Sound relative to 625 Lake Washington. 626 627 Identifying hotspots using patterns of linkage disequilibrium 628 LD-based estimates of recombination rates can be affected by demographic processes 629 that change patterns of linkage disequilibrium across the genome (Chan et al. 2012;Dapper and 630 Payseur 2017;Johnston and Cutler 2012;McVean et al. 2004;Wall and Stevison 2016). The 631 duration and timing of these events can have varying effects on hotspot identification, often 632 reducing the power to detect hotspots and increasing the rate of errors (Dapper and Payseur 633 2017). Threespine stickleback fish have a complex history of bottleneck events and population 634 expansions over the last 10-15 thousand years which vary across geographic regions (Bell and 635 Foster 1994;Ferchaud and Hansen 2016;Hohenlohe et al. 2010;Liu et al. 2016;Orti et al. 636 1994). Based on simulations, demographic history likely has some role in the observed 637 divergence in hotspot location between Lake Washington and Puget Sound populations, but it 638 seems likely that population demography does not completely explain the pattern. Only in the 639 scenario where both populations experienced a strong bottleneck do error rates rise high enough 640 to mimic the observed divergence in hotspot location. However, our estimates of effective 641 population size over time revealed that Lake Washington and Puget Sound did not experience 642 similar fluctuations. Both populations began with effective population sizes that largely parallel 643 those observed in other threespine stickleback fish populations (Liu and Hansen 2017;Ravinet et 644 al. 2018). Puget Sound then experienced a larger population expansion roughly 18,000 years ago, 645 followed with a decrease in population size at approximately 8,000 years ago. Lake Washington 646 had a slight increase in population size, followed by a small bottleneck around the same time, but 647 overall changes in effective population size were more stable in this population. Examination of 648 where recombination hotspots are currently forming across the genome in Lake Washington and 649 Puget Sound would help confirm the patterns we observed. Surveys of double strand break 650 hotspots (Pratto et al. 2014;Smagulova et al. 2011) or crossover breakpoints in genetic crosses 651 (Broman et al. 1998;Campbell et al. 2016;Drouaud et al. 2006;Marand et al. 2017) would 652 reveal the degree to which recombination hotspots are targeted to different genomic locations in 653 these two populations. 654 655 Hotspot evolution in freshwater and marine threespine stickleback populations 656 Of the 3,965 hotspots between Lake Washington and Puget Sound, only ~15% of 657 hotspots are shared, indicating many of the hotspots are recently derived within populations of 658 threespine stickleback fish. Consistent with the recent evolution of hotspots, we did not observe 659 an elevated equilibrium GC content in these regions. One possible model is that recombination 660 hotspots can shift over short evolutionary timescales among regions of the genome that are 661 susceptible to homologous recombination, such as regions of accessible chromatin. Both 662 evolutionarily conserved and rapidly evolving hotspots tend to locate to regions of accessible 663 chromatin (Lam and Keeney 2015;Ohta et al. 1994;Pan et al. 2011;Tischfield and Keeney 664 2012) or regions with histone 3 lysine 4 trimethylation (H3K4me3) (Auton et al. 2013;Baker et 665 al. 2015;Marand et al. 2017;Smagulova et al. 2011). 666 In taxa where hotspots are evolutionarily conserved, hotspots are highly enriched around 667 TSSs (Auton et al. 2013;Kawakami et al. 2017;Pan et al. 2011;Singhal et al. 2015;Tischfield 668 and Keeney 2012).This pattern could be due to either higher selective constraints at TSSs or the 669 chromatin structure at TSSs. TSSs are often under purifying selection and if a genomic feature, 670 like a DNA motif, is targeting hotspots to these regions, these features would also be preserved 671 through purifying selection, maintaining the location of the hotspot (Kawakami et al. 2017;Lam 672 and Keeney 2015; Singhal et al. 2015;Tsai et al. 2010). On the other hand, an open chromatin 673 conformation could be driving this pattern. TSSs and the surrounding regions must be accessible 674 for transcription to occur while also providing sites for Spo11 to bind, initiating recombination 675 (Lee et al. 2004;Pokholok et al. 2005) as Spo11 will create double strand breaks at any sites with 676 accessible chromatin (Celerin et al. 2000;Ohta et al. 1994;Pan et al. 2011). In Lake Washington 677 and Puget Sound populations, we found some enrichment of hotspots at TSSs (Lake Washington: 678 26% of hotspots fell within 3 kb of a TSS; Puget Sound: 29% of hotspots fell within 3 kb of a 679 TSS) which is similar to hotspot enrichment around TSS in taxa that do not have a functional 680 PRDM9 protein. In birds and dogs, for example, ~20-30% of hotspots overlap with TSSs (Auton 681 et al. 2013;Kawakami et al. 2017;Singhal et al. 2015). Additional characterization is needed to 682 determine if hotspots in threespine stickleback are occurring in regions of the genome that are 683 already open due to transcription or if there is a mechanism that creates accessible chromatin 684 specifically for double strand break formation, like what is believed to occur with PRDM9 in 685 mammalian species (Diagouraga et al. 2018;Hayashi et al. 2005;Powers et al. 2016). 686 In some mammalian systems, positive selection acting on the zinc finger binding domain 687 of PRMD9 has led to multiple distinct DNA binding motifs between closely related species 688 (Baudat et al. 2010;Myers et al. 2010;Myers et al. 2008;Pratto et al. 2014). This leads to a rapid 689 evolution of hotspot localization (Baker et al. 2015;Brick et al. 2012;Pratto et al. 2014;690 Smagulova et al. 2016;Stevison et al. 2015). Typically, ~40% of hotspots will contain a PRDM9 691 motif in mouse and humans (Baudat et al. 2010;Myers et al. 2008      . LD-based recombination rates around hotspots are population-specific. Mean recombination rates are shown across a 40 kb interval, flanking the center of hotspots. The mean recombination rate in shared and population-specific Lake Washington hotspots is higher in the Lake Washington population compared to the homologous regions in the Puget Sound population (A). The mean recombination rate in shared and population-specific Puget Sound hotspots are higher in the Puget Sound population compared to the homologous regions in the Lake Washington population (B). The pattern is more pronounced when shared hotspots are removed from the comparison, leaving only the population-specific hotspots (C and D). Puget Sound is shown in red and Lake Washington is shown in blue.