Deep sampling of Hawaiian Caenorhabditis elegans reveals high genetic diversity and admixture with global populations

Hawaiian isolates of the nematode species Caenorhabditis elegans have long been known to harbor genetic diversity greater than the rest of the worldwide population, but this observation was supported by only a small number of wild strains. To better characterize the niche and genetic diversity of Hawaiian C. elegans and other Caenorhabditis species, we sampled different substrates and niches across the Hawaiian islands. We identified hundreds of new Caenorhabditis strains from known species and a new species, Caenorhabditis oiwi. Hawaiian C. elegans are found in cooler climates at high elevations but are not associated with any specific substrate, as compared to other Caenorhabditis species. Surprisingly, admixture analysis revealed evidence of shared ancestry between some Hawaiian and non-Hawaiian C. elegans strains. We suggest that the deep diversity we observed in Hawaii might represent patterns of ancestral genetic diversity in the C. elegans species before human influence.

Hawaiian isotypes, some genomic regions appear to be shared with non-Hawaiian isotypes from around the globe. These results provide the first evidence of gene flow between these populations and suggest that future sampling efforts in the Hawaiian Islands will help elucidate the evolutionary processes that have shaped the genetic diversity in the C. elegans species.

Results

Hawaiian nematode diversity

In August 2017, we collected a total of 2,263 samples across five Hawaiian islands and ascertained the presence of nematodes in each sample (Figure 1, Supplemental Table 5). We isolated one or more nematodes from 1,120 of 2,263 (48%) samples, and an additional 431 of 2,263 (19%) samples had circumstantial evidence of nematodes (tracks were visible, but no nematodes could be found on the collection plate). Altogether, we isolated 2,531 nematodes from 1,120 samples and genotyped them by analysis of the Internal Transcribed Spacer (ITS2) region between the 5.8S and 28S rDNA genes (Barrière and Félix, 2014; Kiontke et al., 2011). We refer to isolates where the ITS2 region was amplified by PCR as 'PCR-positive' and isolates with no amplification as 'PCR-negative' (see Methods). The PCR-positive category comprises Caenorhabditis isolates that we identified to the species level and isolates from genera other than Caenorhabditis that we identified to the genus level. Using this categorization strategy, we found that 427 of 2,531 isolates (17%) were PCR-positive and belonged to 13 distinct taxa. Among all isolates, we identified five Caenorhabditis species at different frequencies across the 2,263 samples: C. briggsae (4.2%), C. elegans (1.7%), C. tropicalis (0.57%), C. kamaaina (0.088%), and a new species C. oiwi (0.53%) (Supplemental Table 5). We named Caenorhabditis oiwi for the Hawaiian word meaning "native" in reference to its endemic status on the Hawaiian Islands.
This species was found to be distinct based on molecular barcodes (Kiontke et al., 2011) and on biological species inference from mating crosses (Félix et al., 2014) (Supplemental File 1). The most common Caenorhabditis species we isolated was C. briggsae, which is consistent with nematode collection efforts by other groups that suggest C. briggsae is a ubiquitous species in many regions of the world (Félix et al., 2013). We found no evidence of island enrichment for Caenorhabditis species apart from C. elegans, which was enriched on the Big Island relative to Kauai and Maui (Fisher's Exact Test, p < 0.01).

The C. elegans niche is distinct from other Caenorhabditis species on Hawaii

To further characterize the nematode niche on the Hawaiian Islands, we classified the substrate for each distinct collection and measured various environmental parameters. Of the six major classes of substrate, we found nematodes most often on leaf litter (56%). When we accounted for collections with nematode-like tracks on the collection plate, we estimated that greater than 80% of leaf litter substrates contained nematodes (Figure 2A). The isolation success rate for the other classes of substrate ranged from 35% to 48% (Figure 2A). In comparison to overall nematode isolation rates, Caenorhabditis nematodes were isolated more frequently from flower substrates (40 of 202 collections) than any other substrate category (Fisher's Exact Test, p < 0.02) (Figure 2A).
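As an illustration of the enrichment tests reported in this section, the following Python sketch computes a two-sided Fisher's Exact Test for a 2x2 table using only the standard library. The function name and table layout are hypothetical choices made for this example and are not taken from the analysis scripts. The counts are the flower (40 of 202) and leaf litter (76 of 1,480) collections reported in this section. Because the exact test configuration used in the study is not described here, the resulting p-value should be read as indicative only.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are two substrate classes; columns are collections with and
    without Caenorhabditis. The p-value is the summed probability of all
    tables with the same margins that are no more likely than the
    observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x positive collections in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    p_value = sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Flower: 40 of 202 collections with Caenorhabditis.
# Leaf litter: 76 of 1,480 collections with Caenorhabditis.
p_flower_vs_leaf = fisher_exact_two_sided(40, 202 - 40, 76, 1480 - 76)
print(f"flower vs. leaf litter: p = {p_flower_vs_leaf:.3g}")
```

The same function can be applied to any other pair of substrate classes once the counts with and without Caenorhabditis are known.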
We also found that Caenorhabditis nematodes were enriched on rotting fruits, nuts, or vegetables (33 of 333 collections) relative to leaf litter substrates (76 of 1,480 collections) (Fisher's Exact Test, p < 0.02) but not other substrate classes (Figure 2A).

In addition to recording substrate classes, we measured elevation, ambient temperature and humidity, and substrate temperature and moisture to determine if these niche parameters were important for individual Caenorhabditis species (Figure 3;

Consistent with previous C. elegans collections in tropical regions (Andersen et al., 2012; Dolgin et al., 2008), all C. elegans isolates were collected from elevations greater than 500 meters and were generally found at higher elevations than other Caenorhabditis species (Figure 3E; mean = 867 m; elevation: Dunn test, p < 0.00001). We also found that C. elegans-positive collections tended to be at cooler ambient and substrate temperatures than other Caenorhabditis species (ambient temperature: Dunn test, p < 0.005; substrate temperature: Dunn test, p < 0.00001), although these two environmental parameters were correlated with elevation (Figure 3F). Notably, the average substrate temperatures for C. elegans (19.4 °C), C. tropicalis (26.0 °C), and C. briggsae (23.7 °C) positive collections are close to the optimal growth temperatures for these species in the laboratory setting (Figure 3B) (Poullet et al., 2015). Our collections also indicate that C. oiwi tends to be found on drier substrates than C. elegans (Figure 3D; Dunn test, p = 0.0021), but we observed no differences among species for ambient humidity (Figure 3C). Given the similar substrate and environmental parameter preferences of C. tropicalis, C. briggsae, and C. oiwi, we next asked if these species colocalized at either the local (< 30 m²) or substrate (< 10 cm²) scales.
To sample at the local scale, we collected samples from 20 gridsects (see Methods; Supplemental Figure 1) and observed no colocalization of these three species, although only 16% of the total collections were a part of a gridsect. At the substrate scale, we found C. tropicalis and C. briggsae cohabitating on two of 108 substrates with either species present and C. oiwi and C. briggsae cohabitating on one of 107 substrates with either species present (Supplemental Figure 2). Among 95 substrates with C. briggsae, we observed nine instances of C. briggsae cohabitating with other PCR-positive species. We did not collect any samples that harbored C. elegans and any other Caenorhabditis species. Taken together, these cohabitation results highlight the ubiquitous nature of C. briggsae on the Hawaiian Islands and further suggest that the niche of C. elegans might be distinct from that of C. tropicalis, C. briggsae, and C. oiwi on the Hawaiian Islands.

Hawaiian C. elegans are divergent from the global population

We previously showed that two C. elegans isolates from Hawaii are highly divergent relative to wild isolates from other regions of the world and represent a large portion of the genetic diversity found within the species (

SNVs identified in all of the 233 non-Hawaiian isotypes included in this study. We found that distinct isotypes are frequently isolated within close proximity to one another in Hawaii. We identified up to seven unique isotypes colocalized within a single gridsect (less than 30 m²) (Supplemental Figure 3A). We also found that colocalization occurred at the substrate level; among the 38 substrates from which we isolated C. elegans, 12 contained two or more isotypes (Supplemental Figure 3B)

a high degree of genome-wide relatedness among a majority of non-Hawaiian isotypes (Supplemental Figure 4).
By contrast, the Hawaiian isotypes are all diverged from the non-Hawaiian population with the exception of five non-Hawaiian isotypes. Among these exceptions, ECA36 and QX1211 were collected from urban gardens in New Zealand and San Francisco, CA, respectively, and grouped with some of the most divergent isotypes from Hawaii. More surprisingly, three non-Pacific Rim isotypes (JU2879, MY16, and MY23) also grouped with the Hawaiian isotypes. JU2879 was isolated from a rotting apple in Mexico City, Mexico, and both MY isotypes were isolated from garden composts in Nordrhein-Westfalen, Germany, separated by approximately 5 km. Within the Hawaiian population, genome-wide relatedness revealed a high degree of divergence (Supplemental Figure 4). This trend is further supported by elevated levels of genome-wide average nucleotide diversity (π) in the Hawaiian population relative to the non-Hawaiian population, which we found to be three-fold higher (Hawaii = 0.00124; non-Hawaiian = 0.000408, Figure

The genomic distribution of diversity followed a similar pattern across chromosomes for both populations, wherein chromosome centers and tips exhibited lower diversity on average than chromosome arms (Figure 4A; Supplemental Data 2). This pattern is likely explained by lower recombination rates, higher gene densities, and elevated levels of background selection on chromosome centers (Consortium, 1998; Cutter and Payseur, 2003; Rockman et al., 2010). Interestingly, we observed discrete peaks of diversity in specific genomic regions (e.g., chr IV center), which suggests that balancing selection might maintain diversity at these loci in both populations (Figure 4A; Supplemental Data 2). This hypothesis is supported by corresponding spikes in Tajima's D (Figure 4B; Supplemental Data 3) (Tajima, 1989). Alternatively, higher values of Tajima's D might indicate a population contraction, but the discrete nature of these peaks makes this possibility less likely. A third possible explanation is that uncharacterized structural variation (e.g., duplication and divergence) exists in these regions. Nevertheless, the variant sites within these discrete peaks in π and Tajima's D are unlikely to be the result of sequencing errors because they are identified across multiple samples (see Methods).

Our previous analysis showed that 70-90% of isotypes contain reduced levels of diversity across several megabases (Mb) on chromosomes I, IV, V, and X (Andersen et al., 2012). This reduced diversity was hypothesized to be caused by selective sweeps that occurred within the last few hundred years, potentially through drastic alterations of global environments by humans. The two Hawaiian isotypes, CB4856 and DL238, did not share this pattern of reduced diversity, suggesting that they avoided the selective pressure.
Consistent with this previous analysis, we did not observe signatures of selection in the Hawaiian population on chromosomes I, IV, V, and X, as measured by Tajima

has largely been isolated from the selective pressures thought to be associated with human activity in many regions of the world.

C. elegans population structure on Hawaii

To assess population structure among all 276 isotypes, we performed admixture analysis (see Methods). This analysis suggested that the C. elegans species is composed of at least 11 ancestral populations (K), as indicated by the minimization of cross-validation (CV) error between Ks 11-15 (Supplemental Figure 5). The population assignments for K=11 closely aligned to the relatedness clusters we observed in a neighbor-joining network of all Hawaiian strains and the species-wide tree (Figure 5, Supplemental Figure 4). For Ks 11-15, the majority of Hawaiian isotypes consistently exhibit no admixture with non-Hawaiian ancestral populations. However, a minority of Hawaiian isotypes are consistently either admixed with non-Hawaiian populations (e.g., K=11, 14, and 15) or assigned to ancestral populations that contain non-Hawaiian isotypes (e.g., K=12 and 13) (Supplemental Figure 5). These data indicate that a subset of Hawaiian isotypes consistently exhibits a greater degree of genetic relatedness with non-Hawaiian isotypes across different population subdivisions. Together, we found at least four distinct subpopulations on the Hawaiian Islands, and at least seven additional non-Hawaiian subpopulations comprise the remainder of the subpopulations from around the globe (Supplemental Figure 6).
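The use of cross-validation error to choose K can be sketched in Python as follows. This is a minimal illustration and not the analysis code: the function name `summarize_cv_errors` and the `tolerance` parameter are hypothetical, and the CV errors below are made-up placeholder numbers, not results from this study. The sketch assumes that each K was run several times, that replicate CV errors are averaged, and that any K whose mean error falls within a small margin of the minimum is a reasonable candidate.

```python
from statistics import mean


def summarize_cv_errors(cv_errors, tolerance=0.01):
    """Average replicate CV errors per K and list the Ks near the minimum.

    cv_errors maps K -> list of CV errors from independent runs.
    tolerance is a hypothetical relative margin around the lowest mean error.
    """
    mean_by_k = {k: mean(errors) for k, errors in cv_errors.items()}
    lowest = min(mean_by_k.values())
    near_minimum = sorted(
        k for k, err in mean_by_k.items() if err <= lowest * (1 + tolerance)
    )
    return mean_by_k, near_minimum


# Placeholder values for illustration only (NOT data from the study).
toy_cv_errors = {
    2: [0.90, 0.91],
    3: [0.70, 0.72],
    4: [0.601, 0.603],
    5: [0.600, 0.602],
    6: [0.65, 0.66],
}
mean_by_k, near_minimum = summarize_cv_errors(toy_cv_errors)
print(near_minimum)
```

Reporting a range of near-minimal Ks, rather than a single value, mirrors the way the text describes CV error as minimized across a range of Ks.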

The majority of isotypes assigned to the seven non-Hawaiian ancestral populations exhibit a high degree of admixture with one another (at K=11), indicating that these populations are not well differentiated. By contrast, isotypes assigned to three of the four Hawaiian ancestral populations showed almost no admixture. We refer to the four Hawaiian populations as Volcano, Hawaiian Divergent, Hawaiian Invaded, and Hawaiian Low for the following reasons. All eight isotypes in the Volcano population were isolated on the Big Island of Hawaii at high elevation in wet rainforests primarily composed of ferns, ʻŌhiʻa lehua, and koa trees. We chose to name this population 'Volcano' because the majority of isotypes were isolated from the town of Volcano. The Hawaiian Divergent population is named for the two highly divergent isotypes, XZ1516 and ECA701, which were isolated from Kauai, the oldest Hawaiian island sampled. However, we emphasize that the population assignment of these two highly divergent isotypes might not be correct given that they each contain many unique variants that were filtered from the admixture analysis. The Hawaiian Invaded population is named because many of the isotypes assigned to this population exhibited admixture with non-Hawaiian ancestral populations, which is suggestive of an invasion of non-Hawaiian alleles into Hawaii (Figure 6A, Supplemental Figure 7).

The Hawaiian Low population is named because isotypes assigned to this population tended to be isolated at lower elevations than those assigned to the other Hawaiian populations (see Methods, Figure 6B). The population structure of the Hawaiian isotypes suggests that geographic associations within the Hawaiian C. elegans population exist either by elevation or by island.

Within the Hawaiian Invaded population, one of the 19 isotypes was isolated from outside of Hawaii (MY23), and 11 of 18 Hawaiian isotypes showed admixture with various non-Hawaiian populations, particularly the non-Hawaiian population C (Figure 6, Supplemental Figure 6). By contrast, just one individual assigned to the global C population was admixed with the Hawaiian Invaded population (Supplemental Figure 6). This result suggested that these populations either share ancestry or have experienced recent gene flow. To distinguish between these possibilities, we explicitly tested for the presence of gene flow among all subpopulations using TreeMix (Pickrell and Pritchard, 2012), which estimates the historical relationships among populations accounting for both population splits and migration events. We found evidence of gene flow between the Hawaiian Invaded population and the non-Hawaiian population C (Figure 7C;

haplotypes that were largely absent from the non-Hawaiian isotypes. By contrast, the Hawaiian Invaded population shared haplotypes that were commonly found in non-Hawaiian isotypes assigned to non-Hawaiian populations. For example, the isotypes in the Hawaiian Invaded population exhibiting admixture with the global C population share haplotype arrangements on the left and center of Chr III (red and orange Chr III, Figure 7A).
We also found evidence of the globally swept haplotype in all of the isotypes from the Hawaiian Invaded population, particularly on chromosomes I, V, and X, but less so on chromosome IV (Figure 7B, Supplemental Figure 9). By contrast, greater than 50% of chromosome IV contained the swept haplotype in all of the isotypes from the global C population (Supplemental Figure 9). Taken together, our data showed that the Hawaiian isotypes from the Volcano, Hawaiian Divergent, and Hawaiian Low populations have avoided the selective sweeps that are pervasive across most regions of the globe, and individuals within the Hawaiian Invaded subpopulation have likely been outcrossed with these swept haplotypes.
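The rule used to call a chromosome 'swept' (see Methods, Haplotype analysis) can be sketched in Python as follows. The function and argument names are hypothetical, and the segment lengths in the usage lines are invented for illustration; only the 1 Mb and 3% thresholds come from the text. The sketch assumes that the swept-haplotype segments of one isotype on one chromosome are summed before the two thresholds are applied.

```python
MIN_TOTAL_LENGTH_BP = 1_000_000  # total swept length must exceed 1 Mb
MIN_FRACTION_OF_MAX = 0.03       # and exceed 3% of the longest swept length


def is_chromosome_swept(swept_segment_lengths_bp, max_population_swept_length_bp):
    """Classify one chromosome of one isotype as swept or not.

    swept_segment_lengths_bp: lengths (bp) of segments carrying the most
        common (swept) haplotype on this chromosome in this isotype.
    max_population_swept_length_bp: the maximum population-wide swept
        haplotype length for this chromosome.
    """
    total = sum(swept_segment_lengths_bp)
    if total <= MIN_TOTAL_LENGTH_BP:
        return False
    return total / max_population_swept_length_bp > MIN_FRACTION_OF_MAX


# Invented example lengths, for illustration only.
print(is_chromosome_swept([600_000, 700_000], 15_000_000))  # passes both filters
print(is_chromosome_swept([400_000], 15_000_000))           # fails the 1 Mb filter
```

Keeping both thresholds as named constants makes it straightforward to test how sensitive the swept calls are to either cutoff.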

We sought to deeply sample the natural genetic variation within the C. elegans species to better understand the evolutionary history and driving forces of genome evolution in this powerful model system. Because the Hawaiian Islands have been shown to harbor highly divergent strains relative to most regions of the world, we chose to sample extensively on these islands. We developed a streamlined collection procedure that facilitated our collection of over 2,000 samples across five Hawaiian islands. From these collections, we isolated over 2,500 nematodes and used molecular data to partition 427 of these isolates into 13 distinct taxa, mostly from the Rhabditidae family. In total, we identified and cryogenically preserved 95 new C. elegans isolates that represent 26 genetically distinct isotypes. These isotypes represent the largest single C. elegans collection effort on any island system and contain 27% more SNVs than all 233 non-Hawaiian isotypes combined. Our findings confirm high diversity in Hawaii

Hawaii's position as a global trade-hub makes gene flow with the rest of the world more likely (Frankham, 1997

The ancestral niche of C. elegans might be similar to the Hawaiian niche

We used publicly available weather data from the National Oceanic and Atmospheric Administration and the National Climatic Data Center to measure the variation in seasonal temperatures for locations close to the sites where isotypes were collected (Evans et al., 2017). We found that the Hawaiian populations experienced less seasonal variability in temperature than any of the non-Hawaiian populations (Supplemental Figure 10). These findings raise the possibility that the ancestral niche of C. elegans might be similar to the thermally stable Hawaiian habitats where genetic diversity is highest. However, factors other than seasonal temperature variation might also characterize the ancestral niche of C. elegans. The Hawaiian Divergent population was enriched at higher elevation, which has been less impacted by human activities in Hawaii

loci that confer fitness advantages in human-associated habitats (Andersen et al., 2012). Taken together, we suspect that the ancestral niche of C. elegans is likely to be similar to the thermally stable, high elevation Hawaiian habitats where human impacts are less prevalent.

Unravelling the evolutionary history of C. elegans

More accurate models of C. elegans niche preferences will facilitate our ability to unravel the evolutionary history of this species by directing researchers to areas most likely to harbor C. elegans populations. In order to build more accurate niche models, future sampling efforts should include unbiased sampling across environmental gradients in multiple locations over time because data on niche parameters where C. elegans is not found are as important as data where C. elegans is found. Additionally, we must identify and quantify important biotic niche factors, including associated bacteria, fungi, and invertebrates. These types of data will help facilitate the identification of genes and molecular processes that are under selection in different subpopulations across the species range. C. elegans offers a tractable and powerful animal model system to connect environmental parameters to functional genomic variation. These data will deepen our understanding of the evolutionary history of C. elegans by revealing how selection and demographic forces have shaped the genome of this important model system.

Methods

Strains

Nematodes were reared at 20°C using OP50 bacteria grown on modified nematode growth medium (NGMA), containing 1% agar and 0.7% agarose to prevent animals from burrowing (Andersen et al., 2014). In total, 169 C. briggsae, 100 C. elegans, 21 C. tropicalis, 15 C. oiwi, and four C. kamaaina wild isolates were collected. Of these strains, 95 C. elegans, 19 C. tropicalis, and 12 C. oiwi wild isolates were cryopreserved and are available upon request along with the other C. elegans strains included in our analysis (Supplemental File 2). The type specimen for C. oiwi (ECA1100) is also deposited at the Caenorhabditis Genetics Center (Supplemental File 1).

Sampling strategy

We sampled nematodes at 2,263 sites across five Hawaiian Islands during August 2017. Before travelling to Hawaii, general sampling locations were selected based on accessibility via hiking trails and by proximity to where C. elegans had been collected previously (

area that we refer to as a 'gridsect'. The gridsect comprised a center sampling point with additional sampling sites at one, two, and three meters away from the center in six directions, with each direction 60° apart from the next (Supplemental Figure 1).
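The gridsect geometry described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `gridsect_points`, the choice of the center as the coordinate origin, and the orientation of the first direction along the x-axis are all hypothetical and are not specified in the text.

```python
import math


def gridsect_points(radii_m=(1, 2, 3), n_directions=6):
    """Return (x, y) offsets in meters for every sampling site in a gridsect.

    The first point is the center; the remaining points lie at each radius
    along n_directions evenly spaced directions (60 degrees apart for six).
    """
    points = [(0.0, 0.0)]
    for direction in range(n_directions):
        angle = math.radians(direction * 360 / n_directions)
        for radius in radii_m:
            points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points


points = gridsect_points()
print(len(points))  # 1 center + 6 directions x 3 distances

# Area of the circle through the outermost sites, in square meters.
print(round(math.pi * max(math.hypot(x, y) for x, y in points) ** 2, 1))
```

Under these assumptions the outermost sites lie on a circle of radius 3 m, which encloses roughly 28 square meters and appears consistent with the local scale of less than 30 m² used in the Results.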

Field sampling and environmental data collection

To characterize the Caenorhabditis abiotic niche, we collected and organized data for several environmental parameters at each sampling site using a customizable geographic data-collection application called Fulcrum®. We named our customized Fulcrum® application 'Nematode field sampling' and used the following workflow to enter the environmental data into the application while in the field. First, we used a mobile device camera to scan a unique collection barcode from a pre-labelled plastic collection bag. This barcode is referred to as a collection label or 'C-label' in the application and is used to associate a particular sample with its environmental and nematode isolation data. Next, we entered the substrate type, landscape, and sky view data into the application using drop down menus and photographed the sample in place using a mobile device camera. The GPS coordinates for the sample are automatically recorded in the photo metadata. We then measured the surface temperature of the sample using an infrared thermometer Lasergrip 1080 (Etekcity, Anaheim, CA), its moisture content using a handheld pin-type wood moisture meter MD912 (Dr. Meter, Los Angeles, CA), and the ambient temperature and humidity near the sample using a combined thermometer and hygrometer device GM1362 (GoerTek, Weifang, China). These measurements were entered into the appropriate fields in the application (Supplemental Table 3). Finally, we transferred the sample into a collection bag and stored it in a cool location before we attempted to isolate nematodes. Seventy samples in our raw data had missing GPS coordinates or GPS coordinates that were distant from actual sampling locations after visual inspection using satellite imagery.
The positions for these samples were corrected using the average position of the two samples collected before and after the errant data point or by manually assigning estimated positions.

Nematode isolation

Following each collection, the substrate sample was transferred from the barcoded collection bag to an identically barcoded 10 cm NGMA plate seeded with OP50 bacteria.

to pre-labeled 3.5 cm NGMA isolation plates seeded with OP50 bacteria. We refer to these isolation plates as 'S-plates' in the Fulcrum® application we called 'Nematode isolation' (Supplemental Table 4). At the time of isolation, we recorded the approximate number of nematodes on the collection plate and whether males or dauers were present. Importantly, male and dauer observations from samples shipped from Hawaii were not recorded to avoid bias caused by the long handling time of these samples. We merged the collection, isolation, and environmental data together into a single data file with the 'process_fulcrum_data.R' script that can be found in the scripts folder of the

The presence of ITS2 PCR products was visualized on a 2% agarose gel in 1X TAE buffer. Isolates that did not yield an ITS2 PCR product were labelled as 'PCR-negative', and those reactions that yielded the expected 2 kb ITS2 PCR product were labelled as 'PCR-positive'. We then used Sanger sequencing to sequence the ITS2 PCR products with forward primer oECA305. We classified Caenorhabditis species by comparing the ITS2 sequences to the National Center for Biotechnology Information (NCBI) database using the BLAST algorithm. Isolates with sequences that aligned best to genera other than Caenorhabditis were only classified to the genus level. For every isolate where the BLAST results either aligned to C. elegans, had an unexpectedly high number of mismatches in the center of the read, or did not match any known sequences because of poor sequence quality, we performed another independent lysis and PCR using high-quality Taq polymerase (cat# RR001C, TaKaRa) to confirm our original results. For this confirmation, we used the forward primer oECA305 and the reverse primer oECA306 (CACTTTCAAGCAACCCGAC) to sequence the confirmation ITS2 amplicon in both directions. The sequence chromatograms were then quality trimmed by eye with Unipro UGENE software (version 1.27.0) and compared to known nematode species in the NCBI sequence database using the BLAST algorithm. We used the consensus alignment of the forward and reverse reads to confirm our original results. For C. elegans, five of the 100 strains perished before we could confirm their identity. We also confirmed that several strains that best aligned to C. kamaaina shared a large number of mismatches in the center of the ITS2 amplicon, suggesting they belonged to a new species. For these strains, we performed reciprocal mating tests with C. kamaaina to infer the new species by the biological species concept. None of these crosses produced viable progeny, suggesting that these isolates represent a new Caenorhabditis species (Supplemental File 1).

Illumina library construction and whole-genome sequencing

To extract DNA, we transferred nematodes from two 10 cm NGMA plates spotted with OP50 E. coli into a 15 mL conical tube by washing with 10 mL of M9. We then used gravity to settle animals on the bottom of the conical tube, removed the supernatant, and added 10 mL of fresh M9.

in this study is briefly described below (Supplemental Table 5). All pipelines follow the "pipeline name-nf" naming convention and full descriptions can be found on the Andersen lab dry-guide website: (http://andersenlab.org/dry-guide/pipeline-overview/).

Raw sequencing reads were trimmed using trimmomatic-nf, which uses trimmomatic (v0.36) (Bolger et al., 2014) to remove low-quality bases and adapter sequences. Following trimming, we used the concordance-nf pipeline to characterize C. elegans strains isolated in this study and previously described strains (

binding sites, histone binding sites, miRNA binding sites, splice sites, ancestral alleles (XZ1516 set as ancestor), the genetic map position, and repetitive elements using vcfanno (v0.2.8) (Pedersen et al., 2016). All annotations were obtained from WS266. We removed regions that were annotated as repetitive. We named this VCF the 'PopGen VCF' (Supplemental Data 4; Supplemental Table 2).

Phylogenetic analyses

We characterized the relatedness of the C. elegans population using RAxML-ng with the GTR DNA substitution model and maximum likelihood estimation to find the parameter values that maximize the phylogenetic likelihood function, and thus provide the best explanation for the observed data (Kozlov et al., 2019). We used the vcf2phylip.py script (Ortiz, n.d.) to convert the 'PopGen VCF' (Supplemental Data 4) to the PHYLIP format (Felsenstein, 1993) required to run RAxML-ng. To construct the tree that included 276 strains, we used the GTR evolutionary model available in RAxML-ng (Lanave et al., 1984; Tavaré, 1986). Trees were visualized using the ggtree (v1.10.5) R package (Yu et al., 2017). To construct the neighbor-net phylogeny, we used SplitsTree4 (Huson and Bryant, 2006

present in one isotype. We ran ADMIXTURE ten independent times for K sizes ranging from 2 to 20 for all 276 isotypes. Visualization of admixture results was performed using the pophelper (v2.2.5) R package (Francis, 2017). We chose K=11 for subsequent analyses because the cross-validation (CV) error approached minimization at this K (Supplemental Figure 5). Furthermore, K=11 divided the Hawaiian isotypes into four distinct populations, which exactly matched the subsets obtained from running ADMIXTURE on just the 43 Hawaiian isotypes at K=4 (K=4 minimized CV error for ADMIXTURE with Hawaiian isotypes only; Supplemental Figure 11). We performed TreeMix analysis on K=11 for zero to five migration events (Pickrell and Pritchard, 2012).

Haplotype analysis

We determined identity-by-descent (IBD) of strains using IBDSeq (Browning and Browning, 2013) run on the 'PopGen VCF' (Supplemental Data 4) with the following parameters: minalleles=0.01, ibdtrim=0, r2max=0.8.

IBD segments were then used to infer haplotype structure among isotypes as described previously (Andersen et al., 2012). After haplotypes were identified, we defined the most common haplotype found on chromosomes I, IV, V, and X as the swept haplotype. We then retained the swept haplotypes within isotypes that passed the following per chromosome filters: total length > 1 Mb; total length / maximum population-wide swept haplotype length > 0.03. We classified chromosomes within isotypes as swept if the sum of the retained swept haplotypes for a chromosome was > 3% of the maximum population-wide swept haplotype length for that chromosome.

Environmental parameter analysis

We calculated the pairwise distances among all C. elegans-positive collections on Hawaii and detected five distinct geographic clusters, each of which contains collections that are within 20 meters of one another. The largest of these clusters comprised 18 collections in the Kalopa State Recreation Area on the Big Island of