Genetic differences between willow warbler migratory phenotypes are few and cluster in large haplotype blocks

It is well established that differences in migratory behavior between populations of songbirds have a genetic basis but the actual genes underlying these traits remains largely unknown. In an attempt to identify such candidate genes we de novo assembled the genome of the willow warbler Phylloscopus trochilus, and used whole‐genome resequencing and a SNP array to associate genomic variation with migratory phenotypes across two migratory divides around the Baltic Sea that separate SW migrating P. t. trochilus wintering in western Africa and SSE migrating P. t. acredula wintering in eastern and southern Africa. We found that the genomes of the two migratory phenotypes lack clear differences except for three highly differentiated regions located on chromosomes 1, 3, and 5 (containing 146, 135, and 53 genes, respectively). Within each migratory phenotype we found virtually no differences in allele frequencies for thousands of SNPs, even when comparing geographically distant populations breeding in Scandinavia and Far East Russia (>6000 km). In each of the three differentiated regions, multidimensional scaling‐based clustering of SNP genotypes from more than 1100 individuals demonstrates the presence of distinct haplotype clusters that are associated with each migratory phenotype. In turn, this suggests that recombination is absent or rare between haplotypes, which could be explained by inversion polymorphisms. Whereas SNP alleles on chromosome 3 correlate with breeding altitude and latitude, the allele distribution within the regions on chromosomes 1 and 5 perfectly matches the geographical distribution of the migratory phenotypes. The most differentiated 10 kb windows and missense mutations within these differentiated regions are associated with genes involved in fatty acid synthesis, possibly representing physiological adaptations to the different migratory strategies. The ∼200 genes in these regions, of which several lack described function, will direct future experimental and comparative studies in the search for genes that underlie important migratory traits.

. Size is the first principal component from a PCA on wing 34 length, tarsus length and bill-head length. Color represents a scale between 0-9 that quantifies the 35 amount of whiteness based on a comparison to three reference specimens.      containing sequences that have not been linked to any particular chromosome or linkage group.

131
The willow warbler transcriptome sequence data consists of 1.8 million sequence reads    Geographical distribution of divergent region haplotypes 512 We used a multidimensional scaling (MDS)-based method in the R package invclust (Caceres & 513 Gonzalez 2015) to determine if there was limited recombination between southern and northern 514 haplotypes (i.e. haplotypes most common in either of the migratory phenotype) in each 515 differentiated region. This method has been used to identify inversion polymorphisms, i.e.   Table S3) as females differ from males in size measurements.

553
The strength of the relationship between allele frequency and phenotypes was quantified using a variants from discordantly aligned read pairs and split reads. Inversion calls were further quality 564 filtered using the delly script populationFilter.py.

565
As breakpoints could be located in between the ends of the regions and their neighboring 566 scaffolds, we also designed primers that could amplify sequences across these gaps. In order to 567 design suitable primers we extracted a few kb of scaffold sequence at each end of the divergent  (Table S9) were designed in particularly 576 conserved parts of the alignments and blasted against the willow warbler genome to minimize 577 the risk of designing primers in repetitive regions. Finally, we checked that the primer sites did 578 not have any polymorphisms in the resequenced samples and that they had appropriate GC 579 content and melting temperatures. For amplification we used a long-range PCR kit (Qiagen, CA, USA) and followed the instructions of the manufacturer. PCR products were visualized on a 0.5 581 % agarose gel with 1kb ladder and lambda DNA for estimation of fragment sizes. For each locus,

582
PCR products with the strongest band was extracted from the agarose gels using a Pasteur 583 pipette, mixed with 100 µl of water and heated to 95ºC to dissolve the agarose. This was used as 584 template for a second PCR and using a BigDye Terminator sequencing kit (Applied Biosystems, 585 TX, USA).  (Table S1). Twelve of the individuals in the resequencing 597 data had been included in the SNP array and could be genotyped using the MDS analysis (see 598 above). To obtain genotypes for all re-sequenced individuals we extracted genotypes of a subset 599 of the MDS SNPs (N=174) that were also present in the filtered vcf file. We then added the 600 genotypes from the array samples for the same subset of SNPs and performed the same MDS 601 clustering as had been previously used. Before further analyses we checked that the clustering of 602 this subset of data resulted in three distinct and equidistant groups, and that the twelve samples that were included on both of the array and in the resequencing data had the same genotype.
Once the re-sequence-based genotype data had been divided into groups of pure northern and 605 southern individuals, we used vcftools to estimate nucleotide diversity and Tajima's D in each 606 group for all bi-allelic SNPs within 10 kb windows. We used a customized perl script to 607 calculate the average number of pairwise differences between sequences from each subspecies,