Protein-coding variation and introgression of regulatory alleles drive plumage pattern diversity in the rock pigeon

Birds and other vertebrates display stunning variation in pigmentation patterning, yet the genes controlling this diversity remain largely unknown. Rock pigeons (Columba livia) are fundamentally one of four color pattern phenotypes, in decreasing order of melanism: T-check, checker, bar (ancestral), or barless. Using whole-genome scans, we identified NDP as a candidate gene for this variation. Allele-specific expression differences in NDP indicate cis-regulatory differences between ancestral and melanistic alleles. Sequence comparisons suggest that derived alleles originated in the speckled pigeon (Columba guinea), providing a striking example of introgression of alleles that are favored by breeders and are potentially advantageous in the wild. In contrast, barless rock pigeons have an increased incidence of vision defects and, like two human families with hereditary blindness, carry start-codon mutations in NDP. In summary, we find unexpected links between color pattern, introgression, and vision defects associated with regulatory and coding variation at a single locus.

guinea), providing a striking example of introgression of alleles that are favored by 23 breeders and are potentially advantageous in the wild. In contrast, barless rock pigeons 24 have an increased incidence of vision defects and, like two human families with 25 hereditary blindness, carry start-codon mutations in NDP. In summary, we find 26 unexpected links between color pattern, introgression, and vision defects associated with 27 regulatory and coding variation at a single locus. 28

INTRODUCTION 29
Vertebrates have evolved a vast array of epidermal colors and color patterns in 30 response to natural, sexual, and artificial selection. Numerous studies have identified key 31 genes that determine variation in the types of pigments that are produced (e.g., Hubbard 32 shield pigmentation phenotypes are determined by an allelic series at a single locus (C, 51 In this study, we investigate the molecular and evolutionary mechanisms 74 underlying wing pattern diversity in pigeons. We discover both coding and regulatory 75 variation at a single candidate gene, and a trans-species polymorphism linked with 76 pattern variation within and between species that likely resulted from interspecies 77 hybridization. and a homozygous bar bird and is hereafter referred to as the minimal checker haplotype. 94 As expected for the well-characterized allelic series at the C locus, we also found that a 95 broadly overlapping region of Scaffold 68 was highly differentiated between the genomes 96 of bar and barless birds (p = 3.11e-15, genome-wide significance threshold = 9.71e-10; 97 To identify genetic variants associated with the derived checker and T-check 102 phenotypes, we first compared annotated protein-coding genes throughout the genome. 103 We found a single, predicted, fixed change in EFHC2 (Y572C, Fig. S2) in checker and T-104 check birds relative to bar birds (VAAST; Yandell et al. 2011). However, this same 105 amino acid substitution is also found in Columba rupestris, a closely related species to C. 106 livia that has a bar wing pattern. Thus, the Y572C substitution is not likely to be 107 causative for the checker or T-check pattern, nor is it likely to have a strong impact on 108  Next, we examined sequence coverage across the checker haplotype and 111 discovered a copy number variable (CNV) region (approximate breakpoints at Scaffold 112 68 positions 1,790,000 and 1,805,600). Based on normalized read-depths of resequenced 113 birds, we determined that the CNV region has one, two, or four copies per chromosome. 114 Bar birds (n=12) in our resequencing panel always had a total of two copies in the CNV 115 region (one on each chromosome), but most checker (n=5 of 7) and T-check (n=2 of 2) 116 genomes examined had additional copies of the CNV (Fig. 2A). Using a PCR assay to 117 amplify across the breakpoints in birds with more than one copy per chromosome, we 118 determined that additional copies result from tandem repeats. We found no evidence that 119 the checker haplotype contains an inversion based on mapping of paired-end reads at the 120 CNV breakpoints (WHAM; Kronenberg et al. 2015). In addition, we were able to amplify 121 unique PCR products that span the outer CNV breakpoints (data not shown), suggesting 122 that there are no inversions within the CNV region. 123 The fact that some checker birds had only two total copies of the CNV region 124 demonstrates that a copy number increase is not necessary to produce melanistic 125 phenotypes. However, consistent with the dominant inheritance pattern of the phenotype, 126 all checker and T-check birds had at least one copy of the checker haplotype. Thus, a 127 checker haplotype on at least one chromosome appears to be necessary for the dominant 128 melanistic phenotypes, but additional copies of the CNV region are not. 129 In a larger sample of pigeons, we found a significant association between copy 130 number and phenotype (TaqMan assay; pairwise Wilcoxon test, p=2.1e-05). Checker 131 (n=40 of 56) and T-check (n=15 of 18) patterns were associated with additional copies, 132 and pigeons with the bar pattern (n=20) never had more than two copies in total (Fig. 133 2B). Although additional copies of the CNV only occurred in checker and T-check birds, 134 we did not observe a consistent number of copies associated with either phenotype. This 135 could be due to a variety of factors, including modifiers that darken genotypically-136 checker birds to closely resemble T-check (Jones 1922;Sell 2012) and environmental 137 factors such as temperature-dependent darkening of the wing shield during feather 138 development (Podhradsky 1968). 139 Due to the potential ambiguity in categorical phenotyping, we next measured the 140 percent of pigmented area on the wing shield and tested for associations between copy 141 number and percentage of pigmented wing shield area. We phenotyped and genotyped an 142 additional 63 birds from diverse domestic breeds as well as 26 feral birds, and found that 143 estimated copy number in the variable region was correlated with the amount of dark 144 pigment on the wing shield (nonlinear least squares regression, followed by r 2 145 calculation; r 2 =0.46) (Fig. 2C). This correlation was a better fit to the regression when 146 ferals were excluded (r 2 =0.68, Fig. S3B), possibly because numerous pigmentation 147 modifiers (e.g., sooty and dirty) are segregating in feral populations (Hollander 1938;148 Johnston and Janiga 1995). Together, our analyses of genetic variation among 149 phenotypes point to a CNV that is associated with qualitative and quantitative color 150 pattern variation in pigeons. 151

NDP is differentially expressed in feather buds of different wing pattern phenotypes 153
The CNV that is associated with wing pattern variation resides between two 154 genes, EFHC2 and NDP. EFHC2 is a component of motile cilia, and mouse mutants have 155 juvenile myoclonic epilepsy (Linck et al. 2014). In humans, allelic variation in EFHC2 is 156 also associated with differential fear responses and social cognition (Weiss et  and T-check patterned feathers (p=0.046, Fig. 3A). Expression levels of other genes 176 adjacent to the minimal checker haplotype region also did not vary by phenotype (Fig.  177   S4). 178 In contrast, expression of NDP was significantly increased in checker feathers -179 and even higher in T-check feathers -relative to bar feathers ( Wilcoxon test, all comparisons were significant at a false discovery rate of 0.05). 182 Moreover, when qRT-PCR expression data for checker and T-check feathers were 183 grouped by copy number instead of categorical phenotype, the number of CNV copies 184 was positively associated with NDP expression level (Fig. S5). Thus, expression of NDP 185 is positively associated with both increased melanism (categorical pigment pattern 186 phenotype) and CNV genotype. (checker alleles with one, two, or four copies of the CNV). In the common trans-acting 198 cellular environment of heterozygous birds, checker alleles of NDP were more highly 199 expressed than bar alleles, and these differences were further amplified in checker alleles 200 with more copies of the CNV (  deficiencies and, as in humans with certain mutant alleles of NDP, this phenotype is not 219 completely penetrant (Hollander 1983). Thus, based on the known allelism at the C locus, 220 the nomination of regulatory changes at NDP as candidates for the C and C T alleles, and 221 the vision-related symptoms of Norrie disease, NDP is also a strong candidate for the 222 barless phenotype (c allele). 223 To test the prediction that variation in NDP is associated with the barless 224 phenotype, we used VAAST to scan the resequenced genomes of 9 barless pigeons and 225 found that all were homozygous for a nonsynonymous protein-coding change at the start 226 codon of NDP that was perfectly associated with the barless wing pattern phenotype. We 227 detected no other genes with fixed coding changes or regions of significant allele 228 frequency differentiation (pFst) elsewhere in the genome. We genotyped an additional 14 229 barless birds and found that all were homozygous for the same start-codon mutation (Fig. 230 S6). The barless mutation is predicted to truncate the amino terminus of the NDP protein 231 by 11 amino acids, thereby disrupting the 24-amino acid signal peptide sequence 232 (www.uniprot.org, Q00604 NDP_Human). NDP is still transcribed and detectable by RT-233 PCR in regenerating barless feathers (data not shown); therefore, we speculate that the 234 start-codon mutation might alter the normal secretion of the protein into the extracellular 235 matrix (Gierasch 1989). Hochlan, G. Young, personal communication) (Hollander 1983). Although C. livia and 253 C. guinea diverged an estimated 4-5 million years ago, inter-species crosses can produce 254 fertile hybrids (Whitman 1919;Irwin et al. 1936;Taibel 1949;Miller 1953). Moreover, 255 hybrid F 1 and backcross progeny between C. guinea and bar C. livia have checkered 256 wings, much like C. livia with the C allele (Whitman 1919;Taibel 1949). Taibel (1949) 257 showed that, although hybrid F 1 females were infertile, two more generations of 258 backcrossing to C. livia could produce checker offspring of both sexes that were fully 259 fertile. In short, Taibel introgressed the checker trait from C. guinea into C. livia in just 260 three generations. 261 To evaluate the possibility of an ancient cross-species introgression event, we 262 sequenced an individual C. guinea genome to ~30X coverage and mapped the reads to 263 the C. livia reference assembly. We calculated four-taxon D-statistics ("ABBA-BABA" 264 test; Durand et al. 2011) to test for deviations from expected sequence similarity between 265 C. guinea and C. livia, using a wood pigeon (C. palumbus) genome as an outgroup. In 266 this case, the null expectation is that the C candidate region will be more similar between 267 conspecific bar and checker C. livia than either will be to the same region in C. guinea. 268 That is, the phylogeny of the candidate region should be congruent with the species 269 phylogeny. However, we found that the D-statistic approaches 1.0 in the candidate region 270 (n=10 each for bar and checker C. livia), indicating that checker C. livia are more similar 271 to C. guinea than to conspecific bar birds in this region (Fig. 4A). The mean genome-272 wide D-statistic was close to zero (0.021), indicating that bar and checker sequences are 273 more similar to each other throughout the genome than either one is to C. guinea. 274 This unexpected similarity between C. guinea and checker C. livia in the pattern 275 candidate region was further supported by sequence analysis using HybridCheck (Ward 276 and van Oosterhout 2016). Outside of the candidate region, checker birds have a high 277 sequence similarity to conspecific bar birds and low similarity to C. guinea (Fig. 4B). 278 Within the candidate region, however, this relationship shows a striking reversal, and 279 checker and C. guinea sequences are most similar to each other. In addition, although the 280 genome-wide D-statistic was relatively low, the 95% confidence interval (CI) was greater 281 than zero (0.021 to 0.022), providing further evidence for one or more introgression 282 events from C. guinea into checker and T-check genomes. Unlike in many checker and 283 T-check C. livia, we did not find additional copies of the CNV region in C. guinea. This 284 could indicate that the CNV expanded in C. livia, or that the CNV is present in a subset 285 of C. guinea but has not yet been sampled. Taken together, these patterns of sequence 286 similarity and divergence support the hypothesis that the candidate checker haplotype in 287 rock pigeons originated by introgression from C. guinea. 288 289

Estimated divergence time among pattern locus haplotypes 290
While post-divergence introgression is an attractive hypothesis to explain the 291 sequence similarity between checker C. livia and C. guinea, another formal possibility is 292 that sequence similarity between these groups is due to incomplete lineage sorting. 293 Therefore, to assess whether the minimal checker haplotype might have been present in 294 the last common ancestor of C. guinea and C. livia, we measured single nucleotide 295 differences among different alleles of the minimal haplotype and compared these counts 296 to polymorphism rates expected to accumulate over the divergence time between C. livia 297 and C. guinea (Fig. 4C, purple bar, see Methods). 298 We found that polymorphisms between bar C. livia and C. guinea approached the 299 number expected to accumulate in 4-5 MY (1708±109 mean SNPs, Fig. 4C), but so did 300 intraspecific comparisons between bar and checker C. livia (1564±99). In contrast, C. 301 guinea and C. livia checker sequences had only 384±6 mean differences, significantly 302 fewer than would be expected to accumulate between the two species (p < 2.2e-16, t-303 test). These results support an introgression event from C. guinea to C. livia, rather than a 304 shared ancestral allele inherited from a common ancestor prior to divergence. Among 11 305 checker haplotype sequences we found only 26±8 mean differences. 306 We then estimated the age of the minimal checker haplotype following Voight et Alternatively, or perhaps additionally, multiple (but relatively similar) checker 314 haplotypes could have been introgressed more recently by pigeon breeders. Once male 315 hybrids are generated, this can be accomplished in just a few generations (Taibel 1949). 316 This explanation is supported by lack of diversity among checker haplotypes, with only 317 26±8 mean differences, which is unusually low for the diversity typically observed in 318 large, free-living pigeon populations (Shapiro et al. 2013). Additional C. guinea genome 319 sequences will help to characterize allelic variation at this locus and resolve these 320 instructed to submit only samples that were unrelated by grandparent. DNA was then 393 extracted from blood, tissue, and feathers as previously described (Stringham et al. 2012). 394

Determination of color and pattern phenotype of adult birds 396
Feather and color phenotypes of birds were designated by their respective 397 breeders. Birds that were raised in our facility at the University of Utah or collected from 398 feral populations were assigned a phenotype using standard references (Levi 1986

Quantification of pigment pattern phenotype 453
At the time of blood sample collection, the right wing shield was photographed 454 (RAW format images from a Nikon D70 or Sony a6000 digital camera). In Photoshop 455 (Adobe Systems Incorporated, San Jose, CA), the wing shield including the bar (on the 456 secondary covert feathers) was isolated from the original RAW file. Images were 457 adjusted to remove shadows and the contrast was set to 100%. The isolated adjusted wing 458 shield image was then imported into ImageJ (imagej.nih.gov/) in JPEG format. Image 459 depth was set to 8-bit and we then applied the threshold command. Threshold was further 460 adjusted by hand to capture checkering and particles were analyzed using a minimum 461 pixel size of 50. This procedure calculated the area of dark plumage pigmentation on the 462 wing shield. Total shield area was calculated using the Huang threshold setting and 463 analyzing the particles as before (minimum pixel size of 50). The dark area particles were 464 divided by total wing area particles, and then multiplied by 100 to get the percent dark 465 area on the wing shield. Measurements were done in triplicate for each bird, and the 466 mean percentages of dark area for each bird were used to test for associations between 467 copy number and phenotype using a non-linear least squares regression. PCR experiments. Nine days after plucking, regenerating feather buds were removed, the 473 proximal 5 mm was cut longitudinally, and specimens were stored in RNAlater (Qiagen, 474 Valencia, CA) at 4°C for up to three days. Next, collar cells were removed, RNA was 475 isolated, and mRNA was reverse-transcribed to cDNA as described previously (Domyan 476 et al. 2014). Intron-spanning primers (see Table S1) were used to amplify each target 477 using a CFX96 qPCR instrument and iTaq Universal Syber Green Supermix (Bio-Rad, 478 Hercules, CA). Samples were run in duplicate and normalized to β-actin. The mean value 479 was determined and results are presented as mean ± S.E. for each phenotype. Results 480 were compared using a Wilcoxon Rank Sum test and expression differences were 481 considered statistically-significant if p < 0.05. 482 483

Allele-specific expression assay 484
SNPs in NDP and EFHC2 were identified as being diagnostic of the bar or 485 checker/T-check haplotypes from resequenced birds. Heterozygous birds were identified 486 by Sanger sequencing in the minimal checker haplotype region (AV17 primers, see Table  487 S1). Twelve checker and T-check heterozygous birds were then verified by additional 488 Sanger reactions (AV54 for NDP and AV97 for EFHC2, see Table S1) to be 489 heterozygous for the SNPs in NDP and EFHC2. PyroMark Custom assays (Qiagen) were 490 designed for each SNP using the manufacturer's software (Table S1) Table S1. Primers pairs were 499 designed using the rock pigeon reference genome (Cliv_1.0) (Shapiro et al. 2013). PCR 500 products were purified using a QIAquick PCR purification kit (Qiagen) and Sanger 501 sequenced. Sequences from each exon were then edited for quality with Sequencher v.5.1 502 (GeneCodes, Ann Arbor, MI). Sequences were translated and aligned with SIXFRAME 503 and CLUSTALW in SDSC Biology Workbench (http://workbench.sdsc.edu). Amino acid 504 sequences outside of Columbidae were downloaded from Ensembl (www.ensembl.org). 505 506