Genetics of adaptation in modern chicken

We carried out whole genome resequencing of 127 chicken including red jungle fowl and multiple populations of commercial broilers and layers to perform a systematic screening of adaptive changes in modern chicken (Gallus gallus domesticus). We uncovered >21 million high quality SNPs of which 34% are newly detected variants. This panel comprises >115,000 predicted amino-acid altering substitutions as well as 1,100 SNPs predicted to be stop-gain or -loss, several of which reach high frequencies. Signatures of selection were investigated both through analyses of fixation and differentiation to reveal selective sweeps that may have had prominent roles during domestication and breed development. Contrasting wild and domestic chicken we confirmed selection at the BCO2 and TSHR loci and identified 34 putative sweeps co-localized with ALX1, KITLG, EPGR, IGF1, DLK1, JPT2, CRAMP1, and GLI3, among others. Analysis of enrichment between groups of wild vs. commercials and broilers vs. layers revealed a further panel of candidate genes including CORIN, SKIV2L2 implicated in pigmentation and LEPR, MEGF10 and SPEF2, suggestive of production-oriented selection. SNPs with marked allele frequency differences between wild and domestic chicken showed a highly significant deficiency in the proportion of amino-acid altering mutations (P<2.5×10−6). The results contribute to the understanding of major genetic changes that took place during the evolution of modern chickens and in poultry breeding.


Introduction
The modern chicken (Gallus gallus domesticus) was domesticated from the red jungle fowl (RJF) [1], but with some contributions from at least one other closely related species, the grey jungle fowl [2]. Domestic chicken segregate into several hundreds of distinct breeds distributed across the world. During the last century, the domestic chicken has been developed into a major food source by adapting the genome to specialized egg laying (layers) and fast-growing meat birds (broilers) whose productivity far exceeds their wild ancestor as well as the domestic chicken present only 100 years ago. Most modern commercial layers produce~300 eggs in a year while the RJF usually lay a single clutch of 5-9 eggs per year. Modern broilers rapidly reach a body weight of 4-5 kg compared to the~1 kg weight of an adult RJF male [3]. The commercial broiler and layer suppliers produce more than 70 billion birds annually to meet current worldwide consumer demands of more than 120 million tons of meat and over 1.2 trillion eggs [4].
The increasing productivity has been achieved through intensive directional selection on production traits over several tens of generations in purebred populations of limited effective population size followed by crossbreeding strategies in the generation of production animals. Maximizing yield however, has been at the price of reduced immunity and accompanied by a number of undesirable traits [5]. These negative effects may either be the result of pleiotropy of genes under selection for increased productivity, hitch-hiking of unfavourable alleles with the alleles under selection, or genetic drift. Understanding the nature of adaptive forces acting on the genome of commercial chicken provides insight into the complex relationship between production, disease and genes while opening up new directions for further improvement of this important farm animal, that is essential for global food security.
The domestic chicken is an excellent model to investigate the genetics of adaptation, as it involves transformation of the ancestral red jungle fowl into a domesticated bird. Furthermore, parallel populations of broilers and layers were independently established from earlier multipurpose populations by several breeding companies selecting for very similar breeding goals during the last hundred years. This scenario allows us to explore if the same alleles are responsible for the selection response in these parallel populations. In this study, we conducted a systematic comparison of genomic sequence variation from multiple populations of broilers and layers, versus each other and versus RJF to identify genes that underwent selection during domestication and the subsequent specialization of domestic chicken into broiler and layer lines. We report the discovery and characterization of over 21 million SNPs, 34% of which were not previously described. Analysis of selection provides a comprehensive list of candidate loci underlying domestication and/or changes in production-relevant traits. We further report a highly significant (P<2.5×10 −6 ) deficiency of amino-acid altering mutations among those showing strong genetic differentiation between RJF and commercial birds.

Detecting millions of high-quality SNPs
The bioinformatics analysis using the described criteria detected *26.3 million putative SNPs and INDELs. After filtering, over 21 million high-quality bi-allelic SNPs were retained that were either segregating or fixed for a non-reference within a population. The retained variants were distributed in the genome with an average density of 1 SNP every~50 bases. About 34% of these SNPs (n = 7,146,382) had not been reported before. The number of SNPs detected in each population varied between 7.6 and 17.4 million ( Table 1). For the layer lines, the proportion of segregating variants was rather low, with an average of 57% of total variation, while the corresponding average for the broilers was 65%. RJFt alone carries 86% of all detected variants. These results show that layers have lost a considerable amount of the genetic diversity present in their wild ancestor as also indicated by the significantly lower levels of nucleotide diversity (π) in LRs (0.15-0.20%) compared with that observed in RJFt (0.40%; Table 1), although the possibility exists that the nucleotide diversity in RJFt is somewhat inflated if multiple subpopulations in northern Thailand was sampled. The low nucleotide diversity of RJFi (0.13%) is presumably due to the fact that this population has been maintained as a small, closed breeding population for many years. The observed reduction in nucleotide diversity in the layer lines is mainly attributed to small number of founders and many generations of mating within closed lines of limited population size, but also partly due to the effect of linked selection.
We detected 115,107 amino acid-altering SNPs of which 17% were predicted by SIFT to be evolutionary intolerant (SIFT scores = 0.00-0.05), 215,810 synonymous variants, 588,491 variants within untranslated regions and 1,100 stop-gain or -loss variants. An unknown fraction of these will have functional consequences.

Allele frequency spectrum
The comparison of the allele frequency profiles of wild and commercial populations reveals substantial differences (Fig 1A; S2 Fig). In wild birds (RJFt), the distribution of allele frequencies shows a marked overrepresentation of infrequent alleles which is consistent with the Purifying selection revealed in commercial chicken pattern observed for high-quality data in many other organisms including human and cattle populations [6,7]. In contrast, commercial populations, particularly layers (S2 Fig), show a substantially smaller proportion of rare alleles that can be attributed to the smaller effective population size caused by recent selective breeding leading to loss of rare alleles. A subtle excess in the proportion of missense relative to synonymous mutations is evident among rare alleles, presumably caused by selection reducing the allele frequency of slightly deleterious mutations [6,8]. Fig 1B shows the distribution of population-and group-specific variants detected from individual sequencing only. Out of the >18 million variants detected in RJFt, as many as 4.4 million were unique to this population ( Fig 1B). This suggests loss of genetic diversity during domestication and breeding, although this might be partly due to genetic differences between the RJF birds used in this study and the ancestral population(s) of red jungle fowl that contributed to chicken domestication. We compared the distributions of population-specific SNPs among commercial and wild chicken to investigate differences in the frequency patterns ( Fig  1C). With the exception of the inbred RJFi population, the layer lines exhibit higher frequencies of population-specific alleles. This may be a consequence of a more narrow genetic basis and successive generations of selective breeding in commercial populations to enhance the frequency of favourable alleles. A good proportion of these loci are probably dragged to higher frequencies due to linkage with the target loci under selection [9]. Summary statistics of group-specific variants discovered exclusively in the layer and broiler lines are presented in supplementary Tables 1-4.

Principal component analysis of genetic relationships
We performed a comprehensive analysis of genetic similarity based on genotypes from >21 million SNPs. As expected, individually sequenced birds from the same population clustered together ( Fig 1D; S3 Fig). The white (WL) and brown (BL) laying birds clustered distantly, although they are both layers, a result consistent with previous data [10]. Rhode White (RWp) is a layer breed developed by crossing white and brown layers and is located in the middle of the plot. The clusters of RJFs from Thailand and India were in fairly close proximity to one other. Broilers showed a strong clustering in the middle of the plot, probably due to the common ancestor of all, rooted back to the Cornish breed [11]. These results provide important background information for our attempts to identify loci under selection in the domestic populations.
Detecting selective sweeps a. Analysis of genetic differentiation. The level of genetic differentiation varies among chromosomes, annotation categories as well as groups of birds (S4 Fig). To detect putative selective sweeps, we first searched the genome for regions with high degrees of differentiation between groups (RJF, LRs and BRs). Across the genome we observed the largest F ST values in contrasts between populations with the lowest nucleotide diversities reflecting genetic drift (Tables 1 and 2). F ST values were estimated in sliding 40 kb windows along the genome in steps of 20 kb. The size of a selective sweep depends on multiple factors such as the local recombination rate, selection intensity, and the number of generations that passed from the time when a favourable mutation arose and it became fixed. Thus, selective sweeps vary in size due to several variables, making it difficult to determine an optimal window size in which to search for signatures of selection. Thus, we cannot rule out that our approach may have failed to detect sweeps that would have been detected using other fixed or variable window sizes.   Fig), which complicates defining a threshold to distinguish true selective sweeps from regions showing genetic differentiation due to genetic drift. We therefore defined putative sweeps as those reaching a ZF ST score � 6, as these were in the extreme upper end of the distribution (S4C Fig). We however believe that loci further down the list still merit further examination in follow-up studies. All windows with ZF ST � 4 in any of the comparisons are listed in S9 Table. Only~0.13% of the windows (n = 60) had a ZF ST score � 6 in the 'RJFs vs. Coms' comparison, and the corresponding fractions were~0.05% for 'BRs vs. LRs' (n = 41),~0.03% for RJFs vs. LRs (n = 66) and~0.07% for 'RJFs vs. BRs' (n = 90). In total, 31 putative sweeps were mapped with ZF ST -scores exceeding the threshold at least in one of the contrasts (Table 3). We used the yellow skin (BCO2) locus [2] and the TSHR locus [12] as proofs of principle showing that our approach can reveal established sweeps. We observed an F ST value of 0.65 (ZF ST = 7.0) over BCO2 (Fig 2A) and the localization perfectly overlapped the previously defined sweeps. The window harbouring the TSHR locus showed an F ST value of 0.34 (ZF ST = 3.4) in the 'RJFs vs. Coms' contrast residing within 9% of top differentiated windows (S6 Table). Another signal (ZF ST = 10.6) overlapping a previously detected sweep was mapped on chromosome 1 over IGF1, which encodes insulin-like growth factor 1, an important growth factor associated with body size in dogs [13]. This signal appeared in three out of four contrasts where RJFs were included and were maximum when wild birds were compared against broilers. Several recent studies have reported QTLs associated with chicken growth traits in this region [14].
In total, eleven putative sweeps including IGF1 had ZF ST -scores more extreme than that of yellow skin/BCO2 (Table 3), four of which were localized in regions lacking annotated genes. Other signals overlapped with HNF4G, TBXAS1, GLP1R and GJD2. A particularly interesting signal was found in the comparison of RJF/Commercials, and was localized at the distal end of GGA14 (ZF ST = 9.65) coinciding a gene-rich region. This signal was further supported by analysis of the differences in nucleotide diversity between wild and domestic chicken (ΔPi) that revealed a high degree of fixation in domestic chicken in this window on GGA14 (see section 'Analysis of fixation' and Fig 2B), therefore we decided to further evaluate this signal. A closer look at the GGA14 sweep ( Fig 2C) revealed three separate signals emerging from the region. The window with the strongest signal (ZF ST = 9.65) occurs in a window with a very high SNP density (nSNPs = 1,798) and overlaps the genes JPT2 (Jupiter microtubule associated homolog 2) and CRAMP1 (cramped chromatin regulator homolog 1). The signal reflects strong genetic differentiation between RJF and all domestic chickens (Fig 2). JPT2 (also known as HN1L) shows high sequence conservation among vertebrates and are proposed to be involved in embryo development [15]. A study in Drosophila melanogaster showed that a CRAMP1 homolog is involved in epigenetic regulation of gene expression [16].
We explored these genes for function-altering mutations and identified 6 highly differentiated SNPs (ΔAF � 0.7) between RJF and commercial populations, all residing in CCNF, annotated as missense mutations, (S10 Table), one of which was predicted to be deleterious.
b. Analysis of fixation. To extend the analysis of loci under selection during domestication, we compared the level of nucleotide diversity between wild birds and commercial lines. For this analysis, we included the six populations comprising sequence data from single individuals (see Table 1). We computed absolute values of the difference in nucleotide diversity (ΔPi) between groups of wild vs. commercial birds (RJFs vs. Coms) in every window and normalized the results by using a Z-score normalization (ZΔPi = (ΔPi win −ΔPi genome )/σ(ΔPi genome )). The most outstanding signal of Z|ΔPi| occurs on GGA14 overlapping the sweep signal encompassing the JPT2 and CRAMP1L genes (Fig 2).
In a further step we estimated nucleotide diversity for groups of birds as well as all six populations of RJFs and commercials (S5A and S5B Fig). The latter scan may identify adaptive selection that happened prior to domestication in those cases where there is no significant genetic differentiation between populations but a reduction in nucleotide diversity in all of them. Density plots indicate no outlying signal at the negative tail of the diversity distributions implying the absence of aberrant local diversity across the genome, an observation that emerges from genomic distribution of diversity scores as well (see S5C Fig). At the local scale however, we noticed extensively swept regions that persisted across multiple consecutive windows and span over hundreds of kilobases (Fig 3). Two particularly interesting selective sweeps that are present in all populations overlap the genes for ALX Homeobox 1 (ALX1) and KIT Ligand (KITLG) on GGA1. The reason for classifying these as two separate sweeps is that they are separated by a highly variable region. The ALX1 is responsible for beak shape variation among Darwin's finches [17]. The KITLG is a major determinant of pigmentation, which plays an important role in camouflage and sexual display [18]. As shown in Fig 3A, this is a fairly large region with an unusually low nucleotide diversity and we cannot rule out the possible involvement of other genes residing in the region contributing to the observed pattern. However, the two emerging valleys of homozygosity are evidently centred over ALX1 and KITLG. The results suggest that beak morphology and pigmentation traits may have been under selection in chicken prior to domestication. Another noticeable sweep is located on GGA2 spanning over a~3.5 Mb region harbouring 25 genes (Fig 3B).

Genomic enrichment of functional variants
The extensive SNP data combined with annotation information for each single site enabled us to explore the genomic distribution of sequence polymorphisms showing strong genetic differentiation between wild and domestic chicken as well as between broilers and layers. We carried out enrichment analyses to identify categories of SNPs showing differentiation between groups of birds. The absolute allele frequency difference (ΔAF) was calculated for different categories of SNPs in four contrasts (1) RJFs vs. Coms, (2) BRs vs. LRs, (3) RJFs vs. BRs and (4) RJFs vs. LRs and these ΔAF-values were sorted into 10 bins of allele frequency (ΔAF 0-0.1, etc.) to test for possible enrichment of variants in different annotation categories among SNPs showing strong differentiation. In all contrasts the great majority of SNPs showed a ΔAF<0.10 (Figs 4 and S6, S11-S14 Tables). This implies lack of differentiation between groups of birds at most loci, whereas a small percentage of variants, including those under selection showed highly significant differentiation.
The intensity of adaptive and purifying selection varies across the genome according to the functional properties; as such intergenic sequences evolve relatively more freely than proteincoding sequences. We observed a marked decline in relative abundance of missense substitutions showing a steady decrease above ΔAF = 0.2 in all contrasts (Fig 4). SNPs with marked allele frequency differences (ΔAF�0.7) between wild and commercial chicken demonstrate a highly significant deficiency of missense mutations (P<2.5×10 −6 ). We argue that this sharp decline in the proportion of differentiated missense substitutions represents purifying selection that reduces the frequency of slightly deleterious mutations affecting production and/or health. Thus, SNPs showing strong genetic differentiation between wild and domestic chickens are enriched for selectively neutral variants that have changed in frequency due to genetic drift as indicated by the enrichment of intergenic SNPs (P<0.0001) among variants with ΔAF>0.7. This result is in sharp contrast to recently reported data for the Atlantic herring where a similar analysis of high ΔAF SNPs showed a highly significant enrichment of missense mutations and other functionally important variants in a species with huge population size and a minimum amount of genetic drift [19].
The increase of log2 values for the contrast RJF vs. Coms and the flat curve for BRs vs. LRs (Fig 4) indicate most likely that a fraction of the missense mutations has been under positive selection during domestication. Therefore, we decided to focus on the highly differentiated missense variants (e.g., ΔAF>0.70), which were only 262 and 188 in the contrast 'RJFs vs. Coms' and 'BRs vs. LRs', respectively. All strongly differentiated missense variants in all four contrasts are compiled in S15-S18 Tables. Within the list of high ΔAF SNPs we observed multiple missense variants occurring in the same gene. For example, the 262 missense substitutions with ΔAF � 0.70 in the RJFs vs. Coms contrast occur in only 189 different genes and the corresponding figure for the contrast BRs vs. LRs is 188 missense substitutions in 150 genes. This result may reflect hitchhiking or possibly the evolution of alleles composed of multiple causal variants affecting the function of the same gene as previously documented in domestic animals [20].
We used the hypergeometric test of FUNC [21] to perform a gene ontology enrichment analysis based on the list of all genes embedding differentiated missense mutations and found no significant overrepresentation of any particular biological process. Nevertheless, we noted that some of these variants occur in genes affecting domestication or production-related traits (Table 4). However, as most genes have pleiotropic effects, selection may possibly act on other functional effects of these genes than those highlighted here. In the following sections, we highlight some results from these analyses. Purifying selection revealed in commercial chicken Evolution of pigmentation traits from wild to domestic type is one of the most striking changes during domestication [20]. Traits associated with visual appearance in domestic chicken have been artificially selected for aesthetic reasons and as a trademark in establishing distinct breeds. In the enrichment analysis of 'RJFs vs. Coms', two of the missense mutations with the highest ΔAF occur in the CORIN (AF RJFs = 0.09 and AF Coms = 0.96) and in Ski2 Like RNA Helicase 2 (SKIV2L2, AF RJFs = 0.71 and AF Coms = 0.00) genes. CORIN is a modifier of Agouti signalling protein (ASIP) in dermal papilla and its absence causes ASIP activity being prolonged leading to lighter coat color in mice [22]. SKIV2L2 regulates melanoblast proliferation during early stages of melanocyte regeneration [23]. Thus, both genes are involved in the pigmentation process. However, no genotype-phenotype association has yet been established for CORIN and SKIV2L2 in chicken.
Among the top ΔAF alleles in the 'RJFs vs. Coms' contrast is the gene encoding sperm flagellar protein 2 (SPEF2, AF RJFs = 0.03 and AF Coms = 0.82). SPEF2 is implicated in feather development. In contrast to the modern chicken, jungle fowl use feathers for flight and thermoregulation, both of which are more crucial in wild birds than in commercial chicken maintained in a controlled environment. However, thermoinsulation must have been an important trait in domestic chicken in the past when birds were kept in unheated stables in cold climate. Furthermore, SPEF2 is a major candidate gene for chicken early-and late-feathering [24], which is an economically important trait in the poultry industry since it can be used to sex chickens, and likely another reason for the differentiation of this mutation through linked selection. Two other notable mutations in this contrast overlapped the GLI Family Zinc Finger 3 (GLI3, AF RJFs = 0.03 and AF Coms = 0.79) and the Kinesin Family Member 7 (KIF7, AF RJFs = 0.03 and AF Coms = 0.82) genes, both involved in Sonic hedgehog (Shh) signaling pathway that controls the normal shaping of many tissues and organs during embryogenesis including limb and wing development [25,26]. Further genetic and functional studies of these allelic variants are necessary to verify their possible contribution to chicken domestication.
Coding SNPs with ΔAF�0.7 in the contrast between BRs vs. LRs also included interesting candidate mutations. For example, a missense mutation of extreme ΔAF (AF BRs = 0.14 and AF LRs = 0.86), occur in the Leptin receptor gene (Table 4), whose function in regulating feed intake and body weight is well documented in mammals [27,28] whereas the role of leptinleptin receptor interaction for metabolic regulation in birds is not yet clear [29]. Another particularly interesting substitution in this contrast overlaps the multiple epidermal growth factor 10 gene (MEGF10, AF BRs = 0.82 and AF LRs = 0.00) on GGA8, known to function as a myogenic regulator of satellite cells in skeletal muscle [30]. Mutations in MEGF10 have previously been shown to cause an unusual combination of dystrophic and myopathic features leading to the weak muscles in humans [30,31], suggesting that the mutation reported here may affect muscle growth in broilers. The fact that different broiler lines have a high frequency of the variant allele at this locus is consistent with this suggestion. Other notable mutations in this contrast were found in the IGSF10 gene implicated in age at puberty [32] and PLEKHM1 with a suggested role in osteoporosis [33].
This paper reports the discovery and characterization of over 20 million SNPs from the chicken genome with the goal to delineate those with potential functional consequences-either having adaptive advantages or deleterious effects. To our knowledge, this is so far the largest study of its kind in chicken as a large number of individuals have been sequenced and a large number of sequence variants were detected. As many as 34% (n = 7,146,383) of the SNPs had not been reported before. The results revealed a subtle differentiation between wild and modern chicken at most loci, whereas a small percentage of loci showed strong differentiation. Analysis of selection provided a comprehensive list of several tens of independent loci that are likely to have contributed to domestication or improving production. We confirmed strong differentiation between red jungle fowl and domestic chickens at the previously reported BCO2 and TSHR loci. We identified 34 putative selective sweeps co-localized with, among others, KITLG, ALX1, IGF1, DLK1, JPT2 and CRAMP1. Single SNP contrasts between groups of birds revealed several highly differentiated coding variants, in genes such as CORIN and SKIV2L2 involved in pigmentation and LEPR, MEGF10 and SPEF2 possibly affecting traits relevant for animal production. SNPs with marked allele frequency differences between wild and domestic chicken showed a highly significant deficiency of the proportion of missense mutations (P<2.5×10 −6 ).

Genetic material
Three groups of birds were included in the study (1) red jungle fowls (Gallus gallus gallus, RJFs), (2) broilers (BRs) and (3) layers (LRs) ( Table 1). The RJFs were sampled from two geographical regions, Thailand (RJFt) and India (RJFi). The RJFt consisted of 25 DNA samples collected within a European collaborative research project AVIANDIV (https://aviandiv.fli.de/ ). RJFt was randomly down-sampled from~150 RJFs caught in northern Thailand in 1997 and maintained since with random mating over four flocks; given the place and date, the RJFt samples likely have seen some contamination from domestic or feral populations prior to collection [34]. The DNA samples from RJFt were collected in 1999. For further information on the behaviour and morphology of these birds we refer to the AVIANDIV project webpage. The RJFi population involved 10 individuals of the Richardson line, originating from RJF caught in India in the 1960´s. This population has been extensively studied [35][36][37], and appears to have been established from a wild population prior to major genetic contamination of red jungle fowl populations, such that it may represent a unique RJF line that is at least largely free of influence from domestic stocks. The second and third group of birds represent commercial chicken, comprising three broiler and three layer populations, respectively. The broilers (BRs) were represented by 20 DNA samples of each of two lines (BRA and BRB) established independently and previously collected as part of the AVIANDIV project. BRA was a sire line belonging to the company Indian River International (Texas) established in 1980 and closed since with a breeding population size of >10,000 birds. BRB was another sire line originally from France, developed in 1970 with a breeding population size varying between 10,000 to 70,000. The broiler group further involved a pooled sample of 25 birds from AVIANDIV's broiler sire line D, hereafter denoted BRpD. This is a sire line originally from UK, established in 1974 and closed since with unknown population size. In the layer group (LRs), data from 25 birds each from purebred white (WL) and brown (BL) egg laying populations, sequenced in the frame of the SYNBREED project (http://www.synbreed.tum.de/index.php?id=2), were included. WL and BL birds represent parental lines of the LOHMANN Tierzucht GmbH that are originally established from White Leghorn and Rhode Island Red, respectively. Moreover, we used pooled sequence data of 48 birds from Rhode Island White (RWp), a crossbred layer population collected by the AVIANDIV project.

DNA sequencing, alignment and variant calling
Sequencing libraries of 300-500 bp fragments were constructed for each individual sample using Illumina Nextera Library preparation kits. Sequencing of RJFt, BRA and BRB was conducted using an Illumina HiSeq 2500 machine and 2x126 bp paired-end reads were generated. RJFi, WL and BL along with the three DNA pools (RWp, BRpB and BRpD) were sequenced with 2x101 bp paired-end reads (see Table 1). All reads were mapped against the reference genome assembly Galgal5 [38] using the Burrows-Wheeler aligner (bwa-0.7.12) [39]. Duplicate reads were masked during pre-processing using the Picard tool set (version 2.0.1).
We identified SNPs following the recommendations of best practices workflow for variant discovery analysis using GATK [40]. Briefly, after recalibrating for base quality scores, BAM files were fed into the GATK-HaplotypeCaller tool which is capable of calling SNPs and INDELs simultaneously via local de-novo assembly of haplotypes in a region. After generating 127 GVCF files for individual and pooled samples, they were called simultaneously using the GenotypeGVCFs module. Raw vcf files were then filtered and used for downstream analyses. S1 Fig presents a summary of SNPs called based on different sequencing parameters.

Analysis of population structure and relatedness
The R packages SNPRelate and gdsfmt [41] were used for principal component analysis of relatedness using identity-by-descent measures estimated from all SNPs. SnpEff (v.3.4) [42] was used to annotate variants according to their functional categorization which included the following categories 5 kb up-and down-stream of a gene, intergenic, missense, synonymous, intronic, 3' untranslated regions, 5' untranslated regions, stop gain and stop loss. Variants in the up-and down-stream regions and in the 3' UTR, 5' UTR regions were merged into the single categories.

Enrichment analysis
The enrichment analysis was conducted as previously described in [44] for four contrasts (1) RJFs vs. commercial and (2) BRs vs. LRs, (3) RJFs vs. BRs and (4) RJFs vs. LRs. First we estimated the allele frequency (AF) of each SNP based on the proportion of high-quality reads supporting the non-reference allele. To ensure an unbiased estimation of AF several filters were employed to remove low quality SNPs and uncertain genotypes. In individually sequenced populations, loci with genotype quality < 20 were set to no.call and allele frequencies were estimated only for sites with >50% of the individuals genotyped. Because of low coverage, we treated the population RJFi as a pool in this analysis. In all pools SNPs with allelic depth <50% of mean coverage were set to no.call. Then, for each contrast, allele frequencies of intra-group populations were averaged and used to estimate the absolute value of allele frequency difference (ΔAF) for every single variant. The SNPs were then sorted into different bins of ΔAF (e.g., 0-0.1, >0.1-0.2, etc.) representing the allele frequency difference between populations. The expected number of SNPs for each category in each bin was calculated as p (category) X n(bin), where p(category) is the proportion of a specific SNP category in the entire genome and n(bin) is the total number of SNPs in a given bin. Finally, log2 fold changes of the observed SNP count for each category in each bin were compared against the expected SNP count and statistical significance of the deviations from the expected values was tested with a standard χ 2 test.

Detecting selective sweeps
Evidence of positive selection was investigated in two steps. First, we explored differentiation of loci between the following combinations of populations. (1) RJFs vs. Commercials, (2) BRs vs. LRs, (3) RJFs vs. BRs and (4) RJFs vs. LRs. We estimated F ST [45] for each of these four contrasts. To reduce locus-to-locus variation in the inference of selection we averaged single SNP values for sliding windows of 40 kb with 20 kb overlap across chicken chromosomes. Window-based F ST values were then normalized and windows in the outlier tail ZF ST > 6 were identified as selection candidates for domestication and genetic improvement in commercial populations.
In the second step, we searched the genome for regions with high degrees of fixation. To this purpose, the nucleotide diversity (Pi) was compared between RJF and commercial birds as a signature of selection during domestication. Different window sizes were tested but did not change the consistent picture of the signals. A window size of 40 kb was selected in accordance to the differentiation analysis. The Pi values were then normalized. Analysis of fixation involved six populations for which individually sequenced data were available. As such, nucleotide diversity was estimated for RJFs (two red jungle fowl populations), commercials (four commercial lines), broilers (the two commercial broiler lines, BRA and BRB), LRs (two layer populations, BL and WL) and ALL (all six populations of RJFs and commercials).
Gene ontology enrichment analyses, contrasting differentiated genes against a genomic background gene set, were performed using the hypergeometric test of FUNC [21].
Supporting information S1 (DOCX) S11 Table. Distribution of SNPs with functional annotation in different delta allele frequency bins between two wild and six commercial populations ('RJFs vs. Commercials'). (DOCX) S12 Table. Distribution of SNPs with functional annotation in the delta allele frequency bins between three broiler and three layer populations (BRs vs. LRs). (DOCX) S13 Table. Distribution of SNPs with functional annotation in different delta allele frequency bins between two wild and three layer populations (RJFs vs. LRs). (DOCX) S14 Table. Distribution of SNPs with functional annotation in different delta allele frequency bins between two wild and four broiler populations (RJFs vs. BRs). (DOCX) S15