Runs of Homozygosity in Modern Chicken Revealed by Sequence Data

Runs of homozygosity (ROH) are chromosomal stretches that in a diploid genome appear in a homozygous state and display identical alleles at multiple contiguous loci. This study aimed to systematically compare the genomic distribution of the ROH islands among five populations of wild vs. commercial chickens of both layer and broiler type. To this end, we analyzed whole genome sequences of 115 birds including white layer (WL, n = 25), brown layer (BL, n = 25), broiler line A (BRA, n = 20), broiler line B (BRB, n = 20) and Red Junglefowl (RJF, n = 25). The ROH segments varied in size markedly among populations, ranging from 0.3 to 21.83 Mb reflecting their past genealogy. White layers contained the largest portion of the genome in homozygous state with an average ROH length of 432.1 Mb (±18.7) per bird, despite carrying it in short segments (0.3-1 Mb). Population-wise inbreeding measures based on Wright’s (Fis) and genomic (FROH) metrics revealed highly inbred genome of layer lines relative to the broilers and Red Junglefowl. We further revealed the ROH islands, among commercial lines overlapped with QTL related to limb development (GREM1, MEOX2), body weight (Meis2a.1, uc_338), eggshell color (GLCCI1, ICA1, UMAD1), antibody response to Newcastle virus (ROBO2), and feather pecking. Comparison of ROH landscape in sequencing resolution demonstrated that a sizable portion of genome of commercial lines segregates in homozygote state, reflecting many generations of assortative mating and intensive selection in their recent history. In contrary, wild birds carry shorter ROH segments, likely suggestive of older evolutionary events.

ones (background relatedness), and the sum of all these segments are suggested to be an accurate estimation of the inbreeding level of an individual (Ceballos et al. 2018a,b;Yoshida et al. 2020).
Chicken is a vital livestock for the world food security by producing massive quantities of meat and egg. Chicken also provide an excellent model to investigate the genetics of adaptation, as (1) it involves transformation of the ancestral Red Junglefowl (RJF, Gallus gallus), that still runs wild in most of Southeast Asia into a domestic bird.
(2) Domestic chicken have experienced intensive selection over the last decades and several breeding companies have independently bred primary multipurpose populations into highly productive birds, so-called broilers (meat-type) and layers (egg-type), by selecting for very similar breeding goals (Elferink et al. 2012).
In this study we aimed to systematically compare the genome of domestic birds with each other and against their wild ancestor. The main hypotheses here were to characterize genomic distribution of ROH islands and to localize homozygous segment common among all birds and between broilers vs. layers, respectively suggestive of ancient adaptation or footprints of domestication. Our results provide the landscape of homozygosity in sequence resolution, highlighting several candidate genes co-localized with quantitative trait loci (QTL). However, we were unable to pinpoint the standalone ROH islands overlapping among groups of birds as a significant fraction of the genome in modern chicken was carried in homozygous state.

Chicken populations and data preparation
For the purpose of this study we used SNP panel from birds sequenced individually in Qanbari et al. (2019). Briefly, it comprises medium coverage (10 folds) sequence of the entire genome of 115 chicken samples including white layer (WL, n = 25), brown layer (BL, n = 25), broiler line A (BRA, n = 20), broiler line B (BRB,n = 20) and Red Junglefowl (RJF, n = 25). For further detail of the SNP calling process we refer to the original study. Markers with minimum 1 minor allele were retained for further analysis. To ensure data quality we excluded the SNPs with genotyping rate , 0.1 and significant deviations from the Hardy-Weinberg equilibrium (P HWE ,10e-6).

Measuring diversity metrics
The diversity indicators including observed heterozygosity (Ho), and expected heterozygosity (He), were calculated using PLINK v1.9 package (Purcell et al. 2007). Polymorphic marker ratio (PN) that refers to the proportion of polymorphic loci in the target population was also estimated by averaging the proportion of non-missing SNPs per individual for each population.
Measuring runs of homozygosity Analysis of ROH was performed using PLINK v1.9 package (Purcell et al. 2007). The -homozyg module makes ROH calls using a predefined sliding window that scans along an individual's SNP panel to detect homozygous stretches (Howrigan et al. 2011). The parameters and thresholds to define an ROH were set according to Almeida et al. n■   (2019) and Ceballos et al. (2018b). Accordingly, we applied (i) sliding windows of size 50 SNPs across the genome; (ii) the proportion of homozygous overlapping windows set to 0.05; (iii) the minimum number of consecutive SNPs included in a ROH set to 50; (iv) the minimum length of a ROH set to 300 kb; (v) the maximum gap size between consecutive homozygous SNPs set to 1000 kb; (vi) required minimum density to consider a ROH was 1 SNP in 50 Kb; and (vii) a maximum of five SNPs with missing genotypes and up to three heterozygous.

Measuring inbreeding
Two measures of inbreeding coefficient were calculated with PLINK v1.9 (Purcell et al. 2007) for each population.
1. 1. Wright's inbreeding coefficient (F is ) was calculated by comparing the difference between observed and expected autosomal homozygous genotypes for each sample as follows: F is 5 Number of observed homozygous loci 2 Number of expected homozygous loci Number of nonmissing loci 2 Number of expected homozygous loci 2. Genomic inbreeding coefficients based on ROH (F ROH ) was estimated for each bird according to McQuillan et al. (2008). Accordingly, F ROH was defined as: where L ROH is the total size of ROH in the genome of each bird. L total is the total size of autosomal chromosomes of chicken covered by SNPs of an individual. The size of autosomal genome was considered as 931 Mb according to the chicken reference assembly Gallus_gallus-5.0.86 available at UCSC Genome Browser. The correlation between the F ROH and F is was calculated for all homozygous stretches and for the five chicken populations. All plots were generated with the R, package ggplot2 for R v3.6.2.

Distribution of runs of homozygosity
The total number of ROH per chromosome, average length of ROH per chromosome (Mb), and the percentage of chromosomes covered by ROH were estimated in PLINK (-homozyg option). Using an in-house R script the identified ROHs were classified into seven length classes of 0.3-1, 1-2, 2-4, 4-8, 8-10, 10-16 and .16 Mb, respectively.

Detection of autozygosity islands
To investigate genomic regions that were associated with occurrences of ROH within each breed, the fraction of SNPs in ROH was estimated based on the frequency of a SNP in them across individuals. 'ROH islands' was identified as a region of adjacent SNPs with an ROH frequency per SNP above the threshold of 1%. These regions were then overlapped with Chicken quantitative trait loci (QTL) database (release 40, 2019, https://www.animalgenome.org/cgi-bin/ QTLdb/GG/index). (Cingolani et al. 2012) was used to predict the functional effects of the identified mutations based on reference genome n■  Gene-ontology (GO) enrichment analysis Gene-ontology (GO) enrichment experiment was conducted using the ClueGO plugin v2.1.7 (http://www.ici.upmc.fr/cluego/) (Bindea et al. 2009). We further used DAVID v6.8 tool (Huang et al. 2009a;Huang et al. 2009b) to focus more on significantly enriched Gene GO terms. Only GO/pathway terms with significant enrichment (corrected P-value , 0.05 of Benjamini-Hochberg) were included in the analysis.

Data availability
Data used in this study has been originally published by Qanbari et al. (2019) and is deposited in the European Nucleotide Archive (ENA) with the accession number: PRJEB30270. Supplemental material available at figshare: https://doi.org/10.25387/g3.13107176.

RESULTS AND DISCUSSION
Genotyping, quality control and genetic diversity The total genotyping rate ranged from 0.97 to 0.99 and the number of polymorphic sites within breeds ranged from 6'100'640 to 14'734'764 (Table 1). RJF was the most diverse population with over 14 million polymorphic sites retained for the final analyses (Table 1). For each breed, genetic diversity was measured using heterozygosity and Wright's inbreeding coefficient (F is ). RJF breed, revealed the highest polymorphic marker ratio (99.85%) and despite the lowest observed (H O = 0.240 6 0.032) and expected heterozygosity (H E = 0.249 6 0.000) we measured a low level of inbreeding (F is = 0.037) ( Table 1). On the contrary, layers represent lowest polymorphic marker ratio and less observed heterozygosity than expected which is attributed to the small number of founders and many generations of mating within closed lines of limited population size, but also partly due to the effect of linked selection (Qanbari et al. 2019). Accordingly, F is was markedly high in layers in line with a previous report of genetic diversity and inbreeding in commercial white egg layer line (Bortoluzzi et al. 2018). A recent study based on genotypes of a high-density SNP array revealed lower inbreeding (F is = 0.037 to 0.237) in Chinese indigenous chicken breeds than both Europeans local (F is = 0.254 to 0.404) and commercial brown egg layers (F is = 0.144), suggestive of the richer genetic diversity available in indigenous populations (Chen et al. 2019).

Runs of homozygosity
As schematically shown in Figure 1, distinctive clustering patterns reflected the genetic diversity of five chicken populations. The ROH n■  profile (clusters, Figure 1) in wild birds displayed a more variable pattern than domestic birds (Figure 1). Commercial lines exhibited similar average cumulative size as well as average ROH number, indicating extremely homogenous genomes with low genetic diversity (Table 1). Descriptive summary of ROH number and length classes is presented in Table 2. As expected, White layers carry more ROH with an average number of 512.28 6 18.76 per bird (ranging from 547 to 471). By contrast, RJFs encompass least ROH average in 84.00 6 61.02 (ranging from 251 to 6) per bird (see Table 2 and File S1). The average size of ROH tracts observed in Red Junglefowls is comparable to the African indigenous chicken populations from Rwanda and Uganda both genotyped with the chicken Affymetrix 600K Axiom Array (Elbeltagy et al. 2019). Our measure of ROH in broiler populations confirms the dimension reported in previous studies (Marchesi et al. 2018;Almeida et al. 2019 Table 2). The largest proportion of the long tracts (ROH .10Mb ) was found in BL, reflecting recent inbreeding (Table 2). RJF also revealed several long segments of ROH (Table 2)  The effect of chicken genome heterogeneity in forming ROH segments was also investigated through comparison of ROH on macro-(GGA1 to 5), intermediate-(GGA6 to 10) and microchromosomes (GGA11 to 28) (Figure 2 and Supplementary Table S1). The GGA16 chromosome did not present any ROH segment due to the assembly fallacy. We found the longest ROH segments on macrochromosomes in line with the lower rate of nucleotide diversity and recombination rate (Axelsson et al. 2005;Groenen et al. 2009). Overall, macrochromosomes showed the highest number of common ROH and the longest ROH tracts per individual (Supplementary Table S1). The highest percentage of genome covered by ROH in layers was observed in micorchromosomes e.g., in GGA11 (58.21 in WL) and GGA22 (50.12 in BL), (see Figure 2 and Supplementary Table S1), whereas the corresponding chromosomes in broilers and Red Junglefowl were e.g., GGA2 (26.25 in BRB) and GGA5 (18.89 in RJF) (see Figure 2 and Supplementary Table S1).

Genomic inbreeding coefficient estimated From ROH (F ROH )
In the absence of pedigree information, numerous studies have documented the usefulness of ROH segments to infer the inbreeding level of an individual (e.g., Purfield et al. 2017;Marchesi et al. 2018;Strillacci et al. 2018;Zhang et al. 2018, among others). We calculated the genomic inbreeding coefficient (F ROH ) corresponding to the different length classes of ROH segments (Table 3). Consistent with Wright's inbreeding coefficient, WL was the most inbred population with an F ROH = 0.46 6 0.02 (see Table 3). The genomic inbreeding coefficient ranked in the following order WL . BL . BRB . BRA . RJF for the populations (see Table 3 and File S1), which is consistent with the Wright's definition of inbreeding as well as the diversity measures in these populations (e.g., Qanbari et al. 2019). Layer lines exhibited markedly greater level of inbreeding in comparison to the broilers and wild birds (Table 3). However, extent of ROH tracts observed in WL and BL is to some extent contradictory. While, a higher proportion of short ROH (F ROH 0.324Mb ) identified in WL   ( Table 2 and 3), BL carry a higher proportion of long tracts (F ROH .16Mb ), that might misinterpreted as an ancient origin of inbreeding in layers than broilers. In fact, given the recent genealogy of close mating, layers are expected to carry long homozygosity tracts, as is reported in the previous studies based on array genotypes (Bortoluzzi et al. 2018). However, long ROHs tend to break into shorter tracts due to the presence of frequent heterozygous gaps in sequencing resolution (e.g., Qanbari 2020; Szmatola et al. 2020), whereas microarrays would likely miss them. Figure 3 presents a schematic visualization of Pearson correlation between two measures of molecular inbreeding (F is and F ROH ). A moderate to strong correlation was found between F is and F ROH within all populations, demonstrating that the extent of a genome under ROH can be used fairly accurately to predict the IBD fraction. The correlation between F is and F ROH was lowest in WL (r = 0.64, P-value = 0.0029) and BL (r = 0.79, P-value , 0.0001), while the highest correlation was observed in RJFs (r = 0.99, P-value , 0.0001). This trend indicates a positive association of the short ROHs with F is and likely suggestive of ancestral inbreeding in wild birds. Studies have reported high correlation between F is and F ROH based on short ROH segments in cattle, suggesting ancestral inbreeding back up to 50 generations ago (Feren caković et al. 2013;Mastrangelo et al. 2016;Limper 2018). The high correlation observed between two metrics of molecular inbreeding is consistent with the previous reports on chicken (Bortoluzzi et al. 2018;Marchesi et al. 2018;Almeida et al. 2019), cattle (Feren caković et al. 2013Zhang et al. 2015;Mastrangelo et al. 2016;Peripolli et al. 2018) and sheep (Mastrangelo et al. 2017;Purfield et al. 2017;Mastrangelo et al. 2018) as well as human model (Clark et al. 2019). These results confirm the usefulness of the ROH analysis in monitoring differentiation and inbreeding values for further exploitation in chicken breeding programs in the absence of pedigree records.
ROH islands indicative of selection sweeps ROH islands might be indicative of genomic regions underwent natural and/ or artificial selection. We sought to identify most homozygote variants within ROH islands as candidates of recent adaptation and focused on outlaying SNPs in top 1% of ROHs in each population. Given the variable polymorphism content, homozygosity threshold to call an ROH island were population-specific. For example, 96% of the white layers were homozygote in top 1% of the ROH islands, whereas the corresponding figure in wild birds was only 40%. Accordingly, we set the SNP homozygosity thresholds as . 96%, 96%, 60%, 65%, and 40% in WL, BL, BRA, BRB, and RJF, respectively ( Figure 4).
We identified 26 to 58 regions of the genome with a high frequency of ROH occurrence, also known as ROH islands, within individuals of particular breeds. ROH islands were detected more frequent in WL (n = 58) and the less in RJF (n = 26). The ROH islands had the average length in range of 223.5 kb 6 177.2 (WL) to 503.8 kb 6 416.6 (BRA), and average number of consecutive SNPs in range 478.3 6 742.0 (WL) to 4947.3 6 7220.3 (RJF). The average length of the ROH islands calculated for all breeds was 405.9 kb 6 103.2, while the average number of consecutive SNPs per region was 2108.6 6 1576.4. Therefore, ROH islands tended to break into shorter segments (see WL in File S2), due to the presence of short heterozygous gaps within the ROH sequences (Nandolo et al. 2018). A summary statistics is presented in details in File S2 and shown schematically in Figure 4.
We found 18 ROH islands uniquely detected in wild birds (see Table 4 and File S2). Among the ROH islands, one and seven overlapping regions were detected respectively in layers and broilers out of which three regions in broilers reported to be in association with traits of economic interest available at the Chicken QTL database (release 40). These regions include genes related to antibody response to Newcastle virus (ROBO2), eggshell color (GLCCI1, ICA1, UMAD1), antibody titer to SRBC antigen, body weight at 56 days (Meis2a.1, uc_338) and feather pecking (Table 4). In the same way, the ROH islands detected in RJFs overlapped 53 QTL, among them are genes associated with eggshell strength and eggshell thickness ( Table 4). The localized panels of ROHs were further compared with the putative selection signatures reported in Qanbari et al. (2019). Our comparison revealed overlap in only two ROH islands located on GGA2 and 5 (see Table 4). The genes located within ROH islands and detected in multiple populations have been presented in File S3.
We further conducted an enrichment test on the gene list located in ROH islands to identify over-represented Gene Ontology (GO) terms (Table 5). In commercial lines, genes located in ROH islands were associated with biological functions related to chicken domestication and evolution such as limb development (GREM1, MEOX2) and negative regulation of apoptotic process (AVEN, ASNS, GREM1) ( Table 5). In contrast, the most significant GO term in Red Junglefowl was related to the reproductive traits such as oogenesis (PTN, WASH1, WEE2). These findings show candidate pathways associated with economically important traits and chicken genetic diversity during domestication and recent improvement.

CONCLUSION
This study provided the first systematic comparison of runs homozygosity islands between wild and domestic birds in sequence resolution. We found larger fraction of layers genome segregating in homozygote state, reflecting the recent inbreeding in commercial lines, although some of the long ROH tend to break into smaller tracts. Compared to the homogenous values reported for the commercial lines, wild birds showed important variation in the total length of ROH. Regions with a high frequency of ROH occurrence n■ within domestic birds were co-localized with genes implicated in biological functions related to chicken domestication such as a limb development (GREM1, MEOX2), whereas in Red Junglefowl these regions overlapped with genes related to oogenesis. We also found a modest to high correlation between two molecular measurements, substantiating a highly inbred nature of domestic birds relative to their wild ancestors.

ACKNOWLEDGMENTS
RT and SQ conceived and designed the study. RT performed the bioinformatics and analyzed the data. SQ provided the R programming codes. RT and SQ wrote the manuscript. TS, SQ and GM edited the manuscript. All authors approved the manuscript before submission.