GBS Data Identify Pigmentation-Specific Genes of Potential Role in Skin-Photosensitization in Two Tunisian Sheep Breeds

Simple Summary Maintaining the flexibility of the genetic resources of native animals to face local environment constraints is still a major challenge. In Tunisia, the Noire de Thibar breed is a local sheep, typically with black coloration, known for its ability to tolerate “hypericum perforatum”, which causes skin photosensitization in white colored sheep. The goal of this study was to perform a genome scan, by considering genotyping-by-sequencing (GBS) markers that were genotyped in divergent coat colored sheep (black vs. white) to identify strong, and recent, artificial selection that is involved in skin-photosensitization. Interestingly, the genomic differentiation analysis identified FST markers within genomic regions containing key pigmentation and photosensitivity related-genes. These findings help in understanding the background of coat color genetics and its potential role in adaptation to local environment constraints. Abstract The Tunisian Noire de Thibar sheep breed is a composite breed, recently selected to create animals that are uniformly black in order to avoid skin photosensitization after the ingestion of toxic “hypericum perforatum” weeds, which causes a major economic loss to sheep farmers. We assessed genetic differentiation and estimated marker FST using genotyping-by-sequencing (GBS) data in black (Noire de Thibar) and related white-coated (Queue fine de l’ouest) sheep breeds to identify signals of artificial selection. The results revealed the selection signatures within candidate genes related to coat color, which are assumed to be indirectly involved in the mechanism of photosensitization in sheep. The identified genes could provide important information for molecular breeding.


Introduction
Patterns of genetic variation processes in domestic animals have long proven insightful for the study of domestication, breed formation, population structure, and the consequences of selection [1]. Natural and artificial selection processes in domestic animals play crucial roles in maintaining the flexibility of animal genetic resources in facing local environment constraints. In livestock ruminants, skin photosensitization, caused by the ingestion of toxic plants, is relatively common and affects animal production. In the North of Tunisia, sheep farmers face major economic losses from the Animals 2020, 10, 5 2 of 10 intoxication of white sheep following the ingestion of 'hypericum perforatum' or "St. John's wort" weeds, with animals frequently showing photosensitivity symptoms [2]. Indeed, this plant contains hypercin that causes a photosensitive anaphylactic reaction in animals with white-colored skin resulting in high mortality [3][4][5]. Therefore, a local sheep breed called 'Noire de Thibar' or 'Black Thibar' has been specifically created to tolerate skin photosensitivity through a crossbreeding program between local white-coated Queue fine de l'ouest and black French Merinos d'Arles breeds to create animals that are uniformly black [6][7][8]. Furthermore, a new gene pool was introduced into the breed using the Black Brown Swiss breed in order to fix the black coat color, to avoid consanguinity, and to improve meat and wool quality [6,9,10]. In addition, the uniform black wool color is in high demand for traditional carpet and clothing manufacturers. Many studies have reported the role of melanocytes that specialize in producing melanin pigments in protecting organisms from ultraviolet radiation [11,12]. In humans, several genomic association studies have evidenced the effect of pigmentation-related-genes on photosensitivity symptoms and skin pigment disorders [13,14]. The advent of high throughput and cost-effective genotyping techniques allows for the evaluation of the response, at the genome level, to various selective pressures in a wide range of adaptive traits. For example, genotyping-by-sequencing (GBS) on livestock has been a sufficient solution that is required for population genetic and genomic studies [15]. This method has been successfully used to identify selection signatures of important economic traits in cattle [16], pigs [17,18], chicken [19], and camels [20]. The goal of this investigation was to perform a pair-wise marker differentiation comparison in a genome-wide scan between Noire de Thibar and Queue fine de l'ouest Tunisian sheep breeds, using genotyping-by-sequencing data to reveal putative regions under strong and recent selection.

Ethics Statement
The blood used for all of the analyses was collected by veterinarians during routine blood sampling ofcommercial farm animals (for medical care or follow up). These animals were not linked to any experimental trials. All the samples and data processed in our study were obtained with the breeders and breeding organizations' consent.

Sheep Breeds
In the present study, 61 sheep samples belonging to 38 black Noire de Thibar (described in the introduction section; Figure 1A) and 23 Queue fine de l'ouest samples ( Figure 1B) were genotyped for further analysis. In fact, the thin-tailed Queue fine de l'ouest is a white-coated, sheep meat breed found mostly in West-central Tunisia and originated from the Algerian Ouled Djellal breed. The samples were collected from animals without any specified diseases by veterinary clinics, but we assumed that white individuals were susceptible, while the black ones were assumed to be photosensitization-tolerant, as suggested by previous reports about the Tunisian Noire de Thibar sheep [6,9,10]. assignment of individuals to groups based on their genetic similarities. Therefore, based on the results of the ADMIXTURE cross-validation analysis, the most likely value, K, was 2 ( Figure 1D), suggesting that two breeds clustered consistently with MDS results, showing different ancestral genetic backgrounds, with the exception of some admixed animals probably from the Queue fine de l'ouest breed ( Figure 1E).

Detection of Fst Outlier Loci
The results revealed 40 potential SNPs distributed on 17 differentiated genomic regions with FST values ranging from 0.45 to 0.97 ( Figure 2; Table S2; Table 1). Table S2alsolists the details of the potential SNP markers and genes found within putative genomic regions. The results identify genomic regions on chromosomes 3, 6, 14, and 20 (Table 1), where the related significance markers were positioned no further than 200 Kb apart (Table S2). Therefore, the length of the genomic regions under selection ranged from 400 Kb (within one significant SNP) to ~610 Kb in region 3 on chromosome (OAR) 20 (Table1).The genome-wide distribution of FST values, represented in Figure 2, shows that the highest signal of selection was located in region 1, containing eight potential SNPs on chromosome 14 (Average FST = 0.67) where the most significant locus (chr14:14231897; FST = 0.97) was positioned at position 14.23 Mb, overlapping with the MC1R gene. In addition, the linkage disequilibrium plot (Figure 3) of eight significant SNP s most likely tracked the same quantitative trait loci (QTL), which is the MC1Rgene in the high LD block with (chr 14: 14230826, 14231897, and 14232016) markers. The second significant signal was located on OAR 6, containing five consecutive significance markers with peak SNP (chr6:70008012; FST = 0.87) spanning 487.267 Kb and harboring five potential genes-among them, the PDGFRA and KIT genes. The third genomic region, located on OAR 20, represented the broadest genomic region with 10 consecutive SNPs spanning 610,843 Kb and harboring eight genes at approximately 60 Kb upstream EXOC2 and IRF4 genes.

Genotyping-by-Sequencing, SNP Calling and Filtering
Genomic DNA was isolated from blood samples using the standard phenol/chloroform protocol [21]. The A260/280 ratios of DNA samples were determined with NanoDrop 2000 (Thermo Scientific, Weltham, MA, USA). The samples with an A260/280 ratio between 1.7 and 2.0 were genotyped using the genotyping-by-sequencing (GBS) protocol at the AgResearch Invermay Research Centre, Invermay, New Zealand. GBS library preparation and purification was performed, as described in Reference [22], using PstI/MspI restriction enzymes, following the method outlined by Elshire et al. [23], and sequencing was performed on an Illumina HiSeq2500 utilizing v4 chemistry (Illumina, San Diego, CA, USA) and 100 bp single end reads.
Raw fastq files were quality checked using FastQCv0.10.1 (http://www.bioinformatics.babraham.ac. uk/projects/fastqc/). SNPs were detected using default UNEAK (Universal Network-Enabled Analysis Kit) (Cornell University, New York, NY, USA) [24] pipeline parameters implemented in TASSEL 3.0 software. Briefly, good reads were defined as reads carrying a perfect barcode match with no Ns in the 64-bp following the barcode. Reads were subsequently trimmed to 64-bp (excluding barcodes) are collapsed into TAGs (identical 64-bp reads). Unique 64-bp, which were present five or more times across all samples, were retained and used to identify "TAG pairs" (pairs having a single base pair mismatch) with a default error tolerance rate (ETR) of 0.03, as described in Reference [24]. Reciprocal "TAG pairs" with only 1-bp mismatch were considered putative SNPs. The reliability and consistency of the genotyping data were evaluated using a quality control analysis pipeline: "Deconvolute and quality control" (https://github.com/AgResearch/DECONVQC). Tags were mapped onto sheep reference genome Ovis_aries. Oar_v3.1 using a bowtie2.2.3.0 sequence aligner using standard settings in order to generate whole genome SNP data. Quality filters were applied to discard low-quality markers, and thus, all SNPs were additionally filtered to remove those with minor allele frequencies (MAFs) lower than 0.05, SNP call rates lower than 90%, and deviation from the Hardy-Weinberg equilibrium with p < 0.001. Only SNPs located on autosomes were considered in further analyses. Moreover, individuals with >10% of missing genotypes were also removed.

Population Stratification
Pair-wise IBS allele sharing estimates among sheep samples were calculated using PLINK 1.7 [25] and were graphically represented by an MDS plot. The admixture level of each animal in two sheep Animals 2020, 10, 5 4 of 10 populations was estimated using a model-based clustering algorithm implemented in the software ADMIXTURE v1.2.3 [26], assuming two parental populations (K = 2) after calculating the most appropriate K value using the admixture's cross validation error in different K values ranging from two to five ( Figure 1D).The unsupervised method applied a maximum likelihood-based clustering algorithm that placed individuals into two predefined clusters without prior population information.

Detection of Selection Signals
After the sheep population differentiation was estimated (see results), we considered two breeds as case/control groups to obtain the Wright's F ST estimate for each SNP, via Weir and Cockerham's method [27] using the-fst option in PLINK1.9 [25,28]. In order to detect superior signals based on the allele frequency difference, markers were ranked according to raw F ST values and plotted according to their genomic position based on the Ovis_ariesOar_v3.1 assembly genome [29] using R software (version 2.13.2).
The top 0.1% ranked SNPs (n = 40) were considered as putative markers. SNP-containing or the nearest annotated genes for each potential marker were considered as putative genomic regions under selection. Annotated genes were obtained from the NCBI sheep genome data viewer version 4.8.3 (https://www.ncbi.nlm.nih.gov/genome/gdv/browser/gene/?id=101104604). Gene functions were determined using NCBI (http://www.ncbi.nlm.nih.gov/gene/) and were supported by an extensive literature search. Genes located at less than 200kb away from the potential SNPs were considered candidate genes. Moreover, the linkage disequilibrium (LD) structure and haplotypes blocks were defined using HAPLOVIEW v4.2 [30].

Population Differentiation
After DNA sequencing, SNP discovery yielded~116,000 sequence tags with a mean depth of 3.33. Of these, 100,333 SNPs were then successfully mapped to the genome and 39,788 GBS markers, genotyped in 61 sheep samples, were retained after data pruning and were used for further analysis. The characteristics of filtered SNPs were mentioned in Table S1. When considering the average call rate for the filtered SNPs set, 97% of markers had assigned genotypes with an average minor allele frequency (MAF) of 0.23 and an average observed heterozygosity 0.25.Population structure was assessed using 38 black-coated Noire de Thibar ( Figure 1A) and 23 white-coated Queue fine de l'ouest samples ( Figure 1B).The results of the population structure, provided by MDS ( Figure 1C), showed two clearly differentiated groups. In fact, Noire de Thibar animals were grouped in one tight cluster and were isolated from Queue fine de l'ouest samples. The ADMIXTURE method allowed for the assignment of individuals to groups based on their genetic similarities. Therefore, based on the results of the ADMIXTURE cross-validation analysis, the most likely value, K, was 2 ( Figure 1D), suggesting that two breeds clustered consistently with MDS results, showing different ancestral genetic backgrounds, with the exception of some probably admixed Queue fine de l'ouest animals ( Figure 1E).

Detection of F ST Outlier Loci
The results revealed 40 potential SNPs distributed on 17 differentiated genomic regions with F ST values ranging from 0.45 to 0.97 ( Figure 2; Table S2; Table 1). Table S2 also lists the details of the potential SNP markers and genes found within putative genomic regions. The results identify genomic regions on chromosomes 3, 6, 14, and 20 (Table 1), where the related significance markers were positioned no further than 200 Kb apart (Table S2). Therefore, the length of the genomic regions under selection ranged from 400 Kb (within one significant SNP) to~610 Kb in region 3 on chromosome (OAR) 20 (Table 1) (Figure 3) of eight significant SNP s most likely tracked the same quantitative trait loci (QTL), which is the MC1R gene in the high LD block with (chr 14: 14230826, 14231897, and 14232016) markers. The second significant signal was located on OAR 6, containing five consecutive significance markers with peak SNP (chr6:70008012; F ST = 0.87) spanning 487.267 Kb and harboring five potential genes, among them, the PDGFRA and KIT genes. The third genomic region, located on OAR 20, represented the broadest genomic region with 10 consecutive SNPs spanning 610,843 Kb and harboring eight genes at approximately 60 Kb upstream EXOC2 and IRF4 genes.

High Signals of Selection in Pigmentation Candidate Genes
Estimates of the population structure were in accordance with FST at individual loci, which showed a high level of heterogeneity across the genome. Indeed, the genetic differentiation between the two breeds was shaped by the introgression of European Merino-type gene flow into the Noire de Thibar breed, coupled with intensive selective pressure on coat color [7,31]. Selection of favorable variants is expected to result in a higher level of differentiation for neighboring SNPs. In several instances, outlier SNPs tended to cluster in similar regions (e.g., OAR 3, OAR6, OAR 14,and OAR 20). The observed pattern of region differentiation was probably due to the high linkagedisequilibrium of dense GBS markers. Chang et al. [32] reported that using high density SNP markers was sufficient to detect the LD between markers and causative mutations. Thus, our results are in agreement with previous findings reporting that when a beneficial mutation emerges and subsequently spreads in a population, this process, which is a selective sweep [33], will generate higher population differentiation, higher frequencies of segregating sites, and a characteristic linkage disequilibrium (LD) pattern [34]. Interestingly, as expected, five genomic regions among the seventeen identified in this study harbored key genes related to the pigmentation trait in sheep. This is not unexpected, given that the photosensitization can be the most severe in any portion of the animal exposed to sunlight that lacks a protective fleece, hair, or pigmentation [35][36][37]. The most significant genomic region under selection wasregion1, located on OAR 14, containing the Melanocortin 1 receptor MC1R gene, a key candidate gene for coat color pigmentation. Indeed, the MC1R gene played a central role in the regulation of eumelanin (black-brown) and phaeomelanin (red-yellow) synthesis within the mammalian melanocyte and is often found under selection in several sheep breeds [38,39]. Moreover, previous sequence analysis studies evidenced those mutations, within the MC1R gene, which are responsible for the dominant black-color in sheep [40,41] and pigs [42]. In addition, it could play an important role in the prevention of light-sensitive diseases. To support this, several recessive genetic loss-of-function variants in the MC1R gene increase the risk of developing cutaneous melanoma [43] and have been found to be associated to phenotypic features related to sun sensitivity in European populations [44].The high signal in region 2, located on OAR

High Signals of Selection in Pigmentation Candidate Genes
Estimates of the population structure were in accordance with F ST at individual loci, which showed a high level of heterogeneity across the genome. Indeed, the genetic differentiation between the two breeds was shaped by the introgression of European Merino-type gene flow into the Noire de Thibar breed, coupled with intensive selective pressure on coat color [7,31]. Selection of favorable variants is expected to result in a higher level of differentiation for neighboring SNPs. In several instances, outlier SNPs tended to cluster in similar regions (e.g., OAR 3, OAR6, OAR 14,and OAR 20). The observed pattern of region differentiation was probably due to the high linkage-disequilibrium of dense GBS markers. Chang et al. [32] reported that using high density SNP markers was sufficient to detect the LD between markers and causative mutations. Thus, our results are in agreement with previous findings reporting that when a beneficial mutation emerges and subsequently spreads in a population, this process, which is a selective sweep [33], will generate higher population differentiation, higher frequencies of segregating sites, and a characteristic linkage disequilibrium (LD) pattern [34]. Interestingly, as expected, five genomic regions among the seventeen identified in this study harbored key genes related to the pigmentation trait in sheep. This is not unexpected, given that the photosensitization can be the most severe in any portion of the animal exposed to sunlight that lacks a protective fleece, hair, or pigmentation [35][36][37]. The most significant genomic region under selection wasregion1, located on OAR 14, containing the Melanocortin 1 receptor MC1R gene, a key candidate gene for coat color pigmentation. Indeed, the MC1R gene played a central role in the regulation of eumelanin (black-brown) and phaeomelanin (red-yellow) synthesis within the mammalian melanocyte and is often found under selection in several sheep breeds [38,39]. Moreover, previous sequence analysis studies evidenced those mutations, within the MC1R gene, which are responsible for the dominant black-color in sheep [40,41] and pigs [42]. In addition, it could play an important role in the prevention of light-sensitive diseases. To support this, several recessive genetic loss-of-function variants in the MC1R gene increase the risk of developing cutaneous melanoma [43] and have been found to be associated to phenotypic features related to sun sensitivity in European populations [44].The high signal in region 2, located on OAR 6, harbored the KIT and platelet derived growth factor receptor alpha PDGFRA genes, which are implicated and interact in the functional pathway of coat color in different mammals. For example, the gene complexes KDR, KIT, and PDGFRA were found to be associated with the reddening coat color pattern in Angus cattle [45]. In addition, KIT and PDGFRA were assumed to play important roles in determining white-coat color in Iranian goats [46]. Genomic region 3 on OAR 20 spans at position (50,143,753,857), located close to IRF4 gene (50,969,985,660), and is associated with skin pigmentation, hair color, or skin sensitivity to the sun in humans [47][48][49] and goats [50]. Indeed, the IRF4 gene has been implicated in melanocytic biology; the IRF4 protein was proposed as a diagnostic marker for various melanoma subtypes. Moreover, the transcription factors SOX-10 and PICK1 genes, located on OAR 3, were just downstream of region 4. SOX-10 has been identified as a selection signature responsible for coat pigmentation in sheep [51,52] and PICK1 has a well-established role in melanocyte biology and is essential for melanocyte migration and survival in the plumage color in chickens [53]. Similarly, the endothelin 3 or EDN3 gene (OAR1: 56,388,076-56,412,489), located downstream of the ZNF831 gene, was also detected in studies as the footprint of the selection pigmentation trait in Awassi [51] and worldwide sheep breeds [52].

Conclusions
These findings demonstrate the efficiency of genotyping-by-sequencing (GBS) data, even with a relatively low number of animals, to identify signals of selection for pigmentation candidate genes in two local Tunisian sheep breeds. The variants underlying these signals could play a potential role in adaptation to photosensitive symptoms caused by toxic weeds in white sheep. However, it will be necessary to carry out association and functional studies to demonstrate the implication of these genes' in tolerance to sheep-photosensitization, and therefore, their efficiency as future marker-assisted selection.
Supplementary Materials: The following are available online at http://www.mdpi.com/2076-2615/10/1/5/s1, Table S1: Characteristics of final filtered SNPs (39,788 GBS markers), Table S2: Details about putative SNPs under selection and related differentiated genomic region using F ST estimates using pairwise comparison between Noire de Thibar and Queue fine de l'ouest breeds. Funding: The genotyping financed by the Ministry of Business, Innovation and Employment (New Zealand) via its funding of the "Genomics for Production & Security in a Biological Economy" programme (Contract ID C10X1306). Grant mobility (Koru record ID: 41837; Activity code: A12627) was supported by New Zealand aid programme contribution through Ministry of foreign Affairs and Trade (MAFT).