The Genetic Diversity of Trans-caucasian Native Sheep Breeds

: The genetic variation in 10 indigenous Caucasian sheep breeds was studied with 14 micro-satellite loci in order to determine the genetic diversity among and between the breeds. Five breeds from Asia, five breeds from Europe and one breed from Africa, were included in order to study any relationships or influences they may have with the Caucasian sheep analyzed. A Karakul population from Uzbekistan was included in the study to see whether there was any Central Asian influence. All the 14 loci were found to be polymorphic in all the breeds, with the exception of ILST0056, which was monomorphic in Imeretian. A total of 231 alleles were generated from all the 688 individuals of the sheep analyzed. The mean number of alleles (MNA) at each locus was 16.5. The total number of alleles detected in all samples ranged from 13 in several loci to 23 in OarJMP029. Out of total 308 Hardy-Weinberg Equilibrium (HWE) tests, 85 gave significant results. After Bonferroni correction for multiple tests, 30 comparisons still remained significant to the experimental levels. The Gala population was the most diverse and Imeretian the least diverse with a MNA of 8.50 and 5.51, respectively. Gene diversity estimates exhibited the same trend and ranged from 0.803 in Gala and 0.623 in Imeretian, but generally there is higher diversity among the Caucasian breeds in comparison to other eference breeds. The closest breeds were Tushin and Bozakh with Da of 0.113 and most distant breeds were Djallonké and North Rondalsy with Da of 0.445. Principal Component (PC) analyses were done. PC1 described 14% of the differences. PC2, which described 13% of the differences, further separated the Caucasian breeds from Asian breeds except Karakul and Awasi, and the two British breeds. PC3 described 10% of the differences, allowing better differentiation of the Caucasian breeds. A moderate degree of reliability was observed for individual-breed assignment from the 14 loci using different approaches among which the Bayesian method proved to be the most efficient. About 72% of individuals analyzed were correctly assigned to their respective breeds.


INTRODUCTION
Armenia and Georgia, respectively {Zelyatdinov, 1997}. A large proportion of the remaining sheep are presently found in small households.
The genetic diversity of Caucasian sheep can be attributed to several events that took place in the region : i) The Silk Road trade : Migration of sheep along the Silk Road, especially during the 13 th and 14 th Centuries when trade to Central and Western Asia as well as Europe flourished.
ii) The Mongol invasion of the 13 th century : It has been reported in some accounts that mounted pastoralists, the Oghuz, who are the putative ancestors of the people of Central Asia, migrated to Azerbaijan around the 11 th century. These influence span up to the late 15 th Century before the Mongols were ejected in 1480.
iii) Import of German Sheep to Old Russian Empire by the Czar to provide wool to cloth his army in 19 th Century. Some breeding centres were also set up with merino sheep that had been imported from the Spain in the 18 th Century. By late 19 th Century, Russia had 15 million merinos mainly in the Southern part of Russia. iv) Contribution of transhumance, which is still practiced in the present day. It has been estimated that 80% of sheep in Georgia are raised on transhumance systems (Dimitriev, 1989). They winter on low altitude steppe ranges and then moved to summer mountain pastures, which lie at altitudes of 2,000-3,000 m, 200-500 km away from winter quarters.
Individual identification system has been widely used for breeding purpose in livestock animals. Because of the experimental complexity of the blood typing system it has been replaced with DNA testing nowa days (Yoon et al., 2005).
Molecular genetic markers have become powerful tools for elucidating the population biology of indigenous breeds. Micro-satellites are a special class of repetitive molecular markers, consisting of tandem repeats DNA sized between 2-6 base pairs (Hancock, 1999). Their high degree of polymorphism, codominance and the fact that they can be analyzed by polymerase chain reaction (PCR) has made them one of the most popular genetic markers in use today (Heyen et al., 1997;Luikart et al., 1999). They have been shown to be useful for detection of genetic differentiation among populations, estimating of individuals' relatedness and identity (Chakraborty, 1993) and determining phylogenetic relationship among sheep breeds (Buchanan et al., 1994a;Arranz et al., 1998;Muigai, 2000). Analysis of Micro-satellite variation may thus enhance our understanding of the evolutionary history of Caucasian native sheep populations. This paper examines the genetic variation of 14 Microsatellite loci in 10 indigenous Caucasian sheep breeds in order to determine the genetic diversity among and between the breeds. A Karakul population from Uzbekistan was included in the study to see whether there was any Central Asian influence. Five breeds from Asia, five breeds from Europe and one breed from Africa, were included in order to study any relationships or influences they may have with the sheep from the Caucasus analyzed.

MATERIALS AND METHODS
The native breeds include Tushin (40) and Imeritian (41) from Georgia, Gala (29) and Shirvan (9) from. Azerbaijan, Karabakh (10) and Balbas (13) from Armenia and Azerbaijan, Armyanskaya (24) (developed during the last century using the sheep breed Balbas as a foundation stock and crossed with Rambouillet and Lincoln breeds), Mazekh (37), from Armenia, and Bozakh (20), which is found in adjacent areas of the three countries. All these sheep breeds originated from the Caucasian fat-tailed sheep and are grouped as merino breeds (Ryder, 1983;Dimitriev, 1989), probably in view of their contribution in breeding. Two indigenous Caucasus' breeds, i.e. Bozakh and Shirvan, are listed as endangered by the FAO Worldwatch Report (Scherf, 2000). Blood or hair samples were collected from each breed, the number of individuals is indicated in parenthesis and following reference breeds; Karakul (29), Salt Range (41) (44). Efforts to ensure that individuals sampled were not closely related involved selecting only a small number of animals from each herd and questioning the farmers on the genealogy of each animal. DNA was extracted from peripheral blood lymphocytes using a modified phenol/chloroform/isoamyl alcohol extraction method (Sambrook, 1989). The extracted DNA was ethanol precipitated, vacuum dried and resuspended in 1×TE buffer (pH 7.6).
DNA extraction from hair samples involved cutting 10-15 hair roots into a 1.5 ml tube. An aliquot of 50 µl of lysis buffer (10 mM Tris pH 8.3, 50 mM KCL, 0.5% Tween) and 10 µl of 20 µg/ml solution of proteinase K in 10 mM Tris-HCL (pH 7.5) were added. The mixture was vortexed for 30 seconds, briefly centrifuged at 13,000 rpm and incubated overnight in a 56°C water bath. The samples were then incubated for 10 minutes at 94°C, cooled to room temperature and briefly centrifuged at 13,000 rpm. The resultant solution was then used for PCR.
A total of 14 Micro-satellite markers were used in this study: McM042, MAF065, OarFCB011, McM527, OarFCB020, OarAE129, OarFCB129, OarJMP029, MAF035, ILST056, ILST005, ILST00, ILST0044 and TGLA0053 Kemp et al., 1993;Penty et al., 1993;Buchanan et al., 1994a, b;Hulme et al., 1994;Kemp et al., 1995;Lumsden et al., 1996). The rational of using these 14 markers is that the first 10 loci were isolated in sheep and the rest in cattle and approved to be used in ovine diversity studies. However, micro satellites, complete with their unique flanking sequence have been shown to be conserved in closely related species (Stallings et al., 1991) allowing primers to be used across species (Moore, 1991). Five linked micro satellite loci in combination with new technologies can provide more information on coalescence dates (i. e. narrower confidence intervals than approximately 15 kb of DNA sequence (Wilson and Balding, 1998).
PCR amplification of individual Micro-satellites was carried out as follows: reactions were performed in 96-well micro titer plates in a total volume of 10 µl containing 20-40 ng/µl of DNA, 0.3 µM of each primer (forward primer labeled TET, FAM, or HEX; Applied Biosystems), 0.3 units of AmpliTaq Gold DNA polymerase (Applied Biosystems), 0.125 mM of each dNTP (Boehringer Mannheim, Germany), 1×PCR buffer (10 mM Tris-HCl, pH 8.3) containing 50 mM KCl and 2 mM MgCl 2 . All amplifications included an initial denaturing step of 10 minutes at 94°C, this was followed by 40 cycles of 45 seconds at 94°C, 45 seconds at the optimized annealing temperatures and 1 min at 72°C. Final extension of 20 minutes at 72°C was included. Thermal cycling was performed in Gene Amp* PCR System 9700 (Applied Biosystems). For analysis 0.7 µl of the PCR product was mixed with 2.0 µl formamide/dye/internal size standard GENESCAN TM -350 TAMRA (Applied Biosystems) mixed in 4:1:1 ratio. This mixture was denatured at 94°C for 3 minutes, placed immediately on ice, and analyzed by denaturing polyacrylamide gel electrophoresis using an automated DNA sequencer Applied Bio systems Instrument (ABI 377).
Fluorescent DNA fragments were analyzed, and genotype data were generated using GENESCAN TM 6.7.2 Version 3.0 (Applied Bio systems). GENOTYPER TM version 2.0 (Applied Bio systems). DNA fragment analysis software was used to score, bin and output allelic (and genotypic) designation for each sheep individual. The third order least squares was used for base calling.

Data analysis
Allele frequencies were calculated using the program Excel Micro-satellite Toolkit (version 3) (available at http://acer.gen.tcd.ie/~sdepark/ms-toolkit/). The genotypic frequencies obtained in each of the 22 breeds for each of the 14 Micro-satellites were tested for deviations from Hardy-Weinberg equilibrium (HWE) using the exact test of GENEPOP (Raymond, 1995). Tests for genotypic linkage disequilibrium for all pairs of loci were performed using Fischer's exact test (Raymond, 1995) with unbiased P values derived from Markov chain randomisation method. F statistics were computed by using the methods of Weir and Cockerham (Weir, 1984) as implemented in GENEPOP. Observed heterozygosity, gene diversity (average expected heterozygosity) and coefficient of inbreeding were calculated for each population. Mean Number of Alleles (MNA) was calculated using Allele Sampler version 1.0 (available from the corresponding author on request). Due to the correlation between the MNA and the sample size (Hartl, 1989), random samples of twenty individuals were selected for each population after replacement, except for the Shirvan, Karabakh and Balbas populations, which had 9, 10 and 13 samples available, respectively. The mean number of alleles per locus was recalculated for each population after 99 resamplings.
Using the un weighted pair-group method with arithmetic mean (UPGMA), a breed phylogenetic tree was constructed with Dispan (Ota, 1993) and visualized with treeview (Page, 1996). Bootstrap resampling of loci using 1,000 replicates from the original dataset was used to test the robustness of the tree topology.
Assignment tests were used to determine the likelihood of each individual's genotype being found in the breed from which it was sampled using the program GENECLASS by Bayesian exclusion test method of Rannala and Mountain ( Rannala, Mountain, 1997;Cornuet et al., 1999). The method has been shown to be effective over a wide range of F st values and is robust to slight deviations from Hardy-Weinberg equilibrium proportions (Cornuet et al., 1999). We used the 'leave one out' method when conducting this analysis, where the individual being analyzed was excluded from the data set. The allele frequencies were then recalculated, and then the individual was assigned to the population (The program is available at http://www. ensam.inra.fr/URLB). The linear ordinate technique of principal component analysis (PCA), based on the correlation matrix of the allele frequencies distribution within the breeds, was used to produce graphical representations of 22 populations, using XLSTAT (version 4.3) program (http://www.xlstat.com).

RESULTS
All the 14 loci were found to be polymorphic in all the breeds, with the exception of ILST0056, which, was monomorphic in Imeretian. A total of 231 alleles were generated from all the 688 individuals of the sheep analyzed. The MNA at each locus was 16.5. The total number of alleles detected in all samples ranged from 13 in several loci to 23 in OarJMP029. Heterozygosities, gene diversity and MNA estimates computed across the 14 loci for each population are shown in Table 1.
Out of total 308 Hardy-Weinberg Equilibrium (HWE) tests, 85 gave significant results, which is more than the 15 expected by chance alone. One locus-population combination was monomorphic and could not be tested, ILSTS056/Imeritian. After Bonferroni correction for multiple tests, 30 comparisons still remained significant to the experimental levels.
MNA provides a reasonable indicator of the levels of variability present, assuming that populations are in mutation-drift equilibrium. From this analysis, the Gala population was the most diverse and Imeretian the least diverse with a MNA of 8.50 and 5.51 respectively. Gene diversity estimates exhibited the same trend and ranged from 0.803 in Gala and 0.623 in Imeretian, but generally there is higher diversity among the Caucasian breeds in  Nei's Da genetic distance (Nei et al., 1983) estimates between each pair of breed population are shown in Table 2. Dendogram ( Figure 1) shows the genetic relationships among twenty two sheep populatios using Da genetic distance.The closest breeds based on this parameter are Tushin and Bozakh with Da of 0.113 and most distant are Djallonké and North Rondalsy with Da of 0.445. The pairwise F st (θ) calculations showed little degree of differentiation among the Caucasian breeds, but moderate to substantial degree of differentiation with θ values (Table 3) significantly greater than zero for most other comparisons. The values ranged from -0.001 for Gala and Karabakh to 0.183 for Djallonké Senegal and North Ronaldsay. The degree of population subdivision calculated from F st diversity indices was around 9%. But when the analysis was restricted to Caucasian breeds with Karakulskii included this figure dropped to 5%. Overall, thus the largest part of the genetic diversity can be explained by the variation within breeds, which accounted for about 91%. There was no clear correlation between F st and locus heterozygosity, consequently no correlation between loci diversity and presumably mutation rates, and locus informativeness.
A moderate degree of reliability was observed for individual-breed assignment from the 14 loci using different approaches among which the Bayesian method proved to be the most efficient. About 72% of individuals analyzed were correctly assigned to their respective breeds ( Table 4). Analysis of the Bayesian assignment criterion illustrated the divergence between any one breed and the others, which were considerable for reference breeds, while no great difference was apparent among the Caucasian breeds save for Imeretian, which comes from Abkhazia, an area northwest of Georgia. On exclusion of non-Caucasian but retaining Karakul and the two Pakistani breeds from the analysis the percentage of individuals correctly assigned dropped to 59% and on further excluding these three non-Caucasian breeds the number of individuals correctly assigned to fell to about 53%. This indicates the level of homogeneity in the Caucasian breeds, which seemed to be substantial. There was also a correlation between the number of individuals analyzed and the percentage correctly assigned, with populations with more than 25 individual representatives having increased number of those correctly assigned.

DISCUSSION
Low F st values can be indicative of high current gene flow between populations or can be caused by populations sharing recent common ancestry (Wright, 1969). Gene flow is an important factor determining the degree of intra-and inter-population genetic variation within species (Slatkin, 1987).
Generally there is very low inter-population variation exhibited in this study and high intra population variation (9% versus 91%), indicative of population, which have been in constant interaction (Janson, 1987;Trexler, 1988). These are also evinced by a very low F st value (≤5%) between the Caucasian breeds save for Pakistan and Imeretian breeds. Transhumance is a very important factor in this respect, especially during summer when most flocks are driven into the hills away from their home tract and possible overlap of ranges of different breeds. The F st value was 6-10% between Caucasian breeds and Asian, Merino and Rambouilet, but slightly above 10% with North Ronaldsay, Dorset and Djallonké. This is accentuated by Da values, which exhibit the same trend. High estimates of 5.01 migrant per generation calculated using private allele methods of Barton and Slatkin (Barton and Slatkin, 1986) as done in GENEPOP, indicate current gene flow between neighbourly populations a fact that further corroborates inference from F st and Da.
The low differentiation is further supported by assignment test results. Only about 50% of individuals of the Caucasian breeds were assigned to the population from which they were sampled. Low assignment rates may indicate either high gene flow or low power to assign because of too few markers or too little genetic variation per marker. However, sequential increase of markers included in the analysis from 10 to 14 did not yield substantial increase in individuals correctly assigned, so we attribute our low assignment rates to high gene flow especially among the Caucasian breeds. Moreover, other studies using fewer loci to assign individuals to their respective breeds reported high percentage.
There is substantial difference between observed and expected heterozygosities, most likely the result of the mating structure in the sheep populations. This is also reflected in significant deviations from HWE for several loci, especially OarAE129 and ILST005 (data not shown). Imeritian was the only Caucasian population in which several loci deviated from HWE. This deviation could be due to selection or non-random mating. Although selection, especially human selection (Dimitriev, 1989) cannot be preempted, gene flow and some level of inbreeding inferred from the gene diversity estimates underscore this phenomenon.
Generally, except for Salt Range and Djallonké breeds, H O was significantly different from the H E under the assumption that there was random mating in all the populations. This indicates that there seemed to have been some loss of diversity among these populations probably due to population decline and increased inbreeding. However, diversity in the Caucasus was comparatively higher than in the reference breeds in this analysis. The elevated diversity can be due to introgression from breeds outside the regions. However, when 243 individuals from the Caucasus compared with 444 individuals from outside the region, among the 231 alleles detected, 18 alleles were exclusively found in the focal area, but at very low frequencies. Thus the phenomenon might be due to drift or that the Caucasian population might be in recovery, since such a population is usually characterized by an excess of new mutations at low frequency. The low MNA for Imeretian breed, a population which was the exception in terms of level of diversity reported in this study, can be attributed to the fact that the breed might have undergone population bottleneck and founder effect in the last 3 decades. In a 1980 the breed census they were only 834 individuals (Dimitriev, 1989) and the was initially listed as endangered (Scherf, 1995) but lost the status probably after half a decade of population recovery.
Given that they are listed as endangered breeds by FAO (Scherf, 2000), it might be assumed that Bozakh and Shirvan breeds would exhibit low levels of gene diversity. However, we observed nearly same levels of gene diversity within these two breeds in comparison to others. This may be a result of the recent massive introgression of other Caucasian breeds' genes into their pool. This point becomes clearer especially when considered in the light of individual assignment tests with these two breeds reporting among the least individuals correctly assigned. One would also expect that population from the region of the centre of domestication, i.e. the Awasi breed would have the highest genetic variability. But both the gene diversity and MNA we obtained for Awasi in this study were slightly higher than the ones reported by Arranz and colleagues (Arranz et al., 1998), but of same range with other breeds in these study (0.703, 6.46 vs. 0.656, 0.58 respectively).
The phylogenetic tree we obtained in our study clustered the Caucasian breeds with Pakistani and Central Asian breeds underscoring the historical influence of these two regions on the diversity of sheep breeds of our focal area. The tree obtained was not very robust since most bootstrap values were low, ranging from 6-43%, especially the internal structure of the clade containing Caucasian breeds and Asian breeds. This probably reflects a shallow depth for the radiation of these sheep breeds and historical admixture among them.
The Asian breeds included in this study seem to have more affinity with the Caucasian breeds in comparison with African and European breeds. The PCA plot showed tight clustering of these breeds especially Karakul and Awasi with the Caucasian breeds. Our finding is in agreement with the fact that they have been constant genetic exchange in this region. The Mongol occupation, movement of the Oghuz pastoralists from the Altai plains and constant movement of population between the Caucasus and Central Asia during the Soviet era explains this massive influence from the Karakul. The influences seem to have been more from the north than from the south of the Caspian Sea, since Da and F st estimates between Karakul and the two breeds, which have range occupation of northern region of Caucasus, Bozakh and Tushin were lower than it is for the southern-most Dzharo. The Pakistani influence seem to be more from the south route of Caspian sea with its breeds closer to Azeri and Armenian sheep breeds.
When all the individuals from the Caucasian breeds, which are all of fat tailed type, were lumped together and considered as one breed, the tree obtained clustered these Caucasian breeds together with the Karakul and Salt range breeds, which are also fat tailed, with high bootstrap support. Only Awasi, also fat tailed, seem to have diverged earlier than other Asian thin tailed breeds from the Caucasian breeds. Morever, the Da values obtained in this analysis (0.1299) and average F st values showed the Chinese thin-tailed Lanzhou which is also geographically further away from the Caucasus than Awasi, to be closer to the Caucasian breeds than Awasi (0.1424), and Indian Deccani (0.1538). Between the two Pakistani breeds, Salt Range is the one closest to the Caucasian breeds and cluster tightly with them. This can be attributed to the fact it is a fat tailed breed like the Caucasian ones while Lohi is thin tailed.
Surprisingly, despite the fact that there is overlap in the Pakistani breeds home tracts, Lohi's Da distance from Salt Range is slightly larger than it is from the Caucasian breeds. Due to its ability to withstand harsh conditions, Awasi is found in most of near East and Asian countries, and its gene pool have been used in improving several stocks, which might explain its inconsistent grouping in these analysis.
Portuguese merino and Rambouillet breeds were used in this study in lieu of Spanish merino samples, which we did not have in our DNA database, to infer the derived and ancestral states for all polymorphisms. Portuguese merinos were found to have closest association with Spanish merino in study of several merino populations (Diez-Tascon et al., 2000). Moreover, Portuguese White merino cluster with Rambouillet, from which it was developed with Spanish merino early last Century (Diez-Tascon et al., 2000). Merinos and Rambouillet seem to be more distant from the Caucasian breeds in comparison to Asian breeds with the exception of Lohi. This indicates that the level of merino influence on the genetic diversity of Caucasian breeds is not substantial.
The Caucasus Mountains, which is described by ethnographers and linguists as "Mountains of diversity" or "Refugee of peoples", have from the time immemorial been at the crossroads of cultures. Once, a barrier between early urban civilizations in Mesopotamia and their trade centres in the south, and nomad cultures in the steppes of the north, the scene has changed over two millennia. Thus, the human populations are admixed and have experienced gene flow from both Europe and West Asia. In terms of gene frequencies, these populations are approximately equidistant from European and West Asian populations (Nasidze, 1995). The groups exhibit genetic relationship correlated with more geographical relationships than with linguistic relationships despite the presence of the Caucasus mountains as potentially significant barrier to gene flow Nasidze and Stoneking, 2001)-thus genetic drift has been the dominant influence on shaping the genetic structure of Caucasus sheep populations.
According to both legend and the measurement of wool remains, the black sea coast produced the finest wool of the ancient world. However, in more recent times the native sheep of Trans-Caucasia seem to have been coarse-wool (Ryder, 1983). This is confirmed by the fact that currently nearly half of the wool produced in these countries, which are from the indigenous breeds, is of the coarse wool types (Zelyatdinov, 1997). The overall distribution of genetic diversity in Caucasian sheep is due to an interaction between drift and gene flow over recent timescale. Clearly allele distributions must have been perturbed by recent demographic fluctuations, which limit the conclusions we can make about phylogenetic relationships between these populations. From our study it appears that the Asian influence is to be far more than the European influence.
Before management decisions are taken, there is a need to better understand both the limitations and potentials of genetic analysis. However, detecting drastic reductions in population, due to loss of genetic diversity, is critical in conservation biology where population declines can increase the risk of extinction, which is a major concern in the Caucasus. Our results are in need of further confirmation through the study of larger sample sizes, with wider representation of European and Asia breeds, especially Central Asia. Therefore, even though Y chromosome and mtDNA are prone to the vagaries of drift because of their small effective population sizes, they are probably the most sensitive genetic tools in the genome for detecting admixture between populations and their study will go a long way in making clear some of our inferences.