Microsatellite-based phylogeny of Indian domestic goats

Background The domestic goat is one of the important livestock species of India. In the present study we assess genetic diversity of Indian goats using 17 microsatellite markers. Breeds were sampled from their natural habitat, covering different agroclimatic zones. Results The mean number of alleles per locus (NA) ranged from 8.1 in Barbari to 9.7 in Jakhrana goats. The mean expected heterozygosity (He) ranged from 0.739 in Barbari to 0.783 in Jakhrana goats. Deviations from Hardy-Weinberg Equilibrium (HWE) were statistically significant (P < 0.05) for 5 loci breed combinations. The DA measure of genetic distance between pairs of breeds indicated that the lowest distance was between Marwari and Sirohi (0.135). The highest distance was between Pashmina and Black Bengal. An analysis of molecular variance indicated that 6.59% of variance exists among the Indian goat breeds. Both a phylogenetic tree and Principal Component Analysis showed the distribution of breeds in two major clusters with respect to their geographic distribution. Conclusion Our study concludes that Indian goat populations can be classified into distinct genetic groups or breeds based on the microsatellites as well as mtDNA information.


Background
Goats, like other livestock species, are recognised as important components of world diversity and may play a useful role in sustainable agriculture in future. The goats in India are distributed in all climatic regions and evolved in isolation for a long period due to varying selective pressures and genetic drift. Goat breeds are mainly defined by their geographical position, morphological characteristics and production performance. Goats in India can also be divided into large, medium and small breeds, which are mainly distributed into four major geographical regions: temperate Himalayan region, northwestern region, south-ern peninsular region and eastern region [1]. Goat breeds have also been classified based on their production status: milk producing, meat producing and dual type. In addition to this, considerable variation exists among goat populations in terms of size, coat colour, ear, horn pattern and production performance. In the present study we have included all major goat breeds of India, representing diverse production characteristics (Fig 1 and Table 1). The Jamunapari breed is known as the best Indian dairy goat [2] and is found in isolated pockets in the Chakarnagar area of the Etawah district of Uttar Pradesh (UP). The Jakhrana, also known for high milk yield, is from the Alwar area of Rajasthan. The Barbari is a medium size dual-purpose breed known for its adaptability over a wide range agro-climatic situation. The Sirohi is a medium to large size breed and is used for both meat and milk in its natural habitat. The Marwari is a medium size breed and is known for hardiness and adaptability to extreme temperature. The Black Bengal is the typical dwarf breed of eastern India and known for high prolificacy and meat quality. Pashmina goats are distributed in high altitude of Himalayan region and known for the best fibre production. In addition to the major breeds, there are many intermediate and less distinguishable goat subtypes; however the genetic relationship among the major types as well as subtypes has not been established.
An assessment of genetic variability in domestic goats is a first step towards conservation of genetic resources for maintaining breeding options. In the changing phase of agricultural practices, a few breeds have been used on a large scale for immediate economic gain. Therefore, locally adapted native breeds have been neglected or displaced without knowing their genetic importance. DNA markers have been used to study the genetic variation in livestock, human and other populations [3][4][5]. Genetic markers are used to determine genetic variation between breeds; subsequently relationships among breeds are determined in calculating genetic distance and constructing trees. Microsatellite markers have been used as good tools to analyse the genetic variation in cattle, sheep, pig, and goats [6][7][8][9][10][11]. Using the maternally inherited mitochondrial DNA (mtDNA) sequence information, we have demonstrated earlier the existence of all the three previously known lineages A, B, C and proposed two new lineages among Indian goat populations [12]. However, no study has been conducted on Indian goats using the nuclear microsaltellite/short tandem repeats (STR) loci and covering a wide set of populations from different agroclimatic zones on Indian goats. Therefore, we have characterized seven economically important Indian goat breeds using STR markers.

Genetic variation in Indian goats
Out of 23 STR markers analysed, 4 failed to amplify in any of the samples, 2 showed monomorphic patterns and the remaining 17 were polymorphic. The total number of alleles and allele size range for each locus are presented in Table 2. Among the polymorphic markers, the BM4621 and SRCRSP9 loci showed the highest number of alleles (more than 20) in the populations analyzed. The IDVGA7, BM6526, INRABERN192, TGLA40 and SRCRSP5 loci showed more than 15 alleles. The remaining 10 loci showed less than 15 alleles. Locus SRCRSP10 exhibited the smallest number of alleles (9). The total number of alleles varied from 9 (SRCRSP10) to 22 (BM4621). In total, 248 alleles were observed from 17 loci surveyed. Geographical distribution of Indian goat breeds studied Figure 1 Geographical distribution of Indian goat breeds studied.

Genetic variation within breeds
The most polymorphism was detected at the SRCRSP9 locus (0.85) and the least polymorphism at the ILST S005 locus (0.597). Breed specific alleles were observed at different loci for different breeds with low frequency. Measures of genetic variability are shown in Table 3. Average observed and expected heterozygosity ranged from 0.375 to 0.426 and 0.739 to 0.783, respectively. The mean expected heterozygosity (He) ranged from 0.739 in Barbari to 0.783 in the Jakhrana breed. The Barbari showed the lowest gene diversity, while the Jakhrana and Sirohi showed the highest gene diversity among Indian goats. Wilcoxon's signed ranks test indicated that He was significantly lower (P < 0.05) in Barbari goats than in other breeds.
The mean number of alleles per locus (NA) varied from 8.1 in Barbari to 9.7 in Jakhrana goats. The mean number of alleles per locus (NA) corrected for sample size (calculated based on n = 31) is presented in   Further, an AMOVA analysis was carried out to analyze the variation within and between breeds. The AMOVA revealed that percentage of variation among populations was 6.59% and within populations was 93.41%. Variance components among population were highly significant for all the studied loci (Table 4), demonstrating significant geographical structuring in Indian goats. ILSTS005 and BM4621 contributed 14.42% and 11.50% variability among populations, respectively, and SRCRSP6 and NRAMP showed the lowest variability among populations (2.26% and 3.28%, respectively).

Genetic distance
Allele frequencies were used to generate the D A genetic distance between each pair of populations and distance matrices were used to build phylogenetic trees using the UPGMA and NJ algorithms. As both trees retained similar structure, only the NJ tree constructed from a matrix of D A distances is presented in Figure 2. The D A genetic distances and Fst distances between pairs of breeds are shown in Table 5. The lowest distance was observed between Marwari-Sirohi (0.135) and Jamunapari-Jakhrana. The highest distance was observed between Pashmina and BlackBengal and between Barbari and Black Bengal. The NJ tree revealed two different clusters (Fig. 2). The first cluster consisted of the Marwari and Sirohi breeds, and the 2 nd cluster consisted of the Jamunapari, Black Bengal and Jakhrana breeds. Bootstrap values ranged from 40 to 78 indicating reliable topology of the phylogeny constructed from D A distances. Barbari and Pashmina goats were placed separately in the phylogenetic tree.

Principal component analysis Single-marker analyses
PCA was performed for each of the 17 markers. Corresponding scree plots are presented in Figure 3. Splitting up the inertia according to axes leads to the so-called scree plot. A scree plot is a simple line segment plot that shows the fraction of total variance in the data as explained or represented by each Principal Component. Contribution of a marker to the construction of an axis is measured by the part of the inertia of this axis that is supplied by the marker. Splitting up the inertia of an axis according to markers enables one to evaluate the degree of consensus of this axis. The inertia was different according to the markers. The most important markers were ILSTS005, BM4621 and SRCRSP10, while, the markers Nramp, OARAE101 and SRCRSP9 did not contribute significantly to inertia. For each single-marker analysis, distances among breeds were computed. Distances were unequal among populations (Kruskall-Wallis test, χ 2 = 29.45, 20 d.f, P = 0.08), indicating the existence of a multivariate compromise structure.

Global Analysis
PCA was performed using the frequencies of the 242 alleles of the 17 markers. The first three principal components explained 65% of the total variation. The global principal component analysis for the first three principal components is presented in Figure 4. The first axis contributed about 27% of the inertia and distinguished the Pashmina and Barbari populations from the other populations, especially the Black Bengal. The second axis contributed 20% of the inertia and separated a cluster containing Barbari, Jamunapari, Jakhrana, and Black Bengal, from a cluster containing Pashmina, Sirohi and Marwari. The third axis contributed about 18% of the inertia and again distinguished a cluster containing Sirohi and Marwari from the breeds Jamunapari, Jakhrana and Black Bengal, but differs from the second axis by the different positions of Barbari and Pahsmina. As a result, these three axes revealed a pattern of association that supports a partition of populations into 4 discrete groups: (1) Barbari, (2) Pashmina, (3) Jamunapari, Jakhrana and Black Bengal and (4) Sirohi and Marwari ( Figure 3)

Contributions of markers to the global analysis
The contributions of markers to the construction of axes are plotted in Figure 5. Contributions to axes are variable for all the three axes. Not surprisingly, markers with the greatest inertia contributed the most to the construction of axes: ILSTS005 contributed 20% to the construction of first axis, while BM4621 and SRCRSP10 together contributed 35% (axis 2) and 30% (axis 3). Due to the importance of these three markers, it may be interesting to detail On the other hand, BM4621 and ILST005 participated in the construction of only one or two axes. ILSTS005 participated in the construction of the first axis. The corresponding PCA plot indicates that Pashmina and Barbari breeds were isolated from some other populations, as in the fist axis of the global analysis, but the clusters Jakhrana, Jamunapari, Black Bengal and Sirohi, Marwari were not exhibited by this marker. On the other hand, BM4621, which contributes to the construction of the second and third axes, exhibited these clusters, but did not isolate the Bar-bari and Pashmina breeds. BM4621 revealed three clusters (Pashmina, Barbari, other breeds) in contrast to the four clusters exhibited by the global analysis (Pashmina, Barbari, Black Bengal and Jamnuapari, Jakhrana).

Breed differentiation
Our genetic analysis of seven Indian goat breeds with 17 microsatellite markers showed higher gene diversity as compared to European and Asian goat breeds. Barker, 1994 [13] and Takezaki and Nei, 1996 [14] suggested that microsatellite loci for genetic diversity studies should have more than four alleles in order to reduce the standard error estimates of genetic distance. The total number of alleles per locus in the present study ranged from 9 to 22. This higher number of alleles for each locus suggested that all the markers used were appropriate to analyse diversity in Indian goats. The mean number of alleles observed over a range of loci in different populations was considered to be a reasonable indicator of genetic variation within the populations [3]. A more appropriate measure of genetic variation within a population was gene diversity (average expected heterozygosity) [15]. Gene diversity for each breed ranged from 0.724 in Barbari to 0.783 in Jakhrana. Takezaki and Nei, 1996 [14] determined that for markers to be useful for measuring genetic variation, they should have an average heterozygosity ranging from 0.3 to 0.8 in the populations. This again confirmed that these markers were appropriate for measuring genetic variation. By analysing mitochondrial HVR1 region in our previous study [12], the lowest haplotype diversity was Scree plots of the single-marker PCA Figure 3 Scree plots of the single-marker PCA.
observed in Pashmina goats (0.926) and the highest haplotype diversity was observed in Jamunapari goats. Microsatellite analysis also revealed that the Pashmina goats exhibited the lowest diversity as compared to other breeds in the present study. The measures of population differentiation indicated variability within breeds and the exact test for population differentiation indicated significant differences between breeds. We observed a large, significant difference between expected and observed heterozygosities in all the Indian goat breeds. This large difference indicates that there is a considerable degree of genetic subdivision within breeds. In India goats are exploited very little by artificial selection, but mtDNA analysis has established new lineages in Indian goats as compared to other goat breeds of the world [12].
As no systematic effort has been made to create distinct goat breeds in India, founder effects and genetic drift may have played major role in differentiation of Indian goat breeds. Population subdivision in Indian goats is also supported by the average proportion of genetic differentiation among breeds (8.03%). As R ST values shows the frac-tion of total variance of allele size between populations, the estimated R ST was more than twice the size of G ST and F ST , suggesting that goat breeds differ in both allele frequency and allele size. In addition, AMOVA indicated that 6.59% of the total genetic variation is between breeds of goats, confirming higher within population diversity in the Indian subcontinent. Mitochondrial DNA analysis in Indian goats showed 83% of variation within breeds and 17% among breeds [12]. The between breed variation in Swiss goats was 17% [8] using microsatellites. Similarly mtDNA analysis showed about 10.7% variation among the goat breeds from Africa, Middle East, Asia and Europe [16]. Takezaki and Nei, 1996 [14] have shown that the construction of a phylogenetic tree depends on the type of population and number of markers used to analyse the population. It has been also showed that increasing the number of loci does not necessarily enhance the reliability of the phylogeny [6]. Takezaki and Nei, 1996 [14] have demonstrated that D A and D C are the most efficient means Global Principal Component Analysis (First three principal components) Figure 4 Global Principal Component Analysis (First three principal components).

Genetic relationship between Indian goat breeds
of obtaining a correct tree topology on the basis of microsatellite analysis when within population variation is high and distances between each pair of populations are used to build a NJ tree.
The genetic data revealed that the smallest distance is between Marwari and Sirohi and the largest distances are between Pashmina and Black Bengal, Barbari and Black Bengal, and Barbari and Jakhrana. The highest geographical distance between Black Bengal and Pashmina corresponds to the highest genetic distance. The phylogenetic analysis indicated that breeds were grouped according to their geographic locations except Barbari goats. A similar observation of population clustering according to their geographic origin has been reported in cattle [4]. This shows that geographically adjacent populations are more genetically related. The principal component analysis supported the grouping of animals and the distance between breeds was significant. However Barbari goats showed a deviation as they did not cluster with geographically close breeds such as Sirohi, Jamunapari and Jakhrana. The Barbari did not group with any one of the neighboring breeds, consistent with variation in morphological char-acteristic (ear pattern) and presence of two new lineages by mtDNA analysis. Moreover Jamunapari, Jakhrana and Black Bengal clustered in one group, indicating a shared gene pool, motivating further analysis to establish their migration and origin through a coastal route.
Breeds cluster according to their geographic location. Similar population clustering according to geographic location was previously observed in microsatellite analyses of humans [17], cattle [4] and chickens [18]. Mitochondrial DNA analysis in goats also indicated geographical clustering in the breeds [12]. The result indicated that geographically adjacent populations were more genetically related, perhaps due to founder effects and interbreeding near bordering areas. PCA revealed separation of breeds more clearly between populations of different geographical locations. Some breeds which showed a close phylogenetic relationship are separated more clearly by the PCA plot. Mt-DNA markers are extremely informative for predicting the conformation of the gene pool from maternal inheritance. The Jamunapari is a breed of semi arid regions and isolated in small pocket of Chambal ravines [2]. The Barbari is a goat breed of semi arid regions and Contribution of markers to the construction of the axes of the global analysis Figure 5 Contribution of markers to the construction of the axes of the global analysis.
distributed over a wide breeding tract. Pashmina goats are found in high altitude Himalayan regions and known for fibre quality. Similarly the Black Bengal is the most prolific breed from the eastern part of India and distributed over a wide area. The Jakhrana is also from semiarid regions of Rajasthan and adapted to a specific locality. The Marwari and Sirohi are breeds of arid regions of western India and distributed over a large breeding tract. The differential existence of breeds varying from hot humid to hot arid, hot humid to cold humid and isolated to particular regions is illustrated by breeds such as the Jamunapari and Jakhrana. PCA isolated the Barbari from other breeds indicating their differential origin, consistent with ear characteristics and existence of two new lineages by mtDNA analysis. Principal component analysis showed clustering of goats according to their geographical origin.
Although the breeding tracts of goats are overlapping and no strict breeding policy is adopted to maintain standard breeds in the region, they still maintain genetic distinctness in their natural habitat. Therefore it is necessary to combine genetic data with geographical positioning and to assess the genetic relationship by geostatistical models in further studies.

Conclusion
In conclusion, this analysis showed that microsatellites as well as mtDNA analysis can be used to classify Indian goat populations into distinct genetic groups or breeds. Phylogenetic and principal component analysis showed the clustering of goats according to their geographical origin.
Although the breeding tracts of goats are overlapping and they are spread over all the parts of the country, they still maintain genetic distinctness in their natural habitat.

Genetic Stocks
A total of 302 goats representing 7 major breeds of India were sampled from their natural habitat. The breeds studied have been grouped into 5 different types based on their utility and size ( Table 1). Summaries of goat breeds sampled from their natural habitat are described in Table  1.

Sample collections and DNA isolation
About 10 ml of blood samples were collected from each animal's jugular vein using EDTA vacutainers and stored at 4°C until DNA isolation. An effort was made to collect samples from unrelated individuals based on information provided by farmers. The geographical distribution of the breeds from different regions sampled for the study is described in Table 1. DNA was isolated from blood as described elsewhere [19].

Selection of STR markers
The markers were chosen from the existing bovine, ovine and caprine genetic maps [20][21][22] with an effort to cover all chromosomes having high heterozygosity. Twentythree STR markers were included in this study for analyzing the variation among various goat breeds ( Table 2). All the DNA samples were analysed with 23 STR markers. Each 10μl PCR reaction mixture consists of 10 ng of template DNA, 1× buffer, 200 μM dNTPs, 2.5 mM MgCl 2 , 1 U of AmpliTaq Gold (Perkin Elmer) and 10 pM of each primer. Amplification conditions for these markers were as follows: Initial denaturation at 95°C for 10 min. followed by 95°C for 1 min, specific annealing temperature for each marker as given in Table 2 and 30 s at 72°C for 30 cycles. An final extension temperature of 72°C for 5 min was used for each reaction.

GeneScan and genotyping
One micro liter of PCR amplicons and 0.5 μl of size standard (GS-ROX500) were mixed with 2.5 μl of loading dye (formamide: bluedextrin; 5:1), denatured (94°C for 2 mins) and electrophoresed in 5% Long Ranger (FMC) gel, using ABI 377 automated DNA sequencer (Perkin Elmer). GeneScan (Perkin-Elmer) software was used to analyse the gel image and Genotyping software (Perkin-Elmer) was used to get the allele size of each amplicons.

Statistical analysis
Exact tests for deviations from Hardy-Weinberg equilibrium (HWE) were performed by the GENEPOP Package [23]. The program performs a probability test using a Markov Chain (dememorization 10,000, batches 100, iteration per batch 1000). Genetic disequilibrium was estimated between all locus pair using GENEPOP. The mean number of alleles per locus (NA), total number of alleles (TNA), the observed heterozygosity (Ho) and the expected heterozygosity (He) under HWE were computed using FSTAT [24] and AGArst software [25]. To test sample bias, Allelic richness based on minimum sample size (n = 31) was estimated using FSTAT [24]. Significance levels of difference in NA and He between populations were tested using a Wilcoxon's signed ranks test.
Analysis of molecular variance (AMOVA), Fst and pairwise difference were computed using ARLEQUIN ver 3.11 [26,27]. A second estimator of gene differentiation, R ST , was calculated which accounts for variance in allele size and was defined for genetic markers undergoing a stepwise mutation model. The D A genetic distance [28] was computed with DISPAN [29] to establish genetic distance between populations. A NJ/UPGMA tree was constructed in comparison for Indian goat breeds using the PHYLIP Package Ver3.6 [30]. The significance of population difference was tested using the exact test of population differen-tiation proposed in GENEPOP software based on allele frequency variation.

Principal Component Analysis (PCA)
As admixture in the breeding tract of goats is very common and some populations are known to be admixed, we used a multivariate procedure to represent population relationships. Multivariate procedures are recommended in this situation because the admixed populations are not original evolutionary units and may be misrepresented by phylogeny-based tree-building techniques [31]. Prior to multivariate analysis, we tested for congruence of loci following the two-step procedure developed by Moazami-Goudarzi and Laloe, 2002 [5]. This is because a consensus representation of population relationships is not meaningful if single markers are not congruent, as would occur if many of the distances among populations based on individual loci were negatively correlated [4]. Euclidean distance matrices between all populations were first generated for each locus. Then, a Kruskall-Wallis test [32] on rank scores of standardized distances between populations was carried out and a significant Kruskal-Wallis test indicated that a compromise structure exists because distances are unequal between populations. If a compromise structure exists, then a multivariate analysis, such as principal components analysis will be meaningful. PCA was then done using the allelic frequencies as variables. It leads to a representation of populations as a cloud of points in a metric space. Comparison between the inertia of single-marker enables to compare the typological value of the markers. Inertia can be split up according to axes and/or loci [33]. All computations were done using the R software [34]. More specifically, computations relative to PCA were done with the ade4 package [35].