Morphometric and microsatellite-based comparative genetic diversity analysis in Bubalus bubalis from North India

To understand the similarities and dissimilarities of a breed structure among different buffalo breeds of North India, it is essential to capture their morphometric variation, genetic diversity, and effective population size. In the present study, diversity among three important breeds, namely, Murrah, Nili-Ravi and Gojri were studied using a parallel approach of morphometric characterization and molecular diversity. Morphology was characterized using 13 biometric traits, and molecular diversity through a panel of 22 microsatellite DNA markers recommended by FAO, Advisory Group on Animal Genetic Diversity, for diversity studies in buffaloes. Canonical discriminate analysis of biometric traits revealed different clusters suggesting distinct genetic entities among the three studied populations. Analysis of molecular variance revealed 81.8% of genetic variance was found within breeds, while 18.2% of the genetic variation was found between breeds. Effective population sizes estimated based on linkage disequilibrium were 142, 75 and 556 in Gojri, Nili-Ravi and Murrah populations, respectively, indicated the presence of sufficient genetic variation and absence of intense selection among three breeds. The Bayesian approach of STRUCTURE analysis (at K = 3) assigned all populations into three clusters with a degree of genetic admixture in the Murrah and Nili-Ravi buffalo populations. Admixture analysis reveals introgression among Murrah and Nili-Ravi breeds while identified the Gojri as unique buffalo germplasm, indicating that there might be a common origin of Murrah and Nili-Ravi buffaloes. The study provides important insights on buffalo breeds of North India that could be utilized in designing an effective breeding strategy, with an appropriate choice of breeds for upgrading local non-descript buffaloes along with conservation of unique germplasm.


INTRODUCTION
Through the course of evolution, forces such as mutation, adaptation, reproductive isolation, random drift, selection and breeding have created vast diversity (breeds) among the Bubalus bubalis in India. Many well-defined breeds have been formed for various purposes and the high-performance breeds are intensely selected worldwide. This has led to the replacement of native low performance breeds with high performance ones causing erosion of genetic resources (Groeneveld et al., 2010). However, future trait of interests concerns the genetic diversity of low-performance breeds so, that the population can be maintained for future breeding (Notter, 1999). The genetic diversity of livestock species such as cattle, buffalo, pig, sheep, goat, camel and chicken have been widely studied in several population and the diversity of zebu and taurine cattle breeds is one of the most studied. Diversity between swamp and riverine buffaloes also have been studied using microsatellite markers, mitochondrial D-loop and cytochrome b sequence variations (Barker et al., 1997;Lau et al., 1998;Zhang et al., 2007).
Over the past few decades, world buffalo population has rapidly increased with 208 million buffaloes in the world presently, having 96.79% of population in Asia, 1.68% in Africa, 1.23% in the Americas and 0.22% in Europe (FAOSTAT, 2019). India alone has 109.85 million of buffalo population with 17 registered breeds (Livestock Census, 2019). Water buffalo (Bubalus Bubalis) that most probably domesticated in Indus Valley region for multiple utility creates a rich Bubaline diversity in Northern regions of India, comprising states of Punjab, Haryana, Himachal Pradesh, Delhi and Western Uttar Pradesh. The predominant bubaline genetic resources documented from the region include Murrah, Nili Ravi and Gojri buffaloes (http://www.nbagr.res.in/regbuf.html). Murrah being dominating buffalo germplasm with superior milk-producing ability has suppressed the need for identification and characterization of other breeds. On the other hand, Gojri is one of the little-known buffalo population of the region, with a good milch potential on low to zero input system of dairying and is maintained on a semi-migratory extensive system of management (Vohra, Niranjan & Joshi, 2012;Vohra et al., 2015).
Characterization and classification of animal genetic resources (AnGR) require ample knowledge of the geographical distribution of the breeds, identification of unique characteristics, population size and structure, production environment, and genetic diversity. It is customary to perform a detailed molecular study along with physical and phenotypic assessment to check within and between population diversity in order to characterize a population (Weitzman, 1993;Hall & Bradley, 1995;Barker, 1999;Ruane, 2000;Bruford, Bradley & Luikart, 2003;Simianer, 2005;Toro & Caballero, 2005). Vohra et al. (2015) have used 13 morphometric traits of Gojri buffaloes for phenotypic characterization using Principal Component Analysis, a multivariate statistical technique. Multivariate statistical analysis techniques viz. classical principal component analysis serves the objectives of dimension reduction and clustering when multiple morphometric traits are measured (Johnson & Wichern, 2002;Yadav, Arora & Jain, 2017).
The genetic diversity within Murrah, Nili-Ravi and Gojri breeds have been studied independently that share the common breeding tract in North India. However, in India, buffalo breeding is largely restricted to natural mating that subsequently may have led to admixture of these populations. Hence, there is a need to assess the between-breed genetic diversity among these breeds. The present study was performed to assess the levels of genetic diversity, and population structure among three buffalo breeds of North India. The results will help in formulating an effective breeding, management policy, shaping future conservation plans for maintaining breed purity and reducing the possible admixture due to introgression among purebreds. Thus, it is imperative to compare the region-specific diversity and breed status of bubaline germplasm.

Sampling strategy
Sampling was done from their respective native tracts, to compare the genetic diversity between three different breeds. Gojri buffalo samples were collected during 2017-18 from areas of Punjab and Himachal Pradesh (30 9′ to 32 3′ N and 75 to 77 E) states of India, and samples for Nili-Ravi buffaloes were collected from Punjab state ( 28 17′ to 32 17′ N and 74 to 76 41′ E). The Nili-Ravi has a comparatively smaller geographical distribution compared to Murrah and Gojri. In India, Murrah buffaloes are found in almost all regions but its native area is Haryana state (28 02′ to 30 21′ N and 75 to 77 E) hence, sampling was performed from Haryana and Punjab. The data of Murrah and Nili-Ravi was taken for comparative analysis from Buffalo Genomics Lab of National Bureau of Animal Genetic Resources, Karnal. The breeding and sampling tract had a herd size of 2-6 buffaloes per households. To ensure that selected animals are unrelated, in the absence of detailed pedigree accounts, buffalo breeders were interviewed in detail and their records were checked. Only those animals who were not having common parents for at least 3-4 generations were included in the study. Buffaloes were selected for this study following guidelines of measurement of domestic animal diversity program (FAO, 2011) those represented the original indigenous true to type phenotype. Blood samples were collected with the consent of herd owners. Approximately 5-10 ml of blood from jugular vein was collected by trained Veterinarian using aseptic measures. All the studies were carried out under approval of ICAR-National Dairy Research Institute IAEC 1705/GO/ac/13/CPCSEA. Morphometric traits were measured on a total of 242 adult female buffaloes, comprising of 113 Murrah, 37 Nili-Ravi and 92 Gojri buffaloes, to avoid the sex and age differences. Thirteen (13) different traits were measured on all three breeds De Melo et al. (2018). All the measurements on the animal were recorded in their normal standing position on a levelled surface using a tape measure by the same technical person. Traits recorded were body height (HT), body length (BL), chest girth (CG), paunch girth (PG), face length (FL), face width (FW), horn length (HL), horn circumference (HC), ear length (EL), distance between hip bone (HB), distance between pin bone (PB), tail length (TL), and tail length up to switch (TS). To avoid age effects, only adult buffaloes (3.5 years above) were included in study. For microsatellite genotyping, blood samples were collected from 128 (40 Murrah, 40 Nili-Ravi and 48 Gojri) buffaloes.

Genotyping microsatellite markers
Genomic DNA was isolated from blood samples by standard phenol-chloroform extraction protocol, as described by Sambrook & Russel (2001). DNA concentration was checked by spectrophotometric method. Genetic variation was assayed using 25 microsatellite markers. Microsatellite genotyping was carried out as previously describe in Vohra et al. (2017) following the protocol of Mishra et al. (2010). Fluorescent-tagged forward primers for each microsatellite were used. The primers those were able to produce a fragment size >75 bp were used in the study. Fragment length analysis was performed through ABI PRISM 3100 automatic sequencer (Applied Biosystems, Foster City, CA, USA) after performing polymerase chain reaction (PCR) for fragment amplification. Allele length for the different fragments generated was determined as described in Vohra et al. (2017) using GeneScan software (version 5.0; Applied Bio system, Foster City, CA, USA). Observed number of alleles (N a ), theta estimate (θ H ), expected heterozygosity (H e ), F IT (total inbreeding estimate), F ST (measurement of population differentiation) and F IS (within-population-inbreeding estimate) were calculated using Arlequin v3.5 (Excoffier & Lischer, 2010). Pairwise differences between populations using molecular distances were calculated. Molecular diversity indices were calculated as per Tajima (1983Tajima ( , 1993, Nei (1987) and Zouros (1979), implemented in Arlequin v3.5, and allowing 5% level of missing data. Analysis of molecular variances was done using 1,000 permutations. Exact test of population differentiation was performed with 1,00,000 Markov chain steps and 10,000 dememorization steps.

Statistical analysis
Statistical analyses on morphometric data were performed using SPSS v17.0 software (SPSS, 2001). Multivariate analysis technique such as canonical discriminant analysis (CDA) simultaneously analyse multiple correlated measurements in a single individual and increases the discriminatory power by eliminating variables explaining less variation in the dataset. The relation between the group the individual belongs to and a set of morphometric traits are quantified using CDA (Zhao & Maclean, 2000). As, CDA provides optimum discrimination between population to classify them as a different breed hence, widely used in breed characterization and genetic diversity studies.
The canonical discriminant analysis was performed in SAS v9.3 program (SAS Institute Inc., 2011) using Proc disc procedure, for determining the most discriminatory morphometric traits. The probabilities of assigning an individual to a population were determined using Discrim procedure based on the linear discriminant function that included the thirteen morphometric variables. Wilk's Lambda was used as the test statistics to check for the differences between the means of identified groups of subjects on a combination of dependent variables.
Population assignment was performed using the Bayesian Markov chain Monte Carlo approach implemented in Structure v2.3.4 (Pritchard, Stephens & Donnelly, 2000). The Bayesian clustering algorithm simultaneously estimates allele frequencies at each and individuals are assigned probabilistically to one of the K subpopulations. It assumes that prior distribution of population to which individuals belong and allele frequencies are known.
The most likely number of subpopulations was determined by the Evanno ΔK method (Evanno, Regnaut & Goudet, 2005) using R package "POPHELPER" (Francis, 2017). Twenty independent runs were performed for K = 2 to 4 to identify the most likely number of clusters present in the dataset. The analysis was performed with a burn in period of 10,000 and 50,000 MCMC iterations. Effective population size (N e ) was checked for the three population. N e was estimated using linkage disequilibrium method using NeEstimator v2.01 (Do et al., 2014) Software. The P-critical value (rare allele frequency) was set to 0.05, below which all the alleles were rejected. Jackknife confidence intervals (CI) were calculated for each estimate, N e , of different population. Discrimination between populations was elucidated graphically through principal coordinate analysis (PCoA) using Darwin v6.0.021 (Perrier, Flori & Bonnot, 2003). Principal coordinate analysis is a classical multidimensional scaling method based on dissimilarity or distance matrix to assign each individual a location in a two or three-dimensional space. The dissimilarity matrix based phylogenetic tree was also obtained through Darwin.

Classificatory analysis based on morphometric traits
The means and standard deviation, coefficient of variations and comparison of mean difference between populations for each trait across population is listed in Table 1. A Canonical Discriminant analysis was used to compare different morphometric traits and first two canonical discriminant functions were used in the analysis, which explained 66.7% and 33.3% of total variance, respectively. Wilk's Lambda was used as the test statistics to check the difference between means of the two groups and was found to be significant (Table 2). Classification based on canonical discriminant functions for both original and cross-validated counts predicted 100% assignment of each adult buffaloes to their hypothetically known populations i.e. Murrah, Nili-Ravi and Gojri. All the individuals plotted based on 1st and 2nd canonical discriminant functions were clustered into three distinct groups suggesting three different breeds in the sample (Fig. 1).

Microsatellite variations
Among 25 microsatellite loci genotyped for this study, only 22 loci that were polymorphic for all three populations were used for further downstream analysis. A total of 145, 138 and 173 alleles were found across 22 loci in the 128 individuals sampled from the Murrah, Nili-Ravi, and Gojri buffaloes, respectively. ILSTS60 was highly polymorphic in Gojri buffaloes, ILSTS95 in both Murrah and Nili-Ravi and ILSTS61 in Murrah (Fig. S1A). Mean  (Fig. S1C). Across all three populations mean θ H ranged from 0.17 ± 0.03 (ILSTS19) to 4.44 ± 1.16 (ILSTS58). Marker wise number of alleles, H e , and θ H in each breed given in Table 3.

Genetic diversity
Global Analysis of molecular variance (AMOVA) using 19 polymorphic loci was accomplished. Wright's F-statistics values obtained from the results of global AMOVA revealed 11.7% deficit of heterozygotes for each of the analyzed breeds (F IS ) whereas the total population had a 27.8% deficit of heterozygotes (F IT ). The average genetic differentiation (F ST ) between the breeds was 18.2% (p = 0.00001) indicating significantly  higher discrimination between breeds (Table 4). Details of AMOVA results are presented in Table 4. The pair-wise F ST , Slatkin linearized F ST , and Nei's distance (d) values were used to illustrate the genetic distance between breeds ( Fig. 2A, 2B and 2C), which significantly differentiated all three breeds. Murrah and Nili-Ravi population were clustered together while the Gojri population was present as a distinct group, suggesting it as a different breed in factorial correspondence analysis (Fig. 3) and phylogenetic tree (Fig. S2).
Effective population size (N e ) was estimated excluding rare alleles with an allele frequency below 0.05. The estimated effective population size of Gojri, Nili-Ravi and, Murrah was found to be 142, 75 and 556, respectively. The Jack-knife CIs for the N e estimates were 83-396, 48-141 and 136 to infinity for Gojri, Nili-Ravi and Murrah, respectively at 0.05 P-critical value of rare alleles. Number of possible sub-populations estimated through Evanno ΔK method suggested a maximum of three populations (Fig. 4). Population assignment accomplished in STRUCTURE for K = 2, 3 and 4 and results are presented in the form of bar plot (Fig. 5). For K = 3, as estimated through Evanno ΔK method, it showed 99.4% of Gojri buffaloes are classified into their pre-defined breed. 95.9% of Nili-Ravi and 83.6% of Murrah were assigned to their respective pre-defined groups. Inferred ancestry of each individual (for K = 3) along with average proportion of each individuals classified into respective pre-assigned breeds (for K = 2, 3 and 4) is reported (Table 5).

DISCUSSION
In India, limited work on complete characterization and classification of buffalo genetic resources have been carried out in past, primarily due to availability of much acclaimed Murrah buffaloes. The native breeding tract of Murrah buffalo is North India, and currently, more than 40% of the countries buffalo population is either Murrah or has been crossed with Murrah buffaloes. Hence, genetic studies on other buffalo populations is often neglected. However, several studies have been taken up on morphometric characterization of individual breeds yet there are limited reports on genetic diversity

Morphological diversity
Gojri animals with unique phenotypic appearance are quite distinct from Murrah, Murrah crosses, and Nili Ravi (Vohra, Niranjan & Joshi, 2012). The average measurements for body biometric traits across the studied buffalo populations of the North India is listed in Table 1. Thirteen body biometric traits across three population when compared, revealed significant differences among the studied populations, except for FL and EL among Murrah and Gojri buffaloes, HC and HL between Nili-Ravi and Gojri population. Body height (HT) and face width (FW) did not vary among Murrah and Nili-Ravi populations. The comparison of morphometric traits between all three buffalo breeds of the North India outlined the phenotypic distinctness for majority of the body biometric trait. The coefficient of variation (CV) percentage was least for body height in all three breeds. On comparing average of HT, BL, CG and PG, Gojri buffaloes were found to be of smaller size than Murrah and Nili-Ravi. Nivsarkar, Vij & Tantia, 2000 in Nili-Ravi reported average HT, CG, and BL as 134.2, 207.7 and 165.4 cm, respectively, which is comparable to our results. CV% was highest for HL in Gojri (19.52%) and Murrah (12.61%) buffaloes indicating lesser selection pressure on them and more environmental influence. Face width (FW) was least variable in Murrah and Nili-Ravi while it varied greatly in Gojri buffaloes. Most of the body biometric traits measured were less variable indicating their reliability in population classification studies.
In the canonical discriminant analysis (refer to Table 2), two functions were needed for separation of three distinct population (Asamoah-Boaheng & Sam, 2016) and the first function (function 1) explains 66.7% of the variance and has a Wilk's lambda (0.008) with p = 0.0001. The second function explains only 33.3% of the variance in the data, with a recorded p = 0.0001 for Wilk's lambda (0.122). Wilks' Lambda value close to zero represents a greater number of variables contribute to the discriminant function (Toalombo Vargas et al., 2019), thus the first function in this study plays major role in classifying the breeds.

Microsatellite variations and Genetic diversity
Microsatellite marker data being the best-suited molecular information for the assessment of genetic diversity (Bowcock et al., 1994;Laval et al., 2000;Groeneveld et al., 2010), allows future management and conservation of the breeds based on their genetic architecture  Taberlet et al., 2008;Toro, Fernández & Caballero, 2009;Teneva et al., 2013). The FAO and the ISAG/FAO Advisory Group on Animal Genetic Diversity have proposed a panel of 25 SSR markers for diversity studies in buffaloes (Singh et al., 2018). Hence, in the present study the 22 highly polymorphic microsatellite markers out of 25 marker panel, were used for diversity analysis. The mean number of alleles (N a ) in population over a range of loci is considered a fair indicator of allelic variation. The mean N a ranged from 0-10, 0-13 and 3-14 in Nili-Ravi, Murrah and Gojri buffaloes, respectively (refer to Table 3). The mean N a per locus for each population in the present study is similar to the reports of Kathiravan et al. (2010) in South Kanara buffaloes; Marques et al. (2011) in Brazilian buffaloes, Martínez et al. (2009), Bhuyan et al. (2010 in Murrah buffaloes and in Purnathadi buffaloes Ali et al. (2020). However, a higher number of alleles per locus ranged from 11-26 alleles in Indian water buffaloes have been reported by Vijh et al. (2008). The type of breed under investigation, usage of the particular panel of microsatellite markers, methods of genotyping and the genetic polymorphism within the breed itself greatly influence this variation in the N a .
For microsatellite data, Ohta & Kimura (1973) have established the relationship between the expected homozygosity and its estimator θ, under a pure stepwise mutation     (Excoffier & Lischer, 2010), where H e is the expected heterozygosity. The mean H e ranging from 0.14 to 0.81 across all three population over all loci (refer to Table 3) is indicative of sufficient polymorphism to measure genetic variation (Takezaki & Nei, 1996). In Gojri buffaloes the expected heterozygosity (H e ) ranged from 0.12 to 0.82 that was comparable with the results reported by Singh et al. (2019). While in both Murrah and Nili-Ravi, it ranged from 0 to 0.81 (refer to Table 3). Similar high overall mean H e were reported in Pandharpuri (Khade et al., 2019), Mehsana (Jakhesara et al., 2010), Egyptian (Attia, Abou-Bakr & Hafez, 2014) and Purnathadi (Ali et al., 2020) buffaloes. The substantially high H e values implies the presence of high genetic variability in the studied buffalo breeds and suitability of the marker panel for the present study.
The average F statistics over 19 loci were F IS = 0.11744, F ST = 0.18252 and F IT = 0.27852 (refer to Table 4). In the present study, considerable degree of differentiation has been estimated compared to other buffalo populations from different regions, probably because these populations are genetically distinct. Joshi et al. (2012) reported an F ST value of 7.2% in buffaloes of Indo-gangetic plain, while Vijh et al. (2008) reported a value of 9.69%. However, a comparatively lesser value in eight Indian riverine buffalo was reported by Kumar et al. (2006) which was 3.4%. This value suggested the existence of greater genetic differentiation among North-Indian buffalo breeds than breeds found all over India. A heterozygote deficiency was evident from the positive mean F IS value (0.117 > 0) indicating low to moderate amount of inbreeding in the population. This could be attributed to assortative mating in small herds owned by farmers, genetic hitchhiking, or the null alleles . However, AMOVA over all 22 loci showed 23.59% of variations between populations suggesting the distinctness of all three breeds. The F IS value was found to be 4.74%, which is comparable to values obtained in Purnathadi buffaloes (Ali et al., 2020). The pair-wise F ST values ranged from 0.09 between Murrah and Nili-Ravi to 0.32 between Nili-Ravi and Gojri breeds. The F ST between Murrah and Gojri was 0.25 ( Fig. 2A). Least differentiation was found between Nili-Ravi and Murrah (0.09) based on Slatkin linearized F ST while it was highest between Nili-Ravi and Gojri (0.46). Between Murrah and Gojri it was found to be 0.33 (Fig. 2B). Nei's distance (d); average within and between populations differentiation is presented in the form of a heat map (Fig. 2C), that shows least distance between Murrah and Nili-Ravi breeds and discriminate Gojri as another population. These results were also in compliance with the results from the factorial correspondence analysis based on molecular data and phylogenetic tree obtained from dissimilarity matrix. In the scatter plot of factorial analysis, Murrah and Nili-Ravi are invariably clustered together. Meanwhile, Gojri was found to be plotted on the opposite side of axis-2 (Fig. 3), yet with more scattering among individuals.
The linkage disequilibrium method relies on measures of departure from expected genotype and gametic frequencies, which is the basis for estimation of effective population size (Hill, 1981;Waples, 1991;Luikart et al., 2010). The N e estimated from microsatellite data reflects the true population distribution of North Indian buffaloes. The comparatively higher N e of Murrah buffaloes is due to the larger population distribution of the breed in India. The N e estimates of Gojri population reflects its present status and probable serious inbreeding in future. Hence, the ongoing indiscriminate breeding practices should be shifted to implementation of organized breeding policies focussed on conservation of this distinct breed.

Bayesian genetic structure
Structure software (Pritchard, Stephens & Donnelly, 2000) was used to determine the unbiased structure assuming no prior knowledge regarding the number of breeds. The highest delta K (ΔK) value was calculated as previously described (Evanno, Regnaut & Goudet, 2005). The optimum ΔK value (Fig. 4), was found at K = 3. For K = 2, there was no differentiation between Nili-Ravi and Murrah breed. One individual from Murrah population showed significant level of admixture from Gojri population. 99.7% of Gojri buffaloes were classified as a different breed whereas 99.7% and 98.9% of Nili-Ravi and Murrah buffaloes were assigned to one single population, respectively. When K is assumed to be four, Gojri buffaloes are assigned to one distinct cluster with 99% of memberships. Structure assigned all three population into three different breeds when K was assumed to be three (Fig. 5). This indicated that studied populations has well differentiated and possess unique allelic combinations despite being reared in similar geographical regions. However, a low to moderate amount of admixture could be observed in both Murrah and Nili-Ravi population. For K = 3, Nili-Ravi showed an average admixture of 3.7% from Murrah and 0.4% from Gojri buffaloes while it was quite high for Murrah with an average admixture of 15.2% and 1.2% from Nili-Ravi and Gojri, respectively. While Gojri population was found to have 99.4% pure blood with an admixture of 0.3% from Nili-Ravi and 0.4% from Murrah. Our results indicate the presence of sufficiently large genetic variability among the North Indian Riverine buffaloes. However, Gojri buffalo populations is unique, compared to Murrah and Nili Ravi buffalos, which were found to be genetically closer than expected. Presently the breeding areas of all these populations are overlapping due to adoption of Murrah as an improver breed for milk production, thus leading to its dominance over Nili-Ravi and Gojri buffalo.

CONCLUSION
This study demonstrated that the characterization and classification of genetic diversity in Indian buffaloes could be better accomplished through a parallel approach comprising morphometric traits and microsatellite markers. Study of buffalo genetic diversity of Northern India revealed admixture of two major dairy buffalo breeds and a distinct buffalo population was identified. The results obtained provides an opportunity for the design of genetic improvement programs with appropriate choice of breeds for upgrading local non-descript buffaloes along with conservation of unique germplasm. The estimates of effective population size and fixation indices indicate absence of intense systematic selection in past. Further studies involving large populations including samples from other regions of Indian buffalo with FAO recommended microsatellite loci are required to understand the genetic relationships among buffalo genetic resource of India.