Introduction

South Asia is one of the major hotspots of modern human migration, as attested by molecular data from different genetic systems.1, 2, 3, 4, 5 However, the inferences drawn from these studies are conflicting.6, 7, 8, 9 For mitochondrial DNA (mtDNA), it is clear that most of the maternal ancestry of the subcontinent is autochthonous.10, 11, 12, 13, 14 The Y chromosome evidence, however, still needs thorough examination.15, 16 Autosomal studies have suggested the presence of two major components,6, 17 but these findings require validation with data free of ascertainment bias, that is, complete genome sequence data.18 Apart from population dispersal and admixture at the intercontinental level, local migration and admixture can also play a significant role in shaping the genetic diversity of the subcontinent.19, 20, 21 This study therefore attempts to characterize this variation at higher resolution and to describe the fine-scale genetic structure of the populations living in the state of Uttarakhand in North India.

The state of Uttarakhand, formerly known as Uttaranchal, is one of the Himalayan states of India. It is bordered by Tibet to the north, Uttar Pradesh to the south, Nepal to the east and Himachal Pradesh to the west. Its present-day population is about 10 million, with a major proportion of Brahmin and Kshatriya populations.22 The population density of the region is 2.5 times lower than the Indian average, owing to the cold climate and steep, hilly terrain. The state comprises 13 districts divided into two zones, known as Gharwal and Kumaon (Figure 1). It also hosts several Hindu pilgrimage sites, is known as Devbhumi (literally, 'the Land of Gods') and is considered a land of religious significance among Hindus. Given the evidence of prehistoric human occupation from ancient rock paintings and Palaeolithic stone tools, this region may be a key source of information on South Asian relic lineages.23, 24, 25

Figure 1

The geographic divisions of Uttarakhand and the sampling locations of the ethnic groups analyzed in the present study. A full color version of this figure is available at the Journal of Human Genetics journal online.

The caste structure of this region is similar to that of the classical Indian chaturvarna system.26 The majority of the caste and tribal populations of this region speak various Indo-Aryan languages, collectively known as the Pahari branch.27 To obtain a detailed understanding of the genetic structure of this state, we analyzed the maternal and paternal lines of ancestry of various ethnic groups of the region. Moreover, keeping in mind the geographical position of the state, we first evaluated sex-specific genetic admixture with Eastern and Western Eurasians and then evaluated the genetics of the classical caste system.

Materials and methods

Sampling

About 5.0 ml of blood was collected from each individual belonging to six caste populations from three major districts of the Gharwal division of the Himalayan state of Uttarakhand (Figure 1). Altogether, 323 samples were analyzed for mtDNA and Y chromosomal markers (Table 1; Supplementary Tables 1 and 2). Sampling involved a detailed interview covering the name of the caste to which each individual belonged and any oral history about their ancestry. Related individuals with a similar family background were excluded. This project was approved by the Institutional Ethical Committee of the Council of Scientific and Industrial Research (CSIR)-Centre for Cellular and Molecular Biology, Hyderabad, India. Informed written consent was obtained from all volunteers.

Table 1 The ancestry proportions of maternal and paternal lineages of Uttarakhand and neighboring populations

Genotyping

We sequenced the hypervariable segment I of mtDNA and scored the variations against the revised Cambridge Reference Sequence28 and the Reconstructed Sapiens Reference Sequence29 (Supplementary Tables 1 and 2). Haplogroups were assigned on the basis of both the hypervariable segment I variations and genotyping of the coding-region mutations published to date in PhyloTree (build 16).30 For the comparative analysis, we merged our data with published sources.14, 16, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45 A total of 30 Y chromosome biallelic markers from a previously published data set46 were used to assign a haplogroup to each individual. Haplogroup assignment followed Karmin et al.47 and the recently updated nomenclature of the International Society of Genetic Genealogy (ISOGG).48 We also merged data on the Tharu population from the same region published elsewhere.14 The most parsimonious trees for mtDNA (Supplementary Figures 1–3) and the Y chromosome (Supplementary Figure 4) were built using the median-joining algorithm.49
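As an illustration of how variations can be scored against a reference sequence, the following minimal sketch compares an aligned sample fragment with a reference fragment and lists the substitutions. The function name `score_variants`, the toy sequences and the start coordinate are hypothetical and are not taken from this study; the sketch assumes gap-free alignment and does not handle insertions or deletions.

```python
def score_variants(reference, sample, start_position):
    """Return substitutions between two aligned, equal-length sequences.

    Each variant is written as <reference base><position><sample base>.
    Only substitutions are handled; the sequences are assumed to be
    pre-aligned without gaps (an assumption of this sketch).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    for offset, (ref_base, obs_base) in enumerate(zip(reference, sample)):
        if ref_base != obs_base:
            variants.append(f"{ref_base}{start_position + offset}{obs_base}")
    return variants


# Toy fragments (hypothetical); real input would be the sequenced segment
# and the matching stretch of the chosen reference sequence.
reference_fragment = "ACCTGATTCA"
sample_fragment = "ACTTGATCCA"
print(score_variants(reference_fragment, sample_fragment, start_position=100))
```

The resulting list of substitutions is the kind of motif that would then be compared with haplogroup-defining mutations when assigning a haplogroup.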

Statistical analysis

Principal component analysis of mtDNA and Y chromosomal haplogroups was performed on the merged data sets using the POPSTR software (http://harpending.humanevo.utah.edu/popstr/), kindly provided by H. Harpending. We used hypervariable segment I sequence data for mtDNA and binary-coded data (0 for the ancestral and 1 for the derived state) for the Y chromosome. Subsequently, genetic distances among populations and population groups were computed with Arlequin 3.5.50
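The binary coding of Y chromosome markers can be sketched as follows. This is only an illustration of the 0/1 scheme described above: the marker subset, the call labels and the function name `encode_individual` are assumptions made for the example, and missing calls are represented as `None`.

```python
# Subset of Y chromosome markers named in the text; the study used 30 markers.
MARKERS = ["M17", "M124", "M95", "M134"]


def encode_individual(calls, markers=MARKERS):
    """Map marker calls to 0 (ancestral) or 1 (derived).

    `calls` is a dict of marker -> "ancestral" or "derived" (hypothetical
    labels). Untyped or unrecognised calls are returned as None.
    """
    code = {"ancestral": 0, "derived": 1}
    return [code.get(calls.get(marker)) for marker in markers]


# Hypothetical individual: derived at M17, ancestral at M124 and M95,
# not typed at M134.
individual = {"M17": "derived", "M124": "ancestral", "M95": "ancestral"}
print(encode_individual(individual))
```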

Results and discussion

The genetic structure of the Uttarakhand populations was first studied in the context of Eurasian populations. The maternal and paternal lineages showed contrasting patterns of shared ancestry with East and West Eurasians (Table 1 and Supplementary Table 3). Analyses of the haplogroups and sub-haplogroups revealed a prevalent East Eurasian component in both the male and female gene pools, which was more evident in the maternal ancestry of Uttarakhand. The maternal lineages of Uttarakhand comprised a high proportion of East Eurasian haplogroups, consistent with the other Himalayan states.14, 18, 36, 51 In contrast to the maternal ancestry, the paternal heritage mainly comprised autochthonous Indian and West Eurasian-specific lineages (Supplementary Figure 4 and Supplementary Table 3). When we compared the maternal lineages of Uttarakhand at the haplotype level, we found that the majority of East Eurasian-specific haplotypes were prevalent in Tibet (Supplementary Figures 1–3), supporting a scenario similar to that observed in Nepal.18 Extensive haplotype sharing with neighboring Uttar Pradesh and Nepal (rather than Northeast India) was evident in the network analyses (Supplementary Figures 1–3).

The frequency-based principal component analysis of both mtDNA and Y chromosome haplogroups placed the Uttarakhand population with the North Indian-Nepali cluster (Figure 2). The contrast between East Eurasian and South Asian lineages was the major driver of PC1 in both genetic systems, whereas PC2 was driven by the contrast between Tibetan and Southeast Asian lineages. The high heterogeneity of maternal lineages observed in the frequency distribution (Table 1) and network analysis (Supplementary Figures 1–3) is also attested by the principal component analysis. Four Y chromosome haplogroups were the major predictors of the principal component analysis structure. East Eurasian-specific haplogroups (for example, O2a1-M95 and O3a2c1-M134) were responsible for the affinity with East/Southeast Asia, whereas haplogroups R1a1a-M17 and R2a-M124 acted in the opposite direction. The substantial presence of Y chromosomal haplogroup Q-P36.2 also links these populations with Central Asian populations (Supplementary Table 3). In turn, the West Eurasian-specific lineages are substantially more frequent than in the state of Uttar Pradesh.
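A frequency-based principal component analysis of this kind can be sketched as below. This is not the POPSTR implementation used in the study; it is a minimal, dependency-free illustration that extracts the first component by power iteration on the covariance matrix of haplogroup frequencies. The population names, sample counts and function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def frequency_table(samples, haplogroups):
    """Convert per-population haplogroup labels into frequency rows."""
    rows = {}
    for population, labels in samples.items():
        counts = Counter(labels)
        rows[population] = [counts[h] / len(labels) for h in haplogroups]
    return rows


def first_component(rows, iterations=200):
    """Return (loadings, scores) for PC1 via power iteration."""
    names = list(rows)
    data = [rows[name] for name in names]
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in data]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Start away from the all-ones direction, which carries no variance
    # because the frequencies in each row sum to one.
    vector = [float(j + 1) for j in range(p)]
    for _ in range(iterations):
        product = [sum(cov[i][j] * vector[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in product))
        if norm == 0:
            raise ValueError("no variance in the frequency table")
        vector = [x / norm for x in product]
    scores = {name: sum(c[j] * vector[j] for j in range(p))
              for name, c in zip(names, centred)}
    return vector, scores


# Hypothetical populations and counts, for illustration only.
haplogroups = ["R1a1a-M17", "R2a-M124", "O2a1-M95"]
samples = {
    "PopA": ["R1a1a-M17"] * 6 + ["R2a-M124"] * 3 + ["O2a1-M95"] * 1,
    "PopB": ["R1a1a-M17"] * 5 + ["R2a-M124"] * 3 + ["O2a1-M95"] * 2,
    "PopC": ["R1a1a-M17"] * 1 + ["R2a-M124"] * 1 + ["O2a1-M95"] * 8,
}
loadings, scores = first_component(frequency_table(samples, haplogroups))
print(dict(zip(haplogroups, loadings)))
print(scores)
```

In this toy input the loadings of R1a1a-M17 and O2a1-M95 take opposite signs on PC1, which is the sense in which haplogroups can pull populations in opposite directions along a component. The sign of a component is arbitrary, so only relative positions are meaningful.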

Figure 2

Principal component analysis (PCA) plots constructed on the basis of mtDNA (upper panel) and Y chromosome (lower panel) haplogroup frequencies. Data from neighboring regions are compiled from published sources (given in Materials and methods section). A full color version of this figure is available at the Journal of Human Genetics journal online.

To gain a comprehensive overview of inter- and intra-caste relations in this region, we compared the maternal and paternal heritage of various caste and tribal populations. Previous genetic studies of different caste populations have failed to reach a consensus on their origins and affinities.19, 52, 53, 54 It has also been suggested that restricted marriage practice was the major factor contributing to the fine tapestry of the social system.43, 54 However, contradictory conclusions have been drawn about the roles of males and females in shaping caste stratification.19, 43, 54, 55 In comparison with previous analyses,43, 52 our data showed a significantly higher proportion (unpaired t-test, P<0.0001) of West Eurasian-specific lineages among the traditionally higher caste (Brahmin and Kshatriya) populations for maternal lineages, whereas the difference was non-significant for paternal ancestry (unpaired t-test, P=0.5468).
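For readers unfamiliar with the test, an unpaired (two-sample) t statistic can be computed as in the sketch below. The text does not state which variant of the test was used, so the pooled-variance (Student) form is an assumption here, and the lineage proportions are hypothetical placeholders rather than values from Table 1.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t(group_a, group_b):
    """Student's two-sample t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Converting t to a P value requires
    the t distribution, for example from a statistics package.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical proportions of a lineage class in two sets of populations.
higher_caste = [0.30, 0.35, 0.28]
other_groups = [0.10, 0.12, 0.15, 0.08]
t_value, degrees = unpaired_t(higher_caste, other_groups)
print(t_value, degrees)
```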

When comparing the genetic distances, we found that the Brahmins and Kshatriya consistently remained outliers. Therefore, to evaluate the roles of males and females in the social stratification of this region, we calculated the distances of the studied populations from the Brahmins, who were the most extreme outlier in the tree (Figure 3). For this purpose, we grouped populations according to their classical social status: Brahmins, Kshatriya, Vaishya (Goswami and Shah) and Shudra (Arya and Tamta). Interestingly, the maternal structure of this region was congruent with the classical social system, with the distance from Brahmins to the other groups following a social ladder-like pattern (Figure 3a). The male line of descent, however, did not reveal any such local structure (Figure 3b). Notably, the distance from Brahmins to Vaishya was significantly higher than that from Brahmins to Shudra (t-test, P<0.001). Moreover, Arya (Shudra) was closer to Brahmins and Kshatriya than to any of the other populations. The affinity of Arya towards Brahmins and Kshatriya is due to the similarly high frequency of haplogroups R1a1a-M17 and R2a-M124 (Supplementary Figure 4 and Supplementary Table 3). A similar observation was reported elsewhere;19 however, the most parsimonious explanation given there for such a discrepancy is highly unlikely in our case, given the closer paternal ancestry of Shudra (Arya) with the Brahmin and Kshatriya (Supplementary Figure 4).
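The grouping step described above can be sketched as follows: given pairwise genetic distances (for example, exported from a population genetics package), the distance from a reference population is averaged within each social group. The group membership follows the text, whereas the distance values, the dictionary layout and the function name are hypothetical placeholders and do not reproduce Figure 3.

```python
from statistics import mean

# Social groups as described in the text.
GROUPS = {
    "Kshatriya": ["Kshatriya"],
    "Vaishya": ["Goswami", "Shah"],
    "Shudra": ["Arya", "Tamta"],
}


def mean_distance_to_reference(distances, groups=GROUPS, reference="Brahmin"):
    """Average the pairwise distance from the reference to each group."""
    def lookup(a, b):
        # Distances are symmetric, so accept either key order.
        return distances[(a, b)] if (a, b) in distances else distances[(b, a)]

    return {group: mean(lookup(reference, member) for member in members)
            for group, members in groups.items()}


# Placeholder distances (hypothetical values, for illustration only).
toy_distances = {
    ("Brahmin", "Kshatriya"): 0.01,
    ("Brahmin", "Goswami"): 0.04,
    ("Brahmin", "Shah"): 0.06,
    ("Arya", "Brahmin"): 0.02,
    ("Brahmin", "Tamta"): 0.08,
}
print(mean_distance_to_reference(toy_distances))
```

Running the same function separately on mtDNA-based and Y chromosome-based distances would give the two profiles that are contrasted in the text.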

Figure 3

The genetic distances of each population and population group based on mtDNA (a) and Y chromosome (b) data. As the Brahmins occupied an outlier position, all distances are shown with respect to the Brahmins.

In conclusion, our extensive analysis of uniparentally inherited markers led to a precise identification of East/Southeast Asian, South Asian and West Eurasian haplogroups among Uttarakhand populations and a better understanding of the extent of admixture from East and West. We observed that the admixture of East Eurasian lineages into the Uttarakhand population was biased towards female gene flow. The unexpected link of Shudra (Arya) with the traditionally higher castes (Brahmin and Kshatriya) is likely to be the major factor responsible for blurring the traditional caste barrier in the state of Uttarakhand. In addition, generating well-resolved mtDNA and Y chromosomal lineages allowed us to detect directional gene flow and its accumulation from the other side of the Himalayas.