Spectrum of large copy number variations in 26 diverse Indian populations: potential involvement in phenotypic diversity

  • Original Investigation
Human Genetics

Abstract

Copy number variations (CNVs) have added a dynamic aspect to the apparently static human genome. We analyzed CNVs larger than 100 kb in 477 healthy individuals from 26 diverse Indian populations of different linguistic, ethnic and geographic backgrounds. CNV regions (CNVRs) were identified using the Affymetrix 50K Xba 240 Array. After pooling data from all the populations, we observed 1,425 CNVRs in the deletion set and 1,337 in the amplification set. More than 50% of the genes encompassed entirely in CNVs had both deletions and amplifications. Populations varied widely not only in CNV extent (0.04–1.14% of the genome under deletion and 0.11–0.86% under amplification) but also in functional enrichment of processes such as keratinization, serine proteases and their inhibitors, cadherins, homeobox and olfactory receptors. These differences did not correlate with linguistic, ethnic or geographic background, or with population size. Certain processes were nearly exclusive to the deletion (serine proteases, keratinization, olfactory receptors, GPCRs) or duplication (homeobox, serine protease inhibitors, embryonic limb morphogenesis) datasets. Populations sharing the same enriched processes were observed to contain genes from different genomic loci. Comparison of polymorphic CNVRs (5% or more) with those cataloged in the Database of Genomic Variants revealed that 78% (2,473) of the genes in CNVRs in Indian populations are novel. Validation of CNVs using Sequenom MassARRAY revealed extensive heterogeneity in CNV boundaries. Exploring CNV profiles in such diverse populations would provide a valuable resource for understanding diversity in phenotypes and disease.
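As a rough illustration of how a figure such as "percent of genome under deletion" could be computed, the sketch below keeps segments larger than 100 kb, merges overlapping segments into regions, and reports the covered fraction. This is not the authors' pipeline: the merge-on-overlap rule, the function names, the toy coordinates and the toy genome length are all assumptions made for this example.

```python
from collections import defaultdict

MIN_CNV_BP = 100_000  # size cutoff from the abstract: CNVs larger than 100 kb


def merge_into_regions(segments, min_size=MIN_CNV_BP):
    """Merge overlapping (chrom, start, end) segments into regions.

    Assumption: a region is the union of overlapping segments on the
    same chromosome; the paper's exact CNVR definition may differ.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in segments:
        if end - start > min_size:  # keep only large CNVs
            by_chrom[chrom].append((start, end))

    regions = []
    for chrom in sorted(by_chrom):
        merged = []
        for start, end in sorted(by_chrom[chrom]):
            if merged and start <= merged[-1][1]:
                # Overlaps the previous region: extend it.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        regions.extend((chrom, s, e) for s, e in merged)
    return regions


def percent_of_genome(regions, genome_bp):
    """Percent of a genome of length genome_bp covered by the regions."""
    covered = sum(end - start for _, start, end in regions)
    return 100.0 * covered / genome_bp


# Toy deletion calls (hypothetical coordinates, not data from the study).
toy_segments = [
    ("chr1", 1_000_000, 1_150_000),
    ("chr1", 1_100_000, 1_300_000),  # overlaps the previous call
    ("chr1", 5_000_000, 5_050_000),  # 50 kb: below the cutoff, dropped
    ("chr2", 200_000, 450_000),
]
toy_regions = merge_into_regions(toy_segments)
toy_percent = percent_of_genome(toy_regions, genome_bp=55_000_000)
print(toy_regions)
print(toy_percent)
```

With real data, `genome_bp` would be the length of the reference assembly used for the array annotation, and deletions and amplifications would be processed as separate segment lists.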

Acknowledgments

We thank Amit Chaurasia for computational support and Ankita and Rishi Das Roy for IGV browser support. Financial support to MM from CSIR (CMM0016, SIP0006) and Council for Scientific and Industrial Research SRF support to PG and PJ are acknowledged. We also acknowledge The Centre for Genomic Applications for the microarray and Sequenom facility, and Spinco Biotech Pvt. Ltd. for support with the SVS7 software. The data are available at http://igvbrowser.igib.res.in.

Author information

Corresponding author

Correspondence to Mitali Mukerji.

Additional information

P. Gautam and P. Jha contributed equally to this work.

Electronic supplementary material

Below is the link to the electronic supplementary material.

439_2011_1050_MOESM1_ESM.tif

Size distribution of CNVRs in the Database of Genomic Variants (DGV). The size distribution of segments in DGV shows that 23% of the segments are 100 kb or larger, indicating the importance and prevalence of large CNVs in the genome. Supplementary material 1 (TIFF 1.27 MB)

439_2011_1050_MOESM2_ESM.tif

Geographic locations of the populations sampled. The populations sampled in this study cover the length and breadth of India and come from various ethnic and linguistic backgrounds. Supplementary material 2 (TIFF 2817 kb)

439_2011_1050_MOESM3_ESM.tif

Inter-probe distance of the Affymetrix 50K Xba array. This plot shows that around 35% of probe pairs are 5 kb apart and 25% of probe pairs are just 1 kb apart, giving confidence in calling copy-number-altered regions. Supplementary material 3 (TIFF 1203 kb)

439_2011_1050_MOESM4_ESM.docx

Chromosomal CNV landscape in all the populations. The 26 populations show different extents of CNVs. The red line depicts deletions and the blue line amplifications. Supplementary material 4 (DOCX 1.91 MB)

439_2011_1050_MOESM5_ESM.tif

Multiple correspondence discriminant analysis (MCDA) on all 26 Indian populations using 632 polymorphic CNVRs (present in more than 10% of the cohort) to detect population stratification. Supplementary material 5 (TIFF 3590 kb)

439_2011_1050_MOESM6_ESM.docx

Heterogeneity in CNV boundaries. Representation of deletion and amplification regions encompassing genes in different samples, as revealed by array data. The target of the Sequenom probe is indicated by a black arrow. There is enormous heterogeneity in CNV boundaries, and some of the CNV regions are not queried by the Sequenom MassARRAY probe. Supplementary material 6 (DOCX 2.27 MB)

439_2011_1050_MOESM7_ESM.doc

Heterogeneity in CNV boundaries in DGV. Heterogeneity in CNV boundaries as present in the public database DGV for some of the genes (ABCC1, ODAM, PRKG1 and SDK1) from our validation gene set. This heterogeneity, also seen in our data, indicates the difficulties posed in validating such loci. Supplementary material 7 (DOC 407 kb)

Supplementary material 8 (XLS 39 kb)

Supplementary material 9 (XLSX 8.42 mb)

Supplementary material 10 (XLS 341 kb)

Supplementary material 11 (XLSX 41.7 kb)

Supplementary material 12 (XLS 49 kb)

Supplementary material 13 (XLS 31 kb)

Supplementary material 14 (XLS 38 kb)

About this article

Cite this article

Gautam, P., Jha, P., Kumar, D. et al. Spectrum of large copy number variations in 26 diverse Indian populations: potential involvement in phenotypic diversity. Hum Genet 131, 131–143 (2012). https://doi.org/10.1007/s00439-011-1050-5

  • DOI: https://doi.org/10.1007/s00439-011-1050-5
