Implications of copy number variation in people with chromosomal abnormalities: potential for greater variation in copy number state may contribute to variability of phenotype

Copy number variation is common in the human genome: many regions, overlapping thousands of genes, are now known to be deleted or amplified. Aneuploidies and other forms of chromosomal imbalance have a wide range of adverse phenotypes and are a common cause of birth defects, resulting in significant morbidity and mortality. “Normal” copy number variants (CNVs) embedded within the regions of chromosome imbalance may affect clinical outcomes by altering the local copy number of important genes or regulatory regions, which could alleviate or exacerbate certain phenotypes. In this way CNVs may contribute to the clinical variability seen in many disorders caused by chromosomal abnormalities, such as the congenital heart defects (CHD) seen in ~40% of Down’s syndrome (DS) patients. Investigation of CNVs may therefore help to pinpoint critical genes or regulatory elements and elucidate the molecular mechanisms underlying these conditions. It may also shed light on the aetiology of such phenotypes in people without major chromosome imbalances, ultimately leading to their improved detection and treatment.


Introduction
One of the fastest-growing research areas in genetics in the past few years has been the investigation of structural variation in the human genome, with copy number variants (CNVs) found to be much more common than previously imagined. As well as contributing to interindividual phenotypic variation in the general population, these genomic variants may hold the key to understanding the differences in severity seen in people with chromosomal abnormalities. Aneuploidy, for example, can be regarded as a large-scale change in copy number resulting from loss or gain of an entire chromosome, arising from non-disjunction during cell division in either meiosis I or meiosis II. Aneuploidies and other chromosomal abnormalities are a common cause of birth defects and are associated with significant morbidity and mortality. The spectrum of phenotypes seen in each affected individual is clearly dependent on the particular chromosomal region concerned, although there is still a great deal of variability in the presentation of phenotypes.
Individuals with Down's syndrome (DS), for example, like those with other aneuploidies or with chromosomal deletions or duplications, vary greatly in their clinical features, capabilities, disabilities and prognosis. Although many of these people, with the necessary social and clinical support, can and do lead active and fulfilling lives, some DS sub-phenotypes are associated with increased morbidity and mortality in infancy. Congenital heart defects (CHD), for example, occur in ~40% of DS cases; 50% of these require surgical correction within the first year of life, and survival to one year is only ~76% compared with ~91% for non-CHD DS patients (Yang et al. 2002). An appreciation of the molecular mechanisms behind this phenotypic variation and the identification of susceptibility loci may provide diagnostic and prognostic markers to enable us to predict the occurrence of clinically serious phenotypes, such as CHD, in DS. Additionally, analysis of these loci might help elucidate the mechanisms of CHD in non-DS children.

Copy number variation and its effects on human phenotype
To assess the implications of genomic copy number variants in the context of chromosomal abnormalities, it is helpful to consider the evidence that they affect phenotype in euploid individuals. Until recently, it was assumed that there are two copies of every gene in euploid individuals, with one on the maternally-inherited chromosome and the other inherited paternally. Any two human genomes were thought to be 99.9% similar, with single nucleotide polymorphisms (SNPs) the main source of genetic variation. Technological advances often result in paradigm shifts, however, and an unexpected amount of copy number variation was revealed by the advent of array comparative genomic hybridisation (array CGH) and, more recently, of second-generation sequencing technologies. There are currently over 14,000 CNV regions listed on the TCAG Database of Genomic Variants (DGV, updated March 2010), predicted to overlap a significant proportion of the genome. Approximately 11,500 genes, including over 2600 listed in the Online Mendelian Inheritance in Man database, are reportedly overlapped by CNVs (DGV March 2010).
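Coverage statistics of this kind summarise how much sequence lies within at least one reported CNV. A minimal sketch of one way to compute such a figure is given below, assuming that calls are half-open (start, end) coordinates on a single chromosome and that overlapping calls are merged so that shared bases are counted once. The function name and toy intervals are hypothetical, and this is not necessarily how the DGV derives its own figures.

```python
def covered_fraction(intervals, chrom_length):
    """Fraction of a chromosome lying within at least one CNV call.

    intervals: iterable of (start, end) pairs, half-open, 0-based.
    chrom_length: length of the chromosome in bases.
    Overlapping or abutting calls are merged first so that bases
    shared by several calls are counted only once.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous block: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    covered = sum(end - start for start, end in merged)
    return covered / chrom_length


# Hypothetical toy example: three calls, two of which overlap.
calls = [(0, 10), (5, 20), (30, 40)]
print(covered_fraction(calls, 100))  # 0.3
```

Simply summing the call lengths (10 + 15 + 10 = 35 bases here) would overstate coverage, which is why a merge step matters when overlapping calls from many studies are catalogued together.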
To understand the role of CNVs in human disease, it is important to consider how these genomic variants may affect phenotype. Deletion or duplication of dosage-sensitive genes, or their regulatory regions, could have adverse phenotypic effects, and copy number changes correlate with expression of affected genes (Henrichsen et al. 2009;Stranger et al. 2007). Noting that the majority of common CNVs are much smaller than the subset they investigated, Stranger et al. (2007) proposed that a substantial proportion of heritable variation in gene expression may be explained by copy number variation. In addition to direct copy number effects, CNVs may also exert positional effects, for example by shifting genes closer to or further away from heterochromatin. Furthermore, adverse phenotypic effects may result from interactions between CNVs and other variants: for example, a deletion on one chromosome may unmask a harmful recessive allele on the other. Conversely, a genomic duplication that incorporates a harmful mutation may have a double gain-of-function effect: a duplication of the PRSS1 and PRSS2 genes along with a missense mutation was recently shown to cause hereditary pancreatitis (Masson et al. 2008).
It is likely, therefore, that CNVs do significantly affect human phenotypes, and this is reflected by an increasing number of associations reported between CNVs and disease (Tables 1, 2). Several autoimmune disorders have been associated with CNVs, including systemic lupus erythematosus (Fanciulli et al. 2007), Crohn's disease (Fellermann et al. 2006;McCarroll et al. 2008) and psoriasis (Hollox et al. 2008), and a range of other diseases have now been associated with CNVs (reviewed in Zhang et al. 2009). It has been postulated that CNVs could account for a large proportion of the missing heritability of common diseases that has emerged after the recent plethora of genome-wide association studies (GWAS) (reviewed in Manolio et al. 2009), but this is the subject of heated controversy. In a recent large-scale genotyping survey of common CNVs, Conrad et al. (2010) found that the CNVs that they were able to genotype easily were in linkage disequilibrium with only 34 out of 1,554 trait-associated SNPs from GWAS, and they therefore suggested that common CNVs are unlikely to account for much of the missing heritability. There is, however, a particular challenge in the analysis of amplified sequences (which may be remote from the original sequence copy (Conrad et al. 2010)), multi-allelic variants (particularly VNTRs) and recurrent events, so further analysis of those CNV types is required before the full contribution of common CNVs to human disease can be assessed.
As an alternative to the common disease-common variants hypothesis (Chakravarti 1999;Lander 1996), the rare variants hypothesis postulates that a collection of many, individually less frequent copy number changes may collectively significantly contribute to disease susceptibility. Although many CNVs are present in an appreciable proportion of the population, the majority are likely to be less frequent: for example, in our array CGH investigation of 50 apparently healthy French male samples, 809 out of 1469 (~55%) multi-probe CNV regions were identified in only one individual (de Smith et al. 2007). Rare copy number changes have been implicated in neurodevelopmental disorders, including autism (Marshall et al. 2008;Sebat et al. 2007) and schizophrenia (Wilson et al. 2006;Xu et al. 2008). Furthermore, a number of rare genomic structural variants have recently been associated with obesity (Bochukova et al. 2010;Walters et al. 2010), and many more low frequency CNVs with strong effects may contribute to common disease phenotypes: large sample cohorts and different populations will need to be investigated to uncover these rare variants.
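Frequency summaries such as the proportion of singleton CNV regions depend only on how many individuals in a cohort carry each region. The sketch below assumes a hypothetical input format (a set of carrier sample IDs per region) and a carrier-frequency threshold for "common" variants; it illustrates the bookkeeping and is not the analysis pipeline used by de Smith et al. (2007). In a cohort of 50, a carrier frequency above 5% requires at least three carriers.

```python
from collections import Counter


def frequency_classes(carriers_by_region, cohort_size, common_threshold=0.05):
    """Count CNV regions by how many individuals carry them.

    carriers_by_region: dict mapping a region ID to the set of sample IDs
        in which that region was called (hypothetical input format).
    cohort_size: number of individuals assayed.
    common_threshold: carrier frequency above which a region is counted
        as 'above_threshold' (0.05, i.e. 5%, by default).
    """
    counts = Counter()
    for carriers in carriers_by_region.values():
        n = len(carriers)
        if n == 1:
            counts["singleton"] += 1
        elif n > 1:
            counts["recurrent"] += 1
        if n / cohort_size > common_threshold:
            counts["above_threshold"] += 1
    counts["total"] = len(carriers_by_region)
    return counts


# Hypothetical toy cohort of 50 samples with four CNV regions.
toy = {
    "region_1": {"s01"},
    "region_2": {"s01", "s02"},
    "region_3": {"s01", "s02", "s03"},
    "region_4": {"s04"},
}
print(dict(frequency_classes(toy, cohort_size=50)))
# {'singleton': 2, 'recurrent': 2, 'above_threshold': 1, 'total': 4}

# The singleton proportion reported in the text: 809 of 1469 regions.
print(f"{809 / 1469:.0%}")  # 55%
```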

CNVs in aneuploidy
Thus, it is now clear that genomic structural variants may significantly impact phenotype in euploid individuals: these effects may be even more pronounced in people with chromosomal abnormalities. For example, a phenotype associated with a particular monosomy or sub-chromosomal deletion may be ameliorated by the presence of an amplification CNV on the unaffected chromosome, as gene expression within the CNV region could remain at a normal, euploid level. Conversely, a deletion CNV could cause a more severe phenotype.
An example could be Turner syndrome, the only whole-chromosome monosomy (45,X) that is viable in humans, with approximately 1 in 2,000 female births having a single X chromosome (Nielsen and Wohlert 1990). In addition to the main features of short stature and ovarian failure, present in almost all cases, many other phenotypes may be present, including a short webbed neck, kidney malformations, hearing problems and various learning difficulties, as well as an increased risk of type 1 diabetes (Gravholt et al. 1998). The extensive copy number variation on the X chromosome (37.8% according to the DGV, March 2010) may contribute to phenotypic variability in this disorder. For example, a critical region for the neurocognitive deficits in Turner syndrome was mapped to Xp22.3, containing 31 genes (Ross et al. 2000), and several CNV loci have been reported in this region.
Similarly, phenotypic variation in genomic disorders could also reflect underlying copy number variation within the relevant genomic region. The most common microdeletion syndrome in humans is DiGeorge syndrome (or velo-cardio-facial syndrome), which is caused by deletion of a 1.5-3 Mb region at chromosome 22q11.2 (Scambler et al. 1992) and occurs in 1 in 5,000 births (Botto et al. 2003). This disorder is also associated with variable phenotypes, such as cleft palate, congenital heart disease (Shprintzen et al. 1978), renal anomalies (Czarnecki et al. 1998) and increased risk of schizophrenia (Murphy et al. 1999). Several CNV loci have been identified within the 22q11.2 deletion region in apparently healthy individuals. Of interest, two of the candidate genes for the schizophrenia phenotype, PRODH (Li et al. 2004) and GNB1L (Williams et al. 2008), are overlapped by a number of CNVs, and these could potentially modify this phenotype in DiGeorge patients.
In polysomies such as trisomy, or in sub-chromosomal duplications, where abnormalities result from extra copies of chromosome regions, deletion CNVs may 'normalise' copy number. Alternatively, there is potential for even greater amplification of copy number where amplification CNVs are present. A simple duplication CNV on the non-disjoining chromosome may lead to the presence of up to six copies of a gene in a trisomic individual (Fig. 1), with subsequent phenotypic effects associated with increased gene expression levels. Conversely, a gene deletion on the non-disjoining chromosome may alleviate the expected effect of trisomy by reducing the local copy number state to one or two copies. Thus, the compounded effect of aneuploidy and copy number variation has the potential to generate a much wider range of phenotypes than would be expected on the basis of chromosome number alone. It has been suggested, therefore, that CNVs are likely to act as 'modifiers of the phenotypic variability of trisomies' (Beckmann et al. 2007), and the same is probably also true for smaller chromosomal imbalances.
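The copy-number arithmetic behind these scenarios can be made explicit. The sketch below is a deliberately simplified model with hypothetical function names: each chromosome carries 0, 1 or 2 copies of a single locus (deletion, reference or duplication), recombination between the locus and the centromere is ignored, non-disjunction at meiosis I (MI) transmits both homologues of the non-disjoining parent, and non-disjunction at meiosis II (MII) transmits two identical sister chromatids of one homologue.

```python
from itertools import product

# Per-chromosome copy number at a single hypothetical locus:
# 0 = deletion CNV, 1 = reference state, 2 = duplication CNV.
ALLELES = (0, 1, 2)


def trisomic_copy_number(parent_a, allele_b, stage, transmitted=0):
    """Total copies of the locus in a trisomic child (simplified model).

    parent_a: pair of per-chromosome copy numbers in the non-disjoining parent.
    allele_b: copy number on the single chromosome from the other parent.
    stage: "MI" (both homologues of parent A transmitted) or
           "MII" (two identical sister chromatids of one homologue).
    transmitted: index (0 or 1) of parent A's homologue passed on at MII.
    Assumes no recombination between the locus and the centromere.
    """
    if stage == "MI":
        from_a = parent_a[0] + parent_a[1]
    elif stage == "MII":
        from_a = 2 * parent_a[transmitted]
    else:
        raise ValueError("stage must be 'MI' or 'MII'")
    return from_a + allele_b


def possible_copy_numbers():
    """Every total reachable over all parental allele combinations."""
    totals = set()
    for a1, a2, b in product(ALLELES, repeat=3):
        totals.add(trisomic_copy_number((a1, a2), b, "MI"))
        for t in (0, 1):
            totals.add(trisomic_copy_number((a1, a2), b, "MII", t))
    return sorted(totals)


print(trisomic_copy_number((1, 1), 1, "MI"))      # 3: standard trisomy
print(trisomic_copy_number((2, 1), 1, "MII", 0))  # 5: duplication on NDJ chromatids
print(trisomic_copy_number((2, 1), 2, "MII", 0))  # 6: duplication on all three
print(trisomic_copy_number((0, 1), 1, "MII", 0))  # 1: deletion on NDJ chromatids
print(trisomic_copy_number((0, 1), 1, "MI"))      # 2: deletion on one NDJ homologue
print(trisomic_copy_number((1, 1), 0, "MII"))     # 2: deletion from the other parent
print(possible_copy_numbers())                    # [0, 1, 2, 3, 4, 5, 6]
```

Under these assumptions a trisomic individual can carry anywhere from zero to six copies of a locus rather than the three expected from chromosome number alone; in this model the maximum of six is reached when the chromosome from the other parent also carries the duplication.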

Phenotypic variability in Down's syndrome may be modified by CNVs
Using Down's syndrome as an example, an extra copy of chromosome 21 is not sufficient to cause the full range of phenotypes associated with this disorder in every individual: affected individuals present with different combinations of sub-phenotypes. Some features are almost always present, such as the characteristic facial appearance and mental retardation, although these can vary widely in the severity of their presentation (Kallen et al. 1996). Other features, however, are present in only a fraction of DS cases: for example, congenital heart defects (CHD) are present in ~40% of cases (Park et al. 1977) and gastrointestinal (GI) defects in ~8% (Epstein et al. 1991). Leukaemia is also common in DS, with an ~20% increased risk of acute lymphoblastic leukaemia (ALL) (Hasle et al. 2000), and transient myeloproliferative disorder occurring in ~10% of DS newborns, of whom 10-20% develop acute megakaryoblastic leukaemia (AMKL) before the age of 4 (reviewed in Hasle 2001). A detailed knowledge of the genes on the affected chromosome is required to understand the phenotypic effects of aneuploidy. In addition, it is important to determine the overall effects of gene dosage imbalance, which may also be complex. Individuals with partial trisomy 21 resulting from unbalanced chromosomal translocations will exhibit only those features associated with the extra genomic material present. Molecular analysis of these patients enables 'phenotypic mapping' by defining the critical genomic regions that harbour genes associated with various phenotypes (Epstein et al. 1991). As with all regions of the genome, many common CNVs and rarer genomic structural variants are found on chromosome 21.
Two recent reports describe structural variants in which increased copy number of genomic regions on chromosome 21 contributes to human phenotype. In one study, duplication of a region at 21q21 that included the amyloid precursor protein (APP) gene was found to cause familial Alzheimer's disease in five separate families (Rovelet-Lecrux et al. 2006). The duplications varied in size from 0.58 to 6.37 Mb and contained from 5 to 12 annotated genes, including APP. Despite the largest of these duplications being over 6 Mb, none of the families exhibited any clinical evidence of DS (Cabrejo et al. 2006). A second duplication, of a 4.3 Mb region at 21q22.13-q22.2 containing just over 30 genes, did help refine the map of the DS critical region (DCR), as it caused a DS phenotype in three family members (Ronan et al. 2007). These individuals had the facial characteristics of DS and mild cognitive disability, indicating that this region includes part of the DCR but not all of it. It is apparent, therefore, that different sub-phenotypes of trisomy 21 are caused by increased copy number at different regions across chromosome 21. It is also evident that DS individuals who are trisomic for a particular critical region do not always exhibit the associated phenotype, or display a much milder form.
A complex interplay of molecular factors, possibly involving genes outside chromosome 21, is likely to be responsible for the trisomic phenotypes. Furthermore, interaction of trisomy with certain embedded deleterious alleles or haplotypes may contribute to the variable presentation of the different phenotypes (including degree of mental retardation, congenital heart defects, Hirschsprung disease and other gut diseases, and leukaemia). The increasing amount of copy number variation that has been discovered in the human genome, including on chromosome 21, adds another dimension to the molecular consequences of trisomy: the phenotypic variability seen in DS may be due, at least in part, to CNVs on this chromosome.
On the DGV, there are 507 reported CNV calls covering 35.0% of chromosome 21, which is similar to the CNV coverage of other chromosomes (33.9%) (DGV March 2010). Many of these are common in the population. For example, our investigation of only 50 healthy French male samples revealed 20 multi-probe CNVs, and an additional 57 single probe CNV signals, on chromosome 21 (de Smith et al. 2007): almost half were detected in multiple samples (47%), and 34% had a frequency of >5% (Fig. 2). In addition, these CNVs overlapped 38 known genes. There were several CNVs within TIAM1 (T-lymphoma invasion and metastasis-inducing protein 1), for example, which is expressed in almost all analysed tumour cell lines, including B- and T-lymphomas, melanomas and carcinomas (Habets et al. 1995). A CNV was also found within DSCAM, which is a good candidate gene for the CHD phenotype seen in DS, as it lies within the predicted minimum critical region and is expressed in the heart during cardiac development (Korbel et al. 2009). Several other CNVs have since been discovered in this gene, as listed on the DGV, including a common deletion that incorporates 4 exons (Matsuzaki et al. 2009).
Other genes within the CHD critical region also overlap CNVs: a common intronic deletion (10%) was identified in C2CD2 (Conrad et al. 2010); 6 CNVs have been reported within PRDM15, including one deletion overlapping 5 exons that was found in 3/90 Yoruban individuals (Matsuzaki et al. 2009); and one deletion was identified in BACE2 overlapping the last exon of this gene (Mills et al. 2006).
CNVs also overlap RUNX1, with one deletion removing the promoter region and first exon of this gene (Gusev et al. 2009). RUNX1 codes for a transcription factor involved in haematopoiesis (North et al. 2002) and is associated with AMKL, which is estimated to be 500-fold more frequent in children with DS than in the general population (reviewed in Zipursky et al. 1992): CNVs in this gene may, therefore, protect against or increase susceptibility to leukaemia in DS. These CNV loci and other regions that could potentially impact variable phenotypes of trisomy 21 are listed in Table 3.
Analysis of copy number variation on chromosome 21 may, therefore, lead to a better understanding of the aetiology of DS sub-phenotypes and could possibly shed light on the origin of these conditions in the wider non-trisomic population. Knowledge of which genes are involved in the generation of CHD in DS, for example, and how they interact with other gene products, could have many implications for managing CHD in the non-trisomic population: for example, adult onset CHD may be prevented by therapeutic agents that target critical pathways controlling cardiogenesis, and new approaches may be developed for stem-cell guided cardiac repair (reviewed in Passier et al. 2008).
Since the discovery of trisomy 21, it has been hypothesised that genes present in three copies are over-expressed by 1.5-fold relative to the euploid state, but this has since been shown not always to be the case. Some genes are over-expressed, some are expressed at euploid levels, while others appear to be down-regulated (Li et al. 2006;Prandini et al. 2007). This may, of course, reflect adaptive regulatory control, but it may also reflect the effects of local copy number variation on chromosome 21, which will cause deviations from the predicted three copies of each gene, as shown in Fig. 1.
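The 1.5-fold expectation follows from the simplest linear dosage model, in which expression relative to the euploid state is proportional to local copy number c. The worked values below are a sketch under that assumption only (which, as the expression studies above show, does not always hold); FC is used here as a hypothetical label for fold change relative to two copies.

```latex
\[
\mathrm{FC}(c) = \frac{c}{2}, \qquad
\mathrm{FC}(3) = 1.5, \quad
\mathrm{FC}(2) = 1, \quad
\mathrm{FC}(5) = 2.5, \quad
\mathrm{FC}(6) = 3
\]
```

Under this model, a trisomic locus whose local copy number is returned to two by a deletion CNV would be expected to show euploid expression, and one carrying a duplication could exceed the 1.5-fold prediction, even without any adaptive regulation.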

Conclusions
Copy number variation may, therefore, make a significant contribution to phenotypic variation in aneuploidy and other chromosomal abnormalities. Targeted analysis of such variation might represent an effective strategy for refining critical regions and pinpointing genes central to key pathways (e.g. of cardiogenesis or oncogenesis). It might also enable prediction of which individuals will develop serious phenotypes, such as CHD in DS. This could add an extra dimension to antenatal screening programmes in the future, by contributing to the process of 'informed choice' by the potential parents of a DS child, as well as aiding the long-term management of these patients. Additional benefits may also accrue from increased understanding of pathogenic mechanisms relevant to patients with similar phenotypes outside the context of chromosomal imbalance. In summary, recent technological advances, such as high-resolution array CGH and second-generation sequencing, have vastly and rapidly increased our knowledge of copy number variation in the human genome. As these techniques continue to improve, and greater numbers of samples and populations are investigated, so our understanding of how copy number correlates with phenotype will grow, yielding information relevant not just to individuals with chromosomal imbalances, but to similar conditions in the wider population.