Integrated Genomic Analysis Implicates Haploinsufficiency of Multiple Chromosome 5q31.2 Genes in De Novo Myelodysplastic Syndromes Pathogenesis

Deletions spanning chromosome 5q31.2 are among the most common recurring cytogenetic abnormalities detectable in myelodysplastic syndromes (MDS). Prior genomic studies have suggested that haploinsufficiency of multiple 5q31.2 genes may contribute to MDS pathogenesis. However, this hypothesis has never been formally tested. Therefore, we designed this study to systematically and comprehensively evaluate all 28 chromosome 5q31.2 genes and directly test whether haploinsufficiency of a single 5q31.2 gene may result from a heterozygous nucleotide mutation or microdeletion. We selected paired tumor (bone marrow) and germline (skin) DNA samples from 46 de novo MDS patients (37 without a cytogenetic 5q31.2 deletion) and performed total exonic gene resequencing (479 amplicons) and array comparative genomic hybridization (CGH). We found no somatic nucleotide changes in the 46 MDS samples, and no cytogenetically silent 5q31.2 deletions in 20/20 samples analyzed by array CGH. Twelve novel single nucleotide polymorphisms were discovered. The mRNA levels of 7 genes in the commonly deleted interval were reduced by 50% in CD34+ cells from del(5q) MDS samples, and no gene showed complete loss of expression. Taken together, these data show that small deletions and/or point mutations in individual 5q31.2 genes are not common events in MDS, and implicate haploinsufficiency of multiple genes as the relevant genetic consequence of this common deletion.

Prior studies have suggested that haploinsufficiency of multiple genes is the disease mechanism associated with deletions spanning 5q31.2. However, these studies were primarily designed to identify biallelic gene mutations by analyzing the residual non-deleted 5q31.2 allele using MDS samples, AML samples, or cell lines containing a cytogenetically visible 5q31.2 deletion, and they only examined 20 of the 28 5q31.2 genes [6,7,8,9,10]. Therefore, these studies did not adequately address the possibility that haploinsufficiency of a single 5q31.2 gene could arise from a heterozygous point mutation, or small microdeletion, in a 5q31.2 gene in MDS samples lacking a cytogenetic del(5q). Finding a heterozygous nucleotide mutation or microdeletion in a single 5q31.2 gene in MDS patients without evidence of del(5q) would expedite our understanding of the pathogenesis conferred by large deletions.
To address the possibility that haploinsufficiency of a single 5q31.2 gene may occur in MDS, we performed a comprehensive and systematic screen using total exonic resequencing and array comparative genomic hybridization to evaluate all 28 genes in the 5q31.2 interval from 46 MDS patients (37 lacking a 5q31.2 deletion). Our data indicate that loss of one copy of chromosome 5q31.2 genes in de novo MDS occurs predominantly through large cytogenetic deletions involving many genes, not a single gene, and suggest that haploinsufficiency of multiple chromosome 5q31.2 genes is likely to be the relevant genetic consequence of this common deletion.

Human subjects
Forty-six adult patients (age >18 years) with de novo MDS (no prior malignancy and no antecedent chemo- or radiotherapy) were enrolled in a study at the Siteman Cancer Center at Washington University to identify genetic factors contributing to MDS initiation and progression. Approval was obtained from the Washington University institutional review board for these studies. After obtaining written informed consent, a bone marrow sample was obtained for analysis of tumor cells, and a 6-mm punch biopsy of skin was obtained for analysis of unaffected somatic cells. Bone marrow DNA was prepared using the QIAamp DNA mini kit (Qiagen, Valencia, CA) and skin DNA was prepared using a standard phenol/chloroform extraction followed by an ethanol precipitation protocol.

Array Comparative Genomic Hybridization
Array comparative genomic hybridization (CGH) was performed using paired tumor (bone marrow) and germline (skin) genomic DNA samples (non-whole genome amplified DNA) from 20 of the 46 MDS samples that we resequenced (5/20 samples contain deletions spanning 5q31.2). These 20 samples were chosen because of sample abundance. We used a custom array CGH platform (Roche NimbleGen Systems, Inc.) containing 385,297 long oligomer (average length 51 nucleotides) probes spanning human chromosome 5 (median probe spacing ~500 base pairs). Labeling, hybridization, washing, and scanning were performed as previously described [11]. The log2 (test/reference) signals were analyzed using a circular binary segmentation algorithm (segMNT) [12] and a hidden Markov model (wuHMM) [13] to identify somatically acquired segmental DNA copy number changes. To call a copy number change, both algorithms required a segment to span a minimum of 5 consecutive probes. Full access to the primary array data and MIAME description are available at http://bioinformatics.wustl.edu.
Figure 1. Chromosome 5q31.2 commonly deleted segment and array CGH plots. A. Diagram of the chromosome 5q31.2 commonly deleted segment (CDS) (136.3-138.6 megabases) and the 5q33.1 CDS associated with the 5q minus syndrome (148.6-151.1 megabases). The centromere is located at the left and the telomere to the right. B. Whole chromosome 5 log2 ratio (test/reference) plot for sample #315529, which contains del(5q) in 1/20 bone marrow cell metaphases and has a blast count of 12%. No deletion was identified in this sample. To allow visualization of the data, each data point on the plot represents the average log2 ratio for consecutive probes in a 20,000 base pair region (median number of probes per bin = 46). The dashed lines indicate the 5q31.2 and 5q33.1 CDS locations depicted in panel A. C. (Upper panel) Whole chromosome 5 log2 ratio (test/reference) plot for sample #176267. An interstitial deletion on chromosome 5 produces a negative log2 ratio for probes located within the deletion. The deletion boundaries are indicated by the arrowheads. The deletion spans from base 86,152,517-155,672,191 (determined using segMNT). The log2 ratio deviation for the segment is consistent with the deletion of one allele. Sample #176267 contains del(5q) in 20/20 bone marrow cell metaphases (100%), and has a blast count of 7%. (Lower panel) Chromosome 5 log2 ratio (test/reference) plot for sample #176267 from 136.3-138.6 Mb, corresponding to the 5q31.2 CDS. No biallelic deletion was identified in this region. doi:10.1371/journal.pone.0004583.g001
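The two numeric conventions mentioned for the array CGH data (averaging log2 ratios in 20,000 base pair bins for plotting, and requiring at least 5 consecutive probes to call a segment) can be illustrated with a minimal sketch. This is not segMNT or wuHMM: the run finder below is a naive threshold filter, and the input format, function names, and threshold value are hypothetical choices made only for illustration.

```python
from collections import defaultdict
from statistics import mean

BIN_SIZE = 20_000  # base pairs per plotting bin, as in the Figure 1 legend
MIN_PROBES = 5     # minimum consecutive probes required to call a segment


def bin_log2_ratios(probes, bin_size=BIN_SIZE):
    """Average log2(test/reference) ratios of the probes falling in each bin.

    probes: iterable of (position_bp, log2_ratio) tuples (hypothetical format).
    Returns a dict mapping bin start position -> mean log2 ratio.
    """
    bins = defaultdict(list)
    for position, ratio in probes:
        bins[(position // bin_size) * bin_size].append(ratio)
    return {start: mean(values) for start, values in sorted(bins.items())}


def runs_below(ratios, threshold, min_probes=MIN_PROBES):
    """Naive stand-in for segmentation (NOT segMNT or wuHMM).

    Returns (first_index, last_index) pairs for runs of at least
    min_probes consecutive ratios that fall below the threshold.
    """
    runs, start = [], None
    for i, ratio in enumerate(ratios):
        if ratio < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_probes:
                runs.append((start, i - 1))
            start = None
    if start is not None and len(ratios) - start >= min_probes:
        runs.append((start, len(ratios) - 1))
    return runs
```

Runs shorter than five probes are discarded, which mirrors the stated minimum segment length but says nothing about how the published algorithms weigh noise or breakpoints.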

Sequencing and Pyrosequencing
Sequencing and pyrosequencing methods have been described previously [14,15].

Expression profiling
CD34+ cells were purified using the autoMACS system (Miltenyi). Total RNA was prepared using Trizol (Invitrogen, Carlsbad, CA), processed, and hybridized to the human Affymetrix U133plus2 array by the Siteman Cancer Center Microarray Core Facility at Washington University according to the manufacturer's instructions (see http://pathbox.wustl.edu/~mgacore/genechip.htm for a complete list of protocols). To allow interarray comparisons, the data for each array were scaled to a target intensity of 1,500 using Affymetrix Microarray Suite software.
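Scaling each array to a common target intensity amounts to multiplying every signal by a single per-array factor. The sketch below assumes the factor is the target divided by a trimmed mean of the array's signals, with 2% trimmed from each end; the trimming fraction and the function name are assumptions for illustration and are not taken from the text above.

```python
def scale_to_target(signals, target=1500.0, trim=0.02):
    """Scale one array's signals so that their trimmed mean equals the target.

    trim is the fraction of values dropped from each end before averaging
    (an assumed default, not a value stated in the text).
    """
    ordered = sorted(signals)
    k = int(len(ordered) * trim)
    core = ordered[k:len(ordered) - k] if k else ordered
    factor = target / (sum(core) / len(core))
    return [s * factor for s in signals]
```

Because every probeset on an array is multiplied by the same factor, within-array rank order is preserved and only between-array comparability changes.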

Quantitative RT-PCR
Quantitative RT-PCR was performed using total RNA extracted from CD34+ cells and the one-step Assays-on-Demand kit (Applied Biosystems). All samples were run in duplicate. Individual cDNA samples were normalized according to their levels of beta actin transcript. The comparative Ct method was used for analysis.
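The comparative Ct calculation can be sketched as follows, assuming duplicate wells are averaged, beta actin is the reference transcript, and amplification efficiency is close to 100% (so that one cycle corresponds to a two-fold difference). The function and argument names are hypothetical.

```python
from statistics import mean


def relative_expression(target_ct_sample, ref_ct_sample,
                        target_ct_control, ref_ct_control):
    """Fold change of a target gene in a sample relative to a control.

    Each argument is a list of Ct values from replicate wells.
    Uses 2 ** -(ddCt), which assumes ~100% amplification efficiency.
    """
    d_ct_sample = mean(target_ct_sample) - mean(ref_ct_sample)
    d_ct_control = mean(target_ct_control) - mean(ref_ct_control)
    return 2.0 ** -(d_ct_sample - d_ct_control)
```

Under these assumptions, a gene whose normalized Ct is one cycle higher in the sample than in the control has a relative expression of 0.5, the level expected for loss of one of two equally expressed alleles.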

Statistical analysis
Differences in allele frequency were evaluated using Fisher's exact test. Genotype associations were further evaluated for significance according to codominant, dominant, and overdominant genetic models by chi-square testing. Results were corrected for multiple comparisons by the Bonferroni method. Gene expression profiling results were compared using a two-tailed t-test.
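For readers unfamiliar with these tests, the following is a minimal pure-Python sketch of a two-sided Fisher's exact test on a 2x2 allele count table, followed by a Bonferroni correction. It is an illustration only and not the software used in this study; the two-sided rule used here (summing all tables no more probable than the observed one) is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

In an allele-frequency comparison, the rows would hold cases and controls and the columns the minor and major allele counts.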

Patient characteristics
Samples from a total of 46 patients with de novo MDS were chosen for study based on availability of high quality, abundant, paired bone marrow (tumor) and skin (germline) DNA samples (Table 1). Paired samples allowed us to distinguish somatic mutations from germline polymorphisms. Thirty-seven of the 46 patients did not have a chromosome 5q31.2 deletion by cytogenetics. Seven of the 9 patients with monosomy 5 or del(5q) had complex cytogenetics, and none of the 37 non-del(5q) patients had clinical evidence of 5q minus syndrome (Table S1). Cases were classified in accordance with the French-American-British (FAB) system upon diagnosis and banking of their bone marrow specimens. The cohort includes patients with refractory anemia (RA), RA with ringed sideroblasts (RARS), RA with excess blasts (RAEB), RAEB in transformation (RAEB-T), and chronic myelomonocytic leukemia (CMML), with a median International Prognostic Scoring System (IPSS) score of 1 (range 0-3), and a median blast count of 4.5% (range 0-26%). The 46 bone marrow samples were independently reviewed by a hematopathologist (JLF) and 6 cases were reclassified as acute myeloid leukemia in the World Health Organization system (Table S1). The 20 MDS samples analyzed using array CGH were chosen from the 46 MDS samples used for the resequencing studies, and had a median IPSS score of 2 (range 0-3), and a median blast count of 10% (range 1-26%) (Table 1 and Table S1).

Array comparative genomic hybridization
To complement the sequence-based studies, we used array comparative genomic hybridization (CGH) to detect DNA copy number changes across chromosome 5 that might be missed by cytogenetics or resequencing. We detected chromosome 5 deletions using the array CGH platform in 4/5 samples that were known to have del(5q) using cytogenetics (Figure 1, and data not shown). Array CGH did not detect del(5q) in one MDS sample that had 1/20 metaphases with a del(5)(q22q34) (Figure 1). This is consistent with our analysis of AML samples in which array CGH rarely detects copy number changes if the neoplastic clone comprises <25% of bone marrow cells (judged by the percent abnormal metaphases) (Walter, et al., submitted). None of the del(5q) samples had detectable deletions on the remaining wild-type chromosome (Figure 1, and data not shown), and no cytogenetically silent microdeletions were detected in the non-del(5q) samples (data not shown).

Nucleotide changes in 28 chromosome 5q31.2 genes
To define the limit of detection for this DNA resequencing platform, we screened 90 MDS samples for FLT3 internal tandem duplications (ITD) using standard agarose gel electrophoresis and the Agilent LabChip. We detected 3 ITD+ samples and chose one for further analysis. We performed 6 serial dilutions of the ITD+ MDS DNA with FLT3 wild-type genomic DNA and performed PCR amplification followed by DNA sequencing of the PCR products. We could detect the FLT3 ITD when 12% of the alleles harbored the mutation, or when ~20-25% of cells contained a heterozygous mutation (data not shown).
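The relationship between the two detection limits quoted above follows from simple arithmetic if the locus is diploid and each mutant cell carries the mutation on one of its two alleles. The sketch below makes that assumption explicit; the function names are hypothetical.

```python
def allele_fraction_from_cell_fraction(cell_fraction):
    """Mutant allele fraction when a fraction of diploid cells carries a
    heterozygous mutation (one mutant allele out of two per mutant cell)."""
    return cell_fraction / 2.0


def cell_fraction_from_allele_fraction(allele_fraction):
    """Inverse of the above: fraction of cells carrying the mutation."""
    return 2.0 * allele_fraction
```

With a mutant allele fraction of 0.12, this gives a cell fraction of 0.24, which falls within the 20-25% range reported for the dilution experiment.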
We designed and validated primers on control DNA templates for 479 amplicons covering the coding region and proximal introns of all 28 genes in the 5q31.2 interval (see Table S2 for primer sequences). We then produced 7.5 megabases of double-stranded sequence for these 28 genes in the 46 samples. To ensure comprehensive coverage of all 28 genes, we performed bidirectional sequencing of amplicons and systematically tracked whether each coding nucleotide in all 28 genes was covered once or twice in the 46 samples. On average, 94% of all coding bases were covered, and 88% of bases were sequenced on both strands (Table 2). Only 10 amplicons failed to produce adequate sequence coverage using our resequencing pipeline, and 7 of these yielded high quality sequence after primers were redesigned. One exon in NME5, CTNNA1, and LOC729429 failed for all 46 samples, accounting for their low percent coverage (Table 2).
A semiautomated analysis pipeline was used to identify sequence variants [14]. We restricted our analysis to nonsynonymous and splice site nucleotide changes. Variants not found in dbSNP (build 129) were resequenced in bone marrow and paired germline (skin) samples. No somatic nonsynonymous or splice site mutations were detected in the 28 genes (Table 2). However, 12 novel SNPs were identified (Table 3). Nine of these were nonsynonymous, including a stop codon in PKD2L2, and 3 occurred in non-coding genes. The minor allele frequency for these 12 SNPs was 1.1-4.7% in this MDS population.

SNP frequencies of the 28 chromosome 5q31.2 genes in MDS vs. race-matched controls
To assess whether germline alleles of chromosome 5q31.2 genes are associated with MDS, we compared both the allele and genotype frequencies of all known SNPs in these genes (23 nonsynonymous, 19 synonymous, and 5 from non-coding genes) in the 45 MDS patients of European ancestry from our cohort to race-matched HapMap controls. The minor allele frequency for 16/47 SNPs was less than 5% in all MDS or race-matched control samples, leaving 31 SNPs that could be analyzed with this sample size (Table S3). The minor alleles of two coding SNPs were significantly over-represented in MDS cases vs. race-matched controls: rs3734166 (CDC25C R70C) (OR 1.836, 95% CI 1.030-3.272, p = 0.048), and rs2905608 (KLHL3 A157A) (OR 99.47, 95% CI 5.914-1673, p = 2.187E-10). We used pyrosequencing to genotype 190 additional samples from normal individuals of European descent (cancer free controls from Washington University, and HD100 controls) for SNPs rs3734166 and rs2905608. There was no significant difference in allele or genotype frequencies for rs3734166 (p = 0.246, p = 0.450, respectively) and rs2905608 (p = 0.214, p = 0.182, respectively) after pooling all available control data (Table 4). This study was powered to detect changes in minor allele frequencies that occurred in at least 10% of MDS cases. Therefore, race-matched control samples were not genotyped for the 12 novel SNPs identified in our 46 MDS samples because they occurred at such low frequencies (1.4-4.7%).

SNP loss of heterozygosity in del(5q) samples
To assess whether preferential retention of a SNP allele occurred in the 9 monosomy 5 or del(5q) samples, we examined their sequence traces for 64 SNPs (12 SNPs in Table 3, 47 SNPs in Table S3, and 5 SNPs without race-matched control data). DNA extracted from a homogeneous population of tumor cells harboring a deletion of one copy of chromosome 5q31.2 will show complete loss of heterozygosity for all SNPs in 5q31.2 genes that are heterozygous in the germline. Sequence traces from these bone marrow samples display only one peak at each SNP, which represents the retained SNP allele in the tumor cells (Figure 2). In contrast, DNA extracted from a heterogeneous population of cells containing both tumor cells with a del(5q) and normal cells without del(5q) produces sequence traces displaying two alleles of unequal peak heights, indicating allele skewing. There were 5 SNPs that had allele skewing in 3 or more del(5q) samples, but 0/5 SNPs displayed uniform retention of the same allele (data not shown).
To address the possibility that the cellular heterogeneity present in MDS bone marrow samples could limit the use of unfractionated cells for DNA based genomic studies, we used the SNP resequencing data to determine whether we could detect clonality in MDS bone marrow samples that had less than 5% myeloblasts. We calculated the allele trace peak height ratio (peak height of allele A/(peak heights of alleles A+B)) for biallelic SNPs in tumor DNA samples (bone marrow). The allele peak height ratio was considered abnormal, indicating allelic imbalance, if the ratio was ≥0.7 or ≤0.3 (Figure 2). The del(5q) samples had bone marrow myeloblast counts ranging from 0-17%, and an abnormal chromosome 5q31.2 in 5-100% of their bone marrow cell metaphases. The SNP allelic frequencies were highly correlated with the percent of abnormal metaphases in 8 del(5q) samples (r = 0.9385, p = 0.0006), indicating that the sequencing data accurately reflect clonality (Figure 2). In contrast, the bone marrow myeloblast count is a poor surrogate for clonality. Even at blast counts less than 5%, clonal cells are detected using the SNP peak height skewing data, implying that the majority of bone marrow cells may be clonal even when samples have low bone marrow myeloblast counts (Figure 2).
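The peak height ratio and its cutoffs (a ratio of at least 0.7 or at most 0.3 counted as allelic imbalance) can be written as a short sketch. The last function is an added assumption rather than part of the analysis described above: it models the ratio expected when a given fraction of diploid cells has lost allele B, assuming peak height is proportional to allele copy number.

```python
def peak_height_ratio(height_a, height_b):
    """Allele trace peak height ratio: A / (A + B)."""
    return height_a / (height_a + height_b)


def is_allelic_imbalance(ratio, upper=0.7, lower=0.3):
    """Abnormal if the ratio is at least `upper` or at most `lower`."""
    return ratio >= upper or ratio <= lower


def expected_ratio(clonal_fraction):
    """Expected ratio for the retained allele A when `clonal_fraction` of
    cells has deleted allele B (illustrative model, an assumption here).

    Per cell on average: A contributes 1 copy, B contributes 1 - f copies.
    """
    return 1.0 / (2.0 - clonal_fraction)
```

Under this simple model, a ratio of 0.7 corresponds to roughly 57% of cells carrying the deletion, which is one way to see why skewing in a sample with few blasts points to a large clonal population.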

Gene expression levels
To assess whether del(5q) gene expression may be affected by mechanisms other than coding or splice site DNA mutations (i.e., mutations in non-coding DNA or epigenetic alterations), we compared the mRNA levels of 28 del(5q) genes in CD34+ purified cells from 6 MDS patients with del(5q), 20 MDS patients without del(5q), and 6 normal donors using the Affymetrix U133plus2 array. Four of the 28 genes did not have a probeset on the array (HNRPA0, LOC391836, LOC729429, SNORD63). The remaining 24 genes were rank ordered based on their median mRNA expression level (Affymetrix units) in the 6 control samples, and the average expression level of a gene for MDS patients with del(5q) (n = 6) and MDS patients without del(5q) (n = 20) was plotted relative to the level in control samples (Table S4, Figure 3). Nine of the 24 genes were called absent in 70-100% of all 32 samples (colored red in Figure 3). Consistent with haploinsufficiency, 7 genes had a significant reduction in average signal intensity in MDS samples with del(5q) compared to control samples (0.47 average fold change for MATR3, HSPA9, FAM53C, ETF1, CTNNA1, JMJD1B, and KLHL3 relative to control samples, p<0.04) and compared to MDS samples without del(5q) (0.52 average fold change relative to non-del(5q) samples, p<0.02 excluding FAM53C). No gene had mRNA levels in del(5q) samples consistently below the expected haploinsufficient level of 50%, indicating that biallelic loss of expression is not common in CD34+ samples containing del(5q). However, FAM53C and ETF1 were expressed at significantly lower levels in MDS samples without del(5q) compared to control samples (0.47 and 0.79 fold change relative to control, respectively, p≤0.03), suggesting that these genes might be altered in the absence of a cytogenetic deletion. To verify this possibility, we performed quantitative RT-PCR for FAM53C using available total RNA from CD34+ cells. Quantitative RT-PCR did not confirm the Affymetrix array results for haploinsufficiency of FAM53C (data not shown). Quantitative RT-PCR was not performed for ETF1 because only 1 of 2 probesets on the array showed reduced expression in non-del(5q) samples. In summary, of the 15 chromosome 5q31.2 genes with detectable expression by microarray, none was consistently expressed out of proportion to gene dosage.

Discussion
In this study, we determined by high-throughput resequencing and array comparative genomic hybridization (CGH) that point mutations and small deletions in chromosome 5q31.2 genes are not common events in de novo MDS. A critical aspect of this study design is that the nucleotide sequence and copy number for all of the genes in the commonly deleted segment were examined comprehensively in a large number of MDS patients without del(5q). These results therefore demonstrate that a cytogenetic deletion spanning the commonly deleted segment on chromosome 5q31.2 is the predominant genetic event leading to haploinsufficiency of multiple del(5q) genes. These results also imply that the simultaneous loss of multiple genes may be necessary for disease initiation and progression of del(5)(q31.2) associated MDS and AML.
Haploinsufficiency of three 5q31.2 candidate genes (HSPA9, CTNNA1, EGR1) may contribute to disease pathogenesis, but current data suggest that these genes individually do not recapitulate all features of MDS [8,16,17]. Zebrafish carrying an ENU-induced heterozygous mutation in the ortholog of the human HSPA9 gene display increased apoptosis in blood cells characteristic of ineffective hematopoiesis [16]. Look and colleagues found that decreased CTNNA1 expression in CD34+CD38-CD123+Lin- cells from high risk del(5q) associated MDS or AML patients was associated with methylation of the CTNNA1 promoter, but not with point mutations of the residual allele in primary tumor cells [8]. However, others found no reduction in CTNNA1 mRNA levels below 50% in CD34+CD38-Thy1+ cells from low risk del(5q) MDS patients [18]. Most recently, Le Beau and colleagues observed that Egr1 heterozygous mutant mice have normal resting hematopoiesis, but develop myeloid diseases after treatment with ENU [17]. Collectively, the data indicate that more than one gene is likely to be involved in the pathogenesis of MDS, and support the hypothesis that haploinsufficiency of multiple 5q31.2 genes contributes to MDS initiation or progression.
It is unlikely that a recurrent, clonally dominant, genetic event (mutation, gene silencing, or polymorphism) in these 28 genes was not detected in this de novo MDS cohort due to cellular heterogeneity or sample size. Although cellular heterogeneity in MDS bone marrow samples exists, our resequencing (Figure 2) and array CGH (Figure 1) results confirm findings from previous studies using interphase fluorescence in situ hybridization (FISH) for del(5q), which concluded that the myeloblast count often underestimates the proportion of cells that are clonally derived in MDS [19,20,21]. In addition, sequencing 37 MDS samples without del(5q) provides a 0.9543 probability of detecting a 5q31.2 gene mutation if it occurs in at least 8% of MDS samples without a cytogenetic del(5q). Furthermore, we found no evidence of biallelic loss of mRNA expression of del(5q) genes in CD34+ cells harvested from our MDS patients with del(5q), and we found no SNP significantly associated with MDS or evidence of preferential retention of a residual SNP allele in MDS samples with del(5q).
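The 0.9543 figure is consistent with the probability of observing at least one mutated sample among 37 independent samples when each has an 8% chance of carrying the mutation. The sketch below makes that calculation explicit; it assumes independent samples and perfect detection within a mutated sample, and the function names are hypothetical.

```python
from math import ceil, log


def detection_probability(n_samples, mutation_frequency):
    """Probability that at least one of n independent samples is mutated."""
    return 1.0 - (1.0 - mutation_frequency) ** n_samples


def samples_needed(mutation_frequency, power=0.95):
    """Smallest n with detection probability of at least `power`."""
    return ceil(log(1.0 - power) / log(1.0 - mutation_frequency))
```

For example, `detection_probability(37, 0.08)` evaluates to approximately 0.9543, and under the same assumptions 36 samples would already give at least a 0.95 probability.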
Collectively, our data provide direct evidence supporting the hypothesis that haploinsufficiency of multiple genes in the commonly deleted segment is the most likely relevant genetic consequence of deletions spanning chromosome 5q31.2. Moving forward and testing the hypothesis of haploinsufficiency as a disease mechanism is difficult, and using mouse models is problematic. The syntenic chromosome 5q31.2 region in mouse resides on two chromosomes, making it difficult to accurately engineer a complete knockout of all the genes at once. In addition, creating individual targeted knockout mice for each candidate gene is time-consuming, cost-prohibitive, and impractical. However, it is now possible to perform comprehensive high-throughput gene knockdown screens using RNAi technology to model haploinsufficiency because lentiviral short hairpin libraries exist [22,23]. This approach has yielded important information for the RPS14 gene located in the chromosome 5q33.1 commonly deleted segment in patients with the 5q minus syndrome [24], and this technology will expand our understanding of single genes and gene combinations that may contribute to disease initiation in MDS and AML. Ultimately, identifying the genetic events leading to MDS initiation may provide insight into targeted therapy for MDS and AML.