Integrative genomics analysis of genes with biallelic loss and its relation to the expression of mRNA and micro-RNA in esophageal squamous cell carcinoma

Genomic instability plays an important role in human cancers. We previously characterized genomic instability in esophageal squamous cell carcinomas (ESCC) in terms of loss of heterozygosity (LOH) and copy number (CN) changes in tumors. In the current study we focus on biallelic loss and its relation to expression of mRNA and miRNA in ESCC using results from 500K SNP, mRNA, and miRNA arrays in 30 cases from a high-risk region of China. (i) Biallelic loss was uncommon but when it occurred it exhibited a consistent pattern: only 77 genes (<0.5 %) showed biallelic loss in at least 10 % of ESCC samples, but nearly all of these genes were concentrated on just four chromosomal arms (ie, 42 genes on 3p, 14 genes on 9p, 10 genes on 5q, and seven genes on 4p). (ii) Biallelic loss was associated with lower mRNA expression: 52 of the 77 genes also had RNA expression data, and 41 (79 %) showed lower expression levels in cases with biallelic loss compared to those without. (iii) The relation of biallelic loss to miRNA expression was less clear but appeared to favor higher miRNA levels: of 60 miRNA-target gene pairs, 34 pairs (57 %) had higher miRNA expression with biallelic loss than without, while 26 pairs (43 %) had lower miRNA expression. (iv) Finally, the effect of biallelic loss on the relation between miRNA and mRNA expression was complex. Biallelic loss was most commonly associated with a pattern of elevated miRNA and reduced mRNA (43 %), but a pattern of both reduced miRNA and mRNA was also common (35 %). Our results indicate that biallelic loss in ESCC is uncommon, but when it occurs it is localized to a few specific chromosome regions and is associated with reduced mRNA expression of affected genes. The effect of biallelic loss on miRNA expression and on the relation between miRNA and mRNA expressions was complex.


Background
Esophageal cancer is the eighth most common and the sixth most frequent fatal human cancer in the world [1] and the fourth most common incident cancer in China [2]. Shanxi Province, a region in north central China, has among the highest esophageal cancer rates in China and nearly all of these cases are esophageal squamous cell carcinoma (ESCC). ESCC is an aggressive tumor which is typically diagnosed only after the onset of symptoms when prognosis is very poor. The 19 % 5-year survival rate is fourth worst among all cancers in the USA [3]. One promising strategy to reduce ESCC mortality is early detection. Further, a better understanding of the molecular mechanisms underlying esophageal carcinogenesis and its molecular pathology will facilitate the development of biomarkers for early detection.
Genomic instability is one of several mechanisms that can lead to gene dysregulation and has been thought to play an important role in the etiology of human cancers, including both histologic types of esophageal cancer, esophageal adenocarcinoma and ESCC [4,5]. In previous studies using a variety of different methods we found that LOH was common in ESCCs from Shanxi Province in north central China. These studies identified over 20 common LOH regions, frequent copy number alterations (both gain and loss), as well as numerous copy number neutral regions, which suggests that this cancer is characterized by genomic instability [5,6]. In addition, somatic mutations in several genes with critical roles in carcinogenesis (e.g., TP53, CDKN2A, and BRCA2) have been identified in ESCC patients with LOH in regions that include the genes [7][8][9], indicating that a wide variety of DNA alterations in numerous genes occur in the development of this tumor.
Gene expression microarray technology is an important tool for evaluating tumor heterogeneity and has been successfully applied to identify subsets of tumors (including within ESCC) with different clinical parameters such as survival, histological grade, invasive status, and response to therapy [10][11][12][13][14][15]. In recent years, miRNAs have emerged as a major class of regulatory genes, and one class of miR-NAs, conserved miRNA, has targets which can now be predicted with confidence. It is thought that the role of miRNAs is to control expression of target genes. Thus, dysregulation of miRNA is expected in human diseases such as cancer, which are attributed to dysregulation of gene expression in tumor suppressors and oncogenes [16][17][18]. Further, dysregulation of some miRNAs has been related to patient survival in some cancers, including ESCC [19,20]. Biallelic loss is thought to play a critical role in tumor pathogenesis, especially because of its influence on expression of affected genes (mRNA) and related miRNA, however, these relations have not been well studied in ESCC. Since cancer is a complex disease, it is increasingly important that analyses combine the evaluation of alterations in DNA and RNA, including those that occur in both mRNA and miRNA, in order to better understand their potential interactions in the development of cancers.
It has become clear that human genetic variation ranges from single nucleotide changes at the sequence level up to multi-megabase chromosomal aberrations. Of the molecular genetic changes that occur during the development of human cancer, alterations in SNPs are likely among the earliest or even the initial events that lead to genomic instability. While studying large changes (eg, big deletions, inversions, and translocations at the chromosomal level) in tumor cells is informative, knowledge of the more numerous small alterations that occur at the nucleotide sequence level are equally or more critical to our understanding of the detailed process of carcinogenesis, particularly at its earliest stages. For example, biallelic loss in genes may cause double strand breaks which result in widespread structural rearrangements of the genome. However, how alterations of DNA (i.e., biallelic loss) influence gene expression remains largely unknown, especial for SNPs that are not in coding regions.
In the present study, we performed global profiling of alterations in DNA as well as expression of mRNA and miRNA in tumors and their matched normal tissues from 30 ESCC cases. Using these profiles, we identified genes with biallelic loss and examined their mRNA expression and miRNA targets as an initial step in understanding relations among these small alterations in nucleic acids in ESCC.

Case selection
This study was approved by the Institutional Review Boards of the Shanxi Cancer Hospital and the US National Cancer Institute (NCI). Briefly, cases diagnosed with ESCC between 1998 and 2001 in the Shanxi Cancer Hospital in Taiyuan, Shanxi Province, PR China, and considered candidates for curative surgical resection were identified and recruited to participate in this study after obtaining written informed consent. None of the cases had prior therapy and Shanxi was the ancestral home for all. Cases ranged in age from 39 to 67 years (median 56 years) and were predominantly female (63 %). Clinically, most cases had Stage 2 (77 %) cancers and half had evidence of metastasis at diagnosis. The ESCC cases studied here were previously evaluated for LOH and copy number alterations using genome-wide arrays [5,6].

Biological specimen collection and processing
Venous blood (10 ml) was taken from each case prior to surgery and germline DNA from whole blood was extracted and purified using the standard phenol/chloroform method. Tumor and adjacent normal tissues were dissected at the time of surgery and stored in liquid nitrogen until used. DNA was extracted from micro-dissected tumor as previously described [5] using the protocol from the Puregene DNA Purification Tissue Kit (Gentra Systems, Inc., Minneapolis, MN).
RNA was extracted from 17 of the micro-dissected tumors and their matched normal tissue pairs as described previously using the protocol from the PureLink Microto-Midi Total RNA Purification System (Catalog number 12183-018, Invitrogen, Carlsbad, CA) [5]; total RNA from 13 cases was isolated by using the Allprep kit (Qiagen) per the manufacturer's instructions. RNA quality and quantity were determined using the RNA 6000 Labchip/Agilent 2100 Bioanalyzer (Agilent Technologies, Germantown, MD).

Target preparation for GeneChip human mapping 500K array set
The Affymetrix GeneChip Human Mapping 500K array set was previously performed in these patients (6,7). The set array contains~262,000 (Nsp I array) and~238,000 (Sty I array) SNPs (mean probe spacing = 5.8Kb, mean heterozygosity = 27 %). A detailed gene chip protocol can be found at http://www.affymetrix.com/support/ downloads/manuals/500k_assay_manual.pdf.
Experiments were conducted according to the protocol (GeneChip Mapping Assay manual) supplied by Affymetrix, Inc. (Santa Clara, CA). Genotype calls were generated by GTYPE v 4.0 software (Affymetrix). Paired germ-line and tumor DNA from each case were run together in parallel in the same experiment (ie, same batch, same day). The GEO accession number for these SNP array data is GSE15526.
Probe preparation and hybridization for Human Genome U133A 2.0 array The Affymetrix Human Genome U133A 2.0 array is a single array used to interrogate expression of 14,500 well-characterized human genes. Array experiments were performed using 1-5ug total RNA for each array as described previously [10]. We followed the protocol provided by the manufacturer to carry out reverse transcription, labeling, and hybridization. (http://www.affymetrix.com/support/technical/manual/expression_manual.affx). RNA from paired tumor and normal esophageal tissues were run together in parallel in the same experiment. The GEO accession number for these expression array data is GSE38129.

ABI miRNA expression array by RT-PCR
The TaqMan® Low Density Array was used to determine microRNA expression in this study, which employed the 9700HT fast real-time PCR system from ABI. Comprehensive coverage of Sanger miRBase v14 was enabled via a two-card set of TaqMan® Array MicroRNA Cards (Cards A and B) for a total of 754 assays specific to 664 unique human miRNAs. In addition, each card contains one selected endogenous control assay (MammU6; printed four times), five human endogenous controls (RNU 6B, 24, 43, 44, 48) that are the most highly abundant and stably expressed across all tissues, and one negative control assay (ath-miR159a). Card A focused on more highly characterized miRNAs, while Card B contained more recently discovered miRNAs along with the miR* sequences.
RNA from paired tumor and normal esophageal tissues were run together in parallel in the same experiment. The protocol followed the manufacturer' s manual at http:// www3.appliedbiosystems.com/cms/groups/mcb_support/documents/generaldocuments/cms_042167.pdf. Briefly, three uL of total RNA (350-1000ng) was added to 4.50uL of RT reaction mix, which consisted of 10x Megaplex RT Primers, 100mM dNTPs with dTTP, 50U/uL MultiScribe Reverse Transcriptase, 10x RT buffer, 25mM MgCl 2 , 20U/uL RNase Inhibitor, and nuclease-free H 2 O. The samples were run on a thermal cycler using the following conditions: 40 cycles of 16°C for two min, 42°C for one min, and 50°C for one sec. All reactions were completed with a final incubation of 85°C for five min. Six uL of cDNA generated from the thermal cycler was mixed with 450uL of 2x TaqMan Universal PCR Master Mix with no AmpErase UNG, and 444uL of nuclease-free H 2 O. 100uL of the reaction mix was added to each of eight fill ports on the TaqMan MicroRNA Array. The filled Array was centrifuged twice at 1200 rpm for one min, and then sealed with the eight fill ports removed. The array was run on the 7900HT RT-PCR System with SDS software. The comparative CT method was used to determine the expression levels of mature miRNAs. The GEO accession number for these miRNA data is GSE66274.

GeneChip 500K array data analysis
Probe intensity data from Affymetrix 500K SNP arrays were used to identify DNA alterations in the present study. To avoid gender-related issues, SNPs mapped to either the X or Y chromosome were excluded. Affymetrix SNP array data were first normalized using the gtypeprobe set-genotype package included in Affymetrix Power Tools version 1.85. Each tumor sample was individually normalized via the BRLMM algorithm along with 99 blood samples. These blood samples were obtained from the 30 ESCC cases evaluated in the present study plus 69 healthy controls (age-, sex-, and region-matched to the cases) who were all part of a larger case-control study of upper gastrointestinal (UGI) cancers conducted in Shanxi Province [21]. Biallelic loss, including loss of both alleles in heterozygotes as well as homozygous deletions, was determined based on comparison of matched tumor versus germline DNA. Several criteria were used to determine biallelic loss as follows: 1) a SNP with biallelic loss must have (a) a "No Call" genotype call in the tumor sample; (b) a high quality genotype call in the normal sample; and (c) reduced copy number (CN0 or CN1); 2) analysis was limited to SNPs in genes (exons and introns only); and 3) analysis was limited to SNPs that fulfilled elements from criterion #1 (a) to (c) in at least 10 % of the 30 ESCC cases studied. Analyses of LOH and CN were described previously [5,6].
Human genome U133A 2.0 array data analysis and relation between biallelic loss and mRNA expression For all of the Affymetrix U133 array data, raw data sets (CEL files on all samples) after scanning were normalized using RMA as implemented in Bioconductor in R (http:// www.bioconductor.org), including background correction and normalization across all samples. For each sample, log2 fold changes in gene expression were calculated by subtracting the adjacent normal RMA value from the corresponding tumor RMA value.
To assess the influence of biallelic loss on expression, we performed the following steps: (i) First, genes assayed by the U133A microarray were mapped onto each biallelic loss segment of each sample. Map locations of genes were taken from the Affymetrix version na29 microarray annotation file. (ii) We then performed two-sided unpaired Wilcoxon rank sum tests comparing the log2 fold changes for a probe set in biallelic loss positive and negative samples. A P-value <0.05 was considered significant. (iii) Finally, SNPs on the 500K microarray were mapped to the reference sequence for each expression probe set. Average fold changes were used to relate mRNA expression to DNA biallelic loss.
ABI miRNA expression array analysis RQ Manager integrated in software from ABI was used to normalize the entire signal generated. Expression level (as fold change) was calculated when both tumor and normal samples had signals in the assays using DataAssist software v2.0 (Life Technologies, http://www.lifetechnologies.com/about-life-technologies.html). Signals for miRNA that showed either in tumor only or normal only were dropped from analysis. Fold change was calculated using the 2 -ΔΔCT method. In the present study, the data are presented as fold change in the target gene expression in tumors normalized to the internal control gene (MammU6) and relative to the normal tissue control (matched normal as calibrator). Results of the real-time PCR data are represented as C T values, with C T defined as the threshold cycle number of PCRs at which amplified product was first detected. The average C T was calculated for both the target gene and MammU6 and the ΔC T was determined as (the mean of up to three C T values for the target gene) minus (the mean of the C T values for U6). The ΔΔC T represented the difference between the paired tissue samples, as calculated by the formula ΔΔC T = (ΔC T of tumor -ΔC T of normal). The N-fold differential expression in the target gene of a tumor sample compared to its normal sample counterpart was expressed as 2 -ΔΔCT . For each case, the frequency of dysregulated miRNAs was calculated as the number of dysregulated miRNAs divided by the total number of miRNAs that showed signals in both tumor and normal. The criteria used to call an miRNA dysregulated were fold changes ≥ 2 or ≤ 0.5.
We used median fold change for both miRNA and mRNA in our analysis of the relation between expressions of miRNA and genes. Correlations and p-values between selected variables were performed using Spearman rank correlations and Wilcoxon rank tests.

Results
A flow diagram detailing the various laboratory analyses performed in the study can be found in Additional file 1: Figure S1.

Genes with frequent biallelic loss
The overall average genotype call rate was 95 % in the present study for the 60 chips evaluated: average call rates for the 250K Nsp chip were 95 % for both germline DNA (range 93-98 %) and tumor DNA (range 91-97 %), and average call rates for the 250K Sty chip were 96 % (range 90-98 %) for germline DNA and 95 % (range 92-97 %) for tumor DNA.
We identified 702 SNPs that showed frequent biallelic loss, that is, in at least 10 % (at least three cases) of ESCC tumors (see "Methods" section). Those 702 SNPs mapped to 77 genes and represent 9.4 % of the total of 7484 SNPs in those genes on our SNP array. Nearly all of the 77 genes represented by these SNPs were concentrated on just four chromosomal arms (ie, 42 genes on 3p, 14 on 9p, 10 on 5q, and 7 on 4p). Table 1 summarizes biallelic loss frequencies for each of the 77 genes, including the number of cases with biallelic loss, the number of SNPs with biallelic loss in at least three cases, the number of SNPs mapped within the gene and present on the SNP array, and the fraction of the SNPs with biallelic loss among all the SNPs in the gene. Some of the genes shown in Table 1 are known cancerassociated genes (ie, FOXP1, CSMD1, CDKN2A/2B, We also examined our ESCC cases by the frequency of biallelic loss frequency among the 7484 SNPs in the 77 genes with biallelic loss ( Table 2). Fourteen cases had at least 100 SNPs with biallelic loss and were termed "higher biallelic loss cases", whereas the remaining 16 cases had fewer than 100 SNPs with biallelic loss and were called "lower biallelic loss cases." Table 2 summarizes DNA changes (biallelic loss, LOH, and DNA copy number alterations) among the 7848 SNPs in the 77 genes with biallelic loss for each of the 30 ESCC cases. The number of SNPs affected by LOH and copy number alterations varied widely among cases. The number of SNPs with biallelic loss was highly correlated with both the number of SNPs with LOH (r = 0.92, p = 4.41E-13) and the number of SNPs with copy number loss (r = 0.97, p = 3.94E-19), but was not significantly correlated with either the number of SNPs with copy number gain (r = 0.21, p = 0.26) or the number of SNPs with copy number neutral LOH (r = 0.18, p = 0.34). We also checked for microdeletions or biallelic loss regions on chromosomes (eg, 3p14) caused by continuous SNPs with biallelic loss, but were unable to identify any.

Biallelic loss and expression of mRNA and miRNA
Among the 77 genes with biallelic loss, 52 had probes represented on the Affymetrix Hu 133 array with signals in both tumor and normal tissues. We found that 41 of these 52 genes (79 %) had lower mRNA expression levels in cases with biallelic loss than cases with no biallelic loss (the 41 genes shown in the unshaded area on the left in Fig. 1), including eight genes in which mRNA expression was statistically significantly lower (Additional file 2: Table  S1). For example, the mean fold change in MTAP expression was 0.83 in cases with biallelic loss versus 1.11 in cases with no biallelic loss (P = 0.009). Eleven genes had expression levels that were the same or higher in cases with biallelic loss than cases with no loss (the 11 genes in the shaded area on the right in Fig. 1). As an example, the median fold change for expression of ADAMTS9 was 1.35 in cases with biallelic loss compared to 1.0 in cases without biallelic loss (Additional file 2: Table S1). Although expression differences observed were modest, these results suggest that biallelic loss appeared to influence gene expression.
Expression of miRNAs by biallelic loss status is shown in Additional file 3: Table S2. The ratio of miRNA expression in target genes with biallelic loss (versus without) was greater than one for 34 of 60 (57 %) miRNAs and less than one in 26 of 60 (43 %).

miRNA expression and target gene expression in genes with frequent biallelic loss
There were a total of 664 miRNAs on the ABI Chips A & B in our analysis. Two hundred sixty-eight miRNAs were excluded from further analysis because of inadequate data (ie, signal was present only in tumor or only in normal, or signal was absent altogether because of tissue specificity), leaving 396 miRNAs that showed signals in both the tumor and the normal tissues in at least 10 % of the 30 ESCC cases (ie, at least three cases) for our analyses (Additional file 4: Table S3 and Additional file 5: Table S4). We checked the conserved miRNA targets in the 3′ UTRs for the 52 genes with mRNA results using http:// www.targetscan.org/ and found 44 genes that could be targeted by one or more of the miRNAs present on the ABI miRNA array. After further filtering (ie, seven genes were not targeted by miRNAs on our array, and two genes had miRNA signals in less than three cases), we had data available to analyze the relation between the expression of 70 miRNAs (58 targeted just one gene and 12 targeted multiple genes) and 35 gene targets in our 30 ESCC cases (Additional file 6: Table S5). We found a relatively wide range of miRNA expression levels among these gene targets (tumor:normal miRNA fold change median = 1.62, range 0.08 to 5.27; Additional file 7: Table S6). Overall, in the 35 genes with biallelic loss that were evaluable, miRNA expression levels were more often elevated (fold change > 1.0) than mRNA expression levels (69 % versus 14 %, respectively). When miRNA and mRNA were examined together, expression levels of 41 (of 70) miRNAs were elevated (fold change > 1.0) while their target gene expression levels were reduced (fold change < 1.0). Examples of interesting miRNA-target gene pairs that showed this pattern were: expression of miR-205 was 3.11-fold and its target gene ADAMTS9 expression was 0.87-fold; miR-124 was 3.79-fold and its target gene SUCLG2 was 0.42-fold; and expression of miR-183 was 3.01-fold while its target FOXP1 was 0.76-fold. Conversely, 17 miRNAs showed reduced expression in both the miRNA and its target gene. An illustrative example of this relation is the 0.17-fold expression change for miR-133b0.85-fold in its target gene IQGAP expression. Taken together, our results indicate that miRNA expression levels varied widely. Increased miRNA most often was associated with reduced target gene expression, but reduced target gene expression was also frequently seen with reduced miRNA in ESCC.

Discussion
In the present study, we took an integrative approach by evaluating genes in relation to biallelic loss and expression of both mRNA and miRNA in ESCC cases using profiling data generated from arrays. We made several observations from our evaluation of these data. First, biallelic loss was relatively uncommon, but when it occurred it was concentrated in four chromosome arms, namely, 3p, 9p, 5q, and 4p. Second, biallelic loss appeared to affect gene expression; nearly 80 % of genes in cases with biallelic loss showed reduced mRNA expression compared to those without loss. Third, although the relation was less clear than for mRNA, biallelic loss also appeared to affect miRNA expression. More informative future studies will need larger sample sizes so as to have more heterozygous SNPs for evaluation, and appropriate coverage of promoter regions, to confirm and expand our findings here regarding relations between SNPs with biallelic loss and expression of mRNA and miRNA.
Distributions of miRNA expressions were generally higher in cases with biallelic loss than in cases without loss. Of 60 miRNA-target gene pairs, 34 pairs (57 %) had higher miRNA expression with biallelic loss than without, while 26 pairs (43 %) had lower miRNA expression. Finally, the effect of biallelic loss on the relation between miRNA and mRNA expression was complex. Biallelic loss was most commonly associated with a pattern of elevated miRNA and reduced mRNA (43 %), but a pattern of both reduced miRNA and mRNA was also common (35 %).
In our study, 77 genes showed biallelic loss and half of these genes were located on chromosome 3p, our most common site of loss. The frequency of biallelic loss among the 42 genes on chromosome 3p ranged from 5 % to 64 % (Table 1). This region includes several tumor suppressor genes (eg, ROBO1, FOXP1, FHIT). Our previous studies also showed high frequency of LOH on chromosome 3p in ESCC [5]. Taken together, biallelic loss, similar to LOH, appears to play a role in the stability of chromosome 3p in ESCC. Although most SNPs that show biallelic loss and/or LOH are located in the non-coding regions of these genes, they may exert their effects via gene expression. For example, biallelic loss of CHL1 (3p26.3) affected six of 78 SNPs, and the expression level of CHL1 was reduced in cases with biallelic loss but elevated in cases without biallelic loss (Additional file 2: Table S1), while miRNA-10a and miRNA-10b expression levels, which both target CHL1, were higher in cases with biallelic loss than those without (Additional file 3: Table S2). Interestingly, a previous study showed that miR-10b may play a causal role in inducing metastatic behavior [22]. We note that expression levels for some genes did not show big differences by biallelic loss status, or even higher expression levels in cases with biallelic loss than those without. One potential explanation for this finding is epigenetic alterations such as DNA methylation changes.
Our results indicate that ESCC tumors may be divided into two groups: those with high and those with low levels of genome instability as assessed by the number of SNPs with biallelic loss ( Table 2), suggesting that the genetic stability is variable, even among patients from a seemingly homogeneous population with extraordinarily high rates of esophageal cancer. Our results also show that both miRNA and mRNA levels varied widely despite the fact that the esophageal cancer patients studied here were similar in many important ways (eg, from the same geography, had the same tumor histology, and had similar clinical characteristics such as stage) [5], suggesting that ESCC is incredibly complex and that there is significant heterogeneity among ESCC patients. This is likely one reason why tumors that appear similar can progress and respond to therapy in dramatically different ways. New insights gained from better understanding of case/ tumor heterogeneity should be useful for predicting response to therapy [23,24].
Although several studies have evaluated biallelic loss/ homozygous deletion within specific genes in tumors (e.g., CDKN2A [25]), including ESCC [26], using techniques such as FISH, only a few prior reports have used genome-wide techniques such as SNP or comparative genomic hybridization (CGH) arrays to agnostically assess homozygous deletions in tumor cell lines or tissues. Cancers evaluated for biallelic loss with array technology include prostate (SNP array) [25,27], B-cell lymphomas (CGH array) [28], and B-cell chronic lymphocytic leukemia (SNP array) [29]. Among the largest of these studies, Guichard et al. used CGH arrays and recently reported that 40 % of 125 hepatocellular carcinomas had homozygous deletions [30]. Twelve regions were recurrently altered, including most frequently loci at CDKN2A-CDKN2B (6.4 %), AXIN1 (3.2 %), and IRF2 (3.2 %). To the best of our best knowledge, our study