Transcriptional profiling of HOXA9-regulated genes in human glioblastoma cell models

The data here described pertain to the article by Pojo et al. (2015) [10] titled “A transcriptomic signature mediated by HOXA9 promotes human glioblastoma initiation, aggressiveness and resistance to temozolomide” (Pojoet al., 2015 [10]). HOX genes are part of the homeobox gene family, which encodes transcription factors crucial during embryonic development (Grier et al., 2005; Pearson et al., 2005 [6,9]) and also in post developmental regulation(Neville et al., 2002; Yamamoto et al., 2003; Takahashi et al., 2004;Morgan 2006 [8,14,13,7]). Alterations interfering with the regulation of these genes may lead to tumorigenesis in adults. Due to their contributions in the control of important cellular processes, the deregulation of HOX genes is ultimately correlated with cancer treatment failure and patients' poor prognosis (Golub et al., 1999; Abdel-Fattah et al., 2006 [5,1]; Costa et al.,2010 [4]; Pojo et al., 2015 [10]). Recently, our studies showed that HOXA9 overexpression is associated with poor prognosis in patients with glioblastoma (GBM), the most common and most malignant primary brain tumor. Mechanistically, HOXA9 is associated with resistance to chemotherapy and with pro-proliferative, pro invasive and anti-apoptotic features (Costa et al., 2010 [4]; Pojo et al., 2015 [10]) in GBM in vitro models. Since HOXA9 is a transcription factor, its target genes can be the true biological effectors of its aggressiveness. In this context, whole genome Agilent's microarrays were used to obtain the full transcriptome of HOXA9 in a variety of GBM cell models, including human immortalized astrocytes, established GBM cell lines, and GBM patient derived cell cultures. Here, we provide detailed methods, including experimental design and microarray data analyses,which can be accessed in Gene Expression Omnibus (GEO) under the accession number GSE56517. Additional interpretation of the data is included and supplemented in (Pojo et al., 2015 [10]).


Tissue culture
For this study, 4 different cell lines were used: 2 established human glioblastoma (GBM) cell lines, U87MG and U251MG, purchased from American Type Culture Collection (ATCC®); 1 cell line of human immortalized astrocytes, hTERT/E6/E7, kindly originally supplied by Dr. Russell Pieper; and 1 primary GBM culture, GBML18, established in our lab from a clinical specimen. All cell lines were cultured in Dulbecco's Modified Eagle Medium (DMEM; Gibco®) supplemented with 10% fetal bovine serum (FBS; Biochrom GmbH) and 1% penicillin-streptomycin

Contents lists available at ScienceDirect
Genomics Data j o u r n a l h o m e p a g e : h t t p : / / w w w . j o u r n a l s . e l s e v i e r . c o m / g e n o m i c s -d a t a / (Invitrogen™). Cells were maintained in a humidified atmosphere at 37°C and 5% (v/v) CO 2 .

RNA isolation, purification and cDNA quality control analysis
Total RNA was isolated from 8 × 10 5 -1 × 10 6 exponentially growing cells from a T75-flask. The TRIzol method (Invitrogen™) was used to extract the total cellular RNA from the 8 cell lines previously obtained (3 replicates from each cell line). Extracted RNA was purified using the RNeasy Plus Micro kit (Qiagen®) and quantified using the Nanodrop 2000 (under the Nucleic Acid option; Thermo Scientific, Inc.; Table 1). RNA quantification, integrity and purity were also validated using the Agilent 2100 bioanalyzer (Agilent Technologies; Table 1). All RNA samples presented A260/A280 ratios above 1.8, electropherograms with 2 distinct peaks, corresponding to the 18S and 28S ribosomal RNA, and RNA integrity numbers (RIN) above 8, as recommended for microarray analysis.
To validate that the obtained samples were a good model of HOXA9 mRNA levels modulation, HOXA9 overexpression or silencing was confirmed. To do so, 1 μg of total RNA was reverse transcribed using the High Capacity cDNA Reverse Transcriptase kit (Alfagene®), and HOXA9 quantitative PCR was performed. This cDNA was further used to validate the results obtained by microarrays.

cRNA preparation, labeling, purification and quality control analysis
Complementary RNA (cRNA) was obtained using the Low Input Quick Amp Labeling kit, One-Color (Agilent Technologies) using 200 ng of the total RNA. RNA samples were labeled with Cyanine 3-CTP and amplified together with Agilent One Color Spike-In controls, which were used as positive controls to monitor the amplification, labeling and microarray scanning. Labeled/amplified RNA was purified using the RNeasy Mini Kit (Qiagen®) according to the manufacturer's instructions. Nanodrop 2000 (under the Microarray measurement option; Thermo Scientific, Inc.) was used to obtain the cRNA and the Cyanine 3 dye concentrations, and to verify the quality of the cRNA (A260/280 ratio; Table 2). All samples presented A260/280 ratios between 2.14 and 2.33, cRNA yield N 6.08 and specific activity N8.1 (recommended to be above 1.8, 1.65 and 6, respectively).

Hybridization and washing
Labeled cRNA (1.65 μg for all conditions) was mixed according to the manufacturer's protocol and hybridized in a Whole Human Genome Microarray (G4112F, 4x44K; Agilent Technologies) at 65°C for 17 h, under 10 rpm using a hybridization rotator (Agilent Technologies). After hybridization, slides were disassembled in Wash Buffer 1 and washed once with Wash Buffer 1 at room temperature and once with Wash Buffer 2 pre-warmed overnight at 37°C according to the manufacturer's instructions.

Scanning and feature extraction
Slides were immediately scanned using the DNA Microarray Scanner with SureScan High-Resolution Technology (Agilent Technologies) using 5 different Green PMT gains (100%, 80%, 60%, 40% and 20%). The feature extraction was performed using the grid 014850_D_20070207 and the protocol GE1_107_Sep09 from Agilent. To choose the best scanning, several parameters of the feature extraction were taken into account, such as the number of saturated spikes and probes, the shape of the histogram of signals plot, the Agilent spike-in plot and the evaluation metrics. Scannings with lower number or no saturated spikes and probes, with higher number of good metrics and with the better histogram shape and spike-in plot were used for subsequent analyses.

R workflow for data processing and analysis
For gene expression microarray data processing and analysis, the limma [11] and hgug4112a.db [3] packages of the Bioconductor software platform (http://www.bioconductor.org) were used. The complete script run for each cell line is available in the Supplementary material, being here given some excerpts to illustrate the main data processing and analysis steps.

Annotation
The mapping between probe identifiers and gene symbols was done using the annotation provided by Agilent (http://www.chem. agilent.com/cag/bsp/gene_lists.asp) and the hgug4112a.db annotation package from Bioconductor [3], which provides detailed information about the hgug4112a platform. For each probe identifier, the consensus gene symbol and description were retrieved from Bioconductor packages if the annotation exists, otherwise the Agilent annotation was assumed.

Validation of microarray data by PCR
After the identification of the differentially expressed genes due to HOXA9 modulation, reverse transcriptase PCR (RT-PCR or qRT-PCR) analyses were performed to validate the microarray data in a subset of the differentially expressed target genes [10]. Specifically, selected genes for validation were RAC2, CXCL1, NDRG1 and TOX2 for hTERT/ E6/E7 cells; ICAM2, BAMBI, ANGPT2 and PDGFRB for U87MG cells; TOX2, NDRG1, RAC2 and NPR3 for GBML18 cells; and C10orf10, PDGFRB, DKK1 and SOX2 for U251MG cells.

Functional enrichment analysis
As reported in [10], due to HOXA9 expression (GEO accession number GSE56517), a total of 417 probes were significantly differentially expressed in hTERT/E6/E7 cells (166 upregulated and 251 downregulated); 3454 probes in U87MG cells (1537 upregulated and 1917 downregulated); 2452 probes in U251MG cells (1301 upregulated and 1151 downregulated); and 5886 probes in GBML18 patient-derived primary cells (2802 upregulated and 3084 downregulated; Fig. 1A). In this context, GBML18 cells were the ones with the highest number of differentially expressed transcripts, followed by U87MG, U251MG and hTERT/E6/E7. The genes EMILIN2 (upregulated in the presence of HOXA9), and MME, DIRAS1 and AGPAT3 (downregulated in the presence of HOXA9), whose roles in glioma were not yet studied, were common to the 4 GBM models (Fig. 1B). Even though, the number of genes consistently regulated by HOXA9 increases when comparing only the GBM cells (57 common transcripts; 17 upregulated and 40 downregulated). These results suggest that the transcriptome of HOXA9 is cell-type dependent.
In order to integrate the differentially-expressed genes in biologically relevant groups, they were used for Database for Annotation, Visualization and Integrated Discovery (DAVID) analyses and displayed in KEGG, GO, or Reactome pathways, or used for gene set enrichment analysis (GSEA; http://www.broad.mit.edu/gsea/). For GSEA analysis [12], gene sets databases from MSigDB C2 collection version 3 were used (available online). The permutation type used was "gene sets", while the default option was used for all other parameters. Only results with a p-value b 0.05 (for DAVID) or a false discovery rate b 0.25 (for GSEA) were considered significant. The obtained results were already published in [10]. Briefly, relevant pathways related to cellular adhesion and migration, cell cycle, DNA repair and replication, RNA processing, stem-cell phenotype, vasculature development, and immune-related pathways were shown to be enriched in the HOXA9 transcriptome. These features are known as important cancer hallmarks that influence the tumorigenic process, suggesting the putative importance of HOXA9 for GBM development, progression, and aggressiveness [10].