A ceRNA approach may unveil unexpected contributors to deletion syndromes, the model of 5q- syndrome.

In genomic deletions, gene haploinsufficiency may directly shape a specific disease phenotype. In some cases, however, no functional association can be identified between the haploinsufficient genes and the deletion-associated phenotype. Transcripts can act as microRNA sponges. A reduction of transcripts from the hemizygous region may therefore increase the availability of specific microRNAs, which in turn may regulate in trans target genes outside the deleted region, eventually contributing to the phenotype. Here we propose a competing endogenous RNA (ceRNA) approach to identify candidate genes that are targets of epigenetic regulation in deletion syndromes. As a model, we analyzed the 5q- myelodysplastic syndrome. Genes in haploinsufficiency within the common 5q deleted region in CD34+ blasts were identified in silico. Using the miRWalk 2.0 platform, we predicted microRNAs whose availability, and thus activity, could be enhanced by the deletion, and performed a genome-wide analysis of the genes outside the 5q deleted region that could be targeted by the predicted microRNAs. The analysis pointed to two genes with altered expression in the 5q- transcriptome that have never been related to 5q- before. The proposed approach allows investigation of the global transcriptional effect of genomic deletions, possibly prompting the discovery of unsuspected contributors to the deletion-associated phenotype. Moreover, it may help in functionally characterizing previously reported unexpected interactions.

Frequently, specific genomic rearrangements are associated with specific malignant phenotypes.
One notable example is the Philadelphia chromosome, the hallmark of chronic myelogenous leukemia (CML). The Philadelphia chromosome is the result of a balanced translocation involving chromosomes 9 and 22 [9]. The final product of this rearrangement is a chimeric protein (p210 BCR-ABL) with constitutive tyrosine kinase activity, which is responsible for the expansion of the CML clone [10].
The study of the effects of translocations usually leads to the identification of genes at the breakpoints that gain or lose functions and that are causative of the phenotypes observed.
By contrast, the study of genomic deletions or duplications is less straightforward, because homozygous deletions are rare and the regions involved are commonly gene-rich.
The main efforts to elucidate the effects of deletions have focused on studying every single gene coded within the deleted region. The rationale is that, if a gene shows haploinsufficiency, the reduced amount and activity of its products can contribute to the phenotype [11- For example, in Williams syndrome the 7q11.23 band is deleted. The deleted region includes more than 25 genes, among them ELN. The ELN gene codes for the elastin protein, and its haploinsufficiency is associated with the typical cardiovascular abnormalities of the syndrome [36].
Interestingly, this seemingly axiomatic relationship between gene deletion and phenotype is very often not easily identified. This is the case for the 5q- syndrome [9], which we adopted as a model to test a novel in silico approach to investigate the global effect of deletions. The 5q- syndrome is a hematological disorder characterized by the loss of the 5q31.1 band in bone marrow hematopoietic cells. This chromosome abnormality usually leads to a myelodysplastic syndrome (MDS) that can also evolve towards acute myeloid leukemia (AML) [37][38][39][40][41][42]. In the commonly deleted region of 5q-, several genes have been suggested to play a role in the syndrome, such as SPARC, RPS14 and hsa-miR-145, each contributing to specific features of the 5q- myelodysplasia. However, the reduced activity of these genes does not explain every facet of the complex phenotype of the 5q- syndrome [37][38][39][40][41][42].
To investigate the global effect of the 5q deletion, we propose a competing endogenous RNA (ceRNA) approach. The ceRNA rationale is that RNA transcripts regulate one another by competing for shared microRNAs [43][44][45][46][47][48]. The loss or haploinsufficiency of a specific gene can free a certain amount of regulating microRNAs that, in turn, can act in trans to regulate a subset of other transcripts. The ceRNA approach has given interesting results in both oncological and non-oncological diseases. Usually, competing RNAs are explored using a single bait gene, as in the case of PTEN [49,50], LMNA [51,52], SOX2 [53] and hTERT [54]. To the best of our knowledge, the effect of the loss of a pool of genes, as occurs in a deletion, has never been investigated with a ceRNA approach.
In the 5q- syndrome, we selected by in silico analyses a set of microRNAs that might be freed by the haploinsufficiency of the genes coded within the deleted region, and identified the genes that could be regulated by the microRNA set as a whole.
This approach, which extends the search for ceRNAs from a single bait gene to a set of genes, makes it possible to identify the genes whose activity can be perturbed by a genomic deletion considered as a whole. Notably, it could help explain the phenotypes observed in deletion syndromes independently of the genes coded within the deleted region.

RESULTS AND DISCUSSION
Over the last few years, it has become clear that different RNA species can cross-talk and regulate one another [43][44][45][46][47][48][55][56][57][58][59]. In a hemizygous condition, several RNA species can be downregulated compared with the wild-type condition. This global loss of transcripts might affect the RNA-mediated regulatory systems of the cell. In particular, we investigated whether this loss of transcripts could affect microRNA-mediated regulation. MicroRNAs are small non-coding RNAs that regulate gene expression, mostly at the post-transcriptional level [60][61][62][63][64]. RNA transcripts can regulate one another by competing for shared microRNAs, and the RNAs that do so are called competing endogenous RNAs, or ceRNAs. Cross-regulation among ceRNAs involves the sequestration of shared microRNAs and gives rise to rather complex regulatory networks [43][44][45][46][47][48].
The loss of several transcripts at once in a deletion might free enough microRNAs to exert a detectable effect in trans outside the deleted region. Moreover, if several haploinsufficient genes within the deleted region are regulated by the same set of microRNAs, we might be able to identify a deletion-specific signature characterized by an increased activity of specific microRNAs.
In brief, a deletion could affect the activity of a specific set of microRNAs, which may in turn alter the activity of genes that lie outside the deleted region and are apparently unrelated to the genomic deletion. This alteration might contribute to determining the phenotypes of deletion syndromes.
To test our hypothesis, we used the 5q- syndrome as a model to investigate whether a ceRNA approach could be useful to identify unexpected contributors to deletion syndromes. The approach adopted is graphically summarized in Figure 1.
We took advantage of the published GDS3795 Affymetrix array dataset [65], which reports the global gene expression profiles of bone marrow CD34+ cells from myelodysplastic syndrome patients and healthy controls.
We first identified the patients with 5q deletion as the only reported genomic abnormality (see Supplemental data), and then, using their expression data, we selected the genes in haploinsufficiency within the common 5q deleted region, which are listed in Table 1.
Using the bioinformatics approach described in the Methods section, we identified a set of microRNAs that putatively regulate the genes in haploinsufficiency. Each gene was putatively regulated by a different set of microRNAs, but some microRNAs targeted more genes than others. Ranking the microRNAs by the number of haploinsufficient genes they putatively regulate, we were able to identify the most represented ones. MicroRNAs that putatively regulated at least 5 of the haploinsufficient genes within the 5q- deleted region were selected. These included: hsa-miR-3164, hsa-miR-513a-5p, hsa-miR-30c-1-3p, hsa-miR-1254, hsa-miR-3916, hsa-miR-27a-3p, hsa-miR-27b-3p, hsa-miR-4311, and hsa-miR-665 (see Supplemental data).
We then looked for genes that were predicted to be regulated by all of the 9 microRNAs, the rationale being that if these microRNAs could not bind a fraction of their natural targets due to the haploinsufficiency of the 5q-coded genes, they were free to exert their control on the remaining targets, deregulating the control network. The analysis pointed out 4 genes, namely DCX, GRAMD1B, HIPK2 and SLC1A2, which were putatively regulated by all the 9 microRNAs. Among these genes, GRAMD1B and HIPK2 showed significantly different mRNA expression in 5q- CD34+ cells as compared with control CD34+ cells in the same GDS3795 dataset, being significantly down- and up-regulated, respectively. Of note, the two genes that did not show significant variation between 5q- and control CD34+ cells both showed very low expression levels (Table 2).

Table 1: A comparison of the normalized expression levels, as reported in GDS3795 dataset, of the genes within the common deleted region between 5q- patients and controls. The genes that showed in a Student's t-test a statistically significant (p<0.05) reduction of expression levels in 5q- specimens were considered as haploinsufficient genes. SE = Standard Error. SD = Standard Deviation. CNTR = Controls.
GRAMD1B codes for a protein involved in chemoresistance [66] and the rs735665 SNP upstream of its coding sequence has been associated with chronic lymphocytic leukemia (CLL) in a genome-wide association study [67,68].
HIPK2 is part of the AML1 complex and activates AML1 transcriptional activity. Notably, AML1 is a frequent target of leukemia-associated mutations. HIPK2 mutations in AML and MDS have been reported to impair AML1-mediated transcription, and it has therefore been suggested that deregulation of HIPK2 may play a role in the pathogenesis of leukemia [69].
The results obtained through the approach proposed here have, however, some limitations. The algorithms used to identify the interactions between transcripts and microRNAs are still imperfect. Even though the criterion used was highly stringent (i.e. concurrent detection by 4 of the most widely used algorithms), the results remain far from certain. The adoption of different algorithms and/or different parameters could lead to different results. Similarly, a different and less stringent threshold in the selection of microRNAs could have led to different results.

Table 2: A comparison of the normalized expression levels, as reported in GDS3795 dataset, of the putative ceRNA genes between 5q- patients and controls. The genes that showed in a Student's t-test a statistically significant (p<0.05) difference of expression levels are considered as positive results. SE = Standard Error. SD = Standard Deviation. CNTR = Controls.
Finally, the interpretation of the results is rather complex. If microRNAs acted only as inhibitors of transcription and translation, the ceRNAs isolated through this analysis should have been consistently downregulated. Instead, the analysis identified HIPK2, which is significantly upregulated in the 5q- patients as compared with controls. MicroRNAs have also been reported to upregulate transcription [70][71][72], which may be the case here. Alternatively, HIPK2 upregulation could be the result of complex perturbations of the RNA regulatory network.
Nevertheless, the analysis that we propose was able to pinpoint two genes that are significantly modulated in patients as compared with controls and whose relationship with the 5q- deletion has never been reported before.
The method proposed here represents a novel approach to study the global effects of genomic deletions, with the final aim of identifying unexpected contributors to the phenotypes of genomic deletions, and it warrants experimental validation. The same approach might be used to study duplications or complex rearrangements, leading to a new strategy to investigate complex syndromes and phenotypes that are currently not fully understood.

METHODS
The expression levels of the genes within the common deleted region were compared between 5q- patients and controls. The genes that showed a statistically significant (p < 0.05) reduction of expression levels in 5q- specimens in a Student's t-test were considered haploinsufficient genes (Table 1). As expected, no genes showed an increase in expression levels.
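The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline actually used: the function names and the input layout (one dictionary per group, mapping a gene symbol to its list of normalized expression values) are hypothetical, the test is assumed to be the pooled-variance two-sample Student's t-test with a two-sided p-value, and the p-value is obtained by numerically integrating the t density so that the sketch needs only the standard library.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * integral of the density over [0, |t|].

    The integral is approximated with Simpson's rule (steps must be even).
    """
    b = abs(t)
    if b == 0.0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def student_t_test(a, b):
    """Pooled-variance two-sample t-test; returns (t statistic, p-value).

    Assumes each group has at least two values and nonzero pooled variance.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, two_sided_p(t, df)


def haploinsufficient_genes(expr_5q, expr_ctrl, alpha=0.05):
    """Genes significantly reduced in 5q- samples relative to controls.

    expr_5q / expr_ctrl: hypothetical dicts, gene symbol -> expression values.
    """
    selected = []
    for gene, values in expr_5q.items():
        t, p = student_t_test(values, expr_ctrl[gene])
        if p < alpha and t < 0:  # significant AND lower in 5q-
            selected.append(gene)
    return sorted(selected)
```

Requiring a negative t statistic in addition to p < 0.05 encodes the "reduction" part of the criterion; a gene that is significantly higher in 5q- samples would not be flagged.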
The miRWalk 2.0 [74] platform was used to identify the microRNAs that putatively regulate the genes in haploinsufficiency. We considered as positives the microRNAs that recognize the 3'UTR of the genes with a minimum seed length of 7, starting from miRNA seed position 1, and with a maximum p-value of 0.05 in all 4 of the algorithms embedded in the platform that were used in the analysis: miRWalk [74], miRanda [75], RNA22 [76] and TargetScan [77]. If a gene was recognized multiple times by the same microRNA, it was considered as a single hit in the following analyses. The microRNAs were then ranked from the most to the least represented, and only microRNAs that putatively regulated 5 or more genes in haploinsufficiency were selected for the following analyses. The threshold of 5 was chosen to collect a sufficient number of microRNAs to continue the analysis, ideally within the range of the number of microRNAs that can control a single gene, from 4 to 20 [78] (Supplemental data).
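The consensus and ranking steps described above can be sketched as follows. The analysis in the text was run on the miRWalk 2.0 platform; the code below instead assumes, purely for illustration, that the predictions of each algorithm have already been exported as lists of (microRNA, gene) pairs that satisfy the seed-length and p-value filters. The function names and the input layout are hypothetical.

```python
from collections import defaultdict

# The four algorithms named in the text.
ALGORITHMS = ("miRWalk", "miRanda", "RNA22", "TargetScan")


def consensus_pairs(predictions):
    """Keep the (microRNA, gene) pairs reported by every algorithm.

    predictions: hypothetical dict, algorithm name -> iterable of
    (microRNA, gene) tuples. Converting to sets collapses repeated hits
    of one microRNA on one gene into a single hit.
    """
    pair_sets = [set(predictions[name]) for name in ALGORITHMS]
    return set.intersection(*pair_sets)


def rank_mirnas(pairs, min_genes=5):
    """Rank microRNAs by number of distinct target genes, most first.

    Only microRNAs with at least min_genes targets are returned, as a
    list of (microRNA, gene count) tuples; ties are broken by name.
    """
    genes_by_mirna = defaultdict(set)
    for mirna, gene in pairs:
        genes_by_mirna[mirna].add(gene)
    ranked = sorted(genes_by_mirna.items(),
                    key=lambda item: (-len(item[1]), item[0]))
    return [(mirna, len(genes)) for mirna, genes in ranked
            if len(genes) >= min_genes]
```

Counting distinct genes per microRNA, rather than raw binding sites, mirrors the rule that multiple recognitions of one gene by the same microRNA count as a single hit.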
Using the miRWalk 2.0 platform with the same parameters described above, the genes putatively regulated by this pool of microRNAs were identified, and the genes predicted to be regulated by all of the selected microRNAs were retained. The expression levels of these candidate genes were then analyzed in the same samples from the GDS3795 dataset.
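The final selection amounts to a set intersection across the target lists of the selected microRNAs. The sketch below is a minimal illustration, assuming the predicted targets of each microRNA are available as collections of gene symbols; the function name, the input layout, and the optional exclusion of genes lying inside the deleted region are assumptions made here for clarity.

```python
def shared_targets(targets_by_mirna, deleted_region_genes=()):
    """Genes predicted as targets of every microRNA in the pool.

    targets_by_mirna: hypothetical dict, microRNA -> iterable of gene symbols.
    deleted_region_genes: genes inside the deletion, dropped from the result
    so that only candidates outside the deleted region remain.
    """
    if not targets_by_mirna:
        return []
    common = set.intersection(*(set(t) for t in targets_by_mirna.values()))
    return sorted(common - set(deleted_region_genes))
```

The returned candidates would then be compared between patients and controls with the same Student's t-test used for the genes within the deleted region, this time accepting a significant difference in either direction.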