A comprehensive investigation of histotype-specific microRNA and their variants in Stage I epithelial ovarian cancers

isomiRs, the sequence-variants of microRNA, are known to be tissue and cell type specific but their physiological role is largely unknown. In our study, we explored for the first time the expression of isomiRs across different Stage I epithelial ovarian cancer (EOC) histological subtypes, in order to shed new light on their biological role in tumor growth and progression. In a multicentric retrospective cohort of tumor biopsies (n = 215) we sequenced small RNAs finding 971 expressed miRNAs, 64% of which are isomiRs. Among them, 42 isomiRs showed a clear histotype specific pat-tern, confirming our previously identified miRNA markers (miR192/194 and miR30a-3p/5p for mucinous and clear cell subtypes, respectively) and uncovering new biomarkers for all the five subtypes. Using integrative models, we found that the 38% of these miRNA expression alterations is the result of copy number variations

while the 17% of differential transcriptional activities. Our work represents the first attempt to characterize isomiRs expression in Stage I EOC within and across subtypes and to contextualize their alterations in the framework of the large genomic heterogeneity of this tumor.

K E Y W O R D S
biomarker, histotype, miRNA, ovarian cancer, sequencing

Whats' new?
Little is known about the molecular characteristics that distinguish the subtypes of Stage I ovarian cancer. The expression profile of microRNA (miRNA) can be used to distinguish certain subtypes and also help predict survival. Here, the authors cataloged the array of miRNA variants, called iso-miRs, expressed in a cohort of Stage I epithelial ovarian cancers and investigated their biological role in tumor growth and progression. Different tumor subtypes could be distinguished by the collection of isomiRs they expressed, and many of these isomiRs result from copy number variations and changes in transcription factor activity, suggesting a role in tumor evolution.

| INTRODUCTION
Epithelial ovarian cancer (EOC) is one of the most lethal gynecologic diseases, with survival rate virtually unchanged over the last 30 years.
Patients with disease limited to the ovary (FIGO, Stage I) represent 10% to 20% of all EOC and have a 5-years survival rate close to 80%.
In contrast, when the disease is diagnosed in the upper abdomen or beyond (FIGO stages III and IV), that is the majority of cases, the survival rate is lower than 20%. 1 Although large efforts have been made to study the molecular heterogeneity of Stage III/IV, Stage I tumors have been largely overlooked, mainly because of their infrequent nature-which limits recruitment of sizable cohorts of cases-and their histological heterogeneity. Stage I encompasses five distinct histological subtypes (high grade serous, HGSOC; low grade serous, LGSOC; clear cells, OCCC; mucinous, MOC; endometrioid, EC), with different tissue of origin, peculiar genetic lesions, response and outcomes to standard treatments. 2 On a sizable and unique cohort of Stage I EOC our group has investigated the molecular differences that characterize each subtype and its prognosis. We demonstrated that: (i) the expression profile of a subset of miRNA acts as histotype specific biomarker for clear cell and mucinous subtypes (miR-30a-5p/3p and miR-192/194, respectively) 3 ; (ii) miR-200c is an independent prognostic biomarker of survival 4 and it is part of a more complex network composed of miRNAs, coding genes and lncRNAs which together predict patients relapse better than conventional classifiers based on anatomo-pathology 5-7 ; (iii) each subtype can be further subclassified into three genomic profiles based on the length and the extent of somatic copy number alteration. 8 Overall, these findings suggest that the biology of Stage I EOC is more complex than depicted by histopathological data, and that a more detailed knowledge of its molecular events is mandatory to improve patients' clinical outcome.
Several novel miRNA sequence variants (isomiRs) have been recently detected alongside their canonical counterparts with the massive use of next generation sequencing technology. Many studies demonstrated that isomiRs are physiological events occurring in both normal and pathological tissues. IsomiRs results from enzyme mediated RNA sequence modifications and differs from the canonical sequence in length, sequence or both. 9 A recent classification categorizes isomiRs into at least two classes. The first is characterized by the presence of single nucleotide polymorphisms (SNPs) in the canonical miRNA sequences. This modification can cause changes in miRNAs expression, processing, and functions. [10][11][12][13] The second is characterized by alterations at the 5 0 and 3 0 ends generated during the miRNA maturation process. 9,14-19 These small molecular changes in length and sequence could be responsible for seed modification, which ultimately results in a possible "targetome shifting." [20][21][22] Although functional data are largely incomplete, isomiRs are expected to influence miRNAs stability 23,24 and their subcellular localization, 25,26 with their function not completely overlapping with that of their canonical counterparts.
Given their regulatory functions in key cellular processes and their tissue-specificity, a more comprehensive knowledge of isomiR expression in tumor seems desirable.
In our study, we profiled the entire repertoire of miRNAs and iso-

| RNA-seq data preprocessing
Reads were trimmed using Atropos, 29 removing the adapters only when the overlap between the adapter and the read was at least eight bases.
Only reads with at least 17 bp were used for the following analyses.
The raw counts were annotated using isomiRs R package, assigning to each isoform a specific name following this nomenclature:

| Differential analysis and biomarker detection among the five Stage I EOC histotypes
To identify the differentially expressed miRNA (hereafter called DEM) we used the edgeR R package. 35 We provided as input the filtered raw counts with the design matrix defined by the dicotomous variables for the histotypes and the three estimated RUV factors for data normalization and batch correction (see Section 2.4). False discovery rate (FDR) less than 0.05 was used to select significant miRNA.
To identify among the list of DEM the biomarker miRNA (defined as those miRNA with a high predictive role) for each histotype, we imple-

| Copy number variation and expression integration
The copy number status was quantified using the segment mean value returned by the Absolute Copy number Estimator (ACE) software. 38 This value represents the expected number of copies for a given genome segment, and this value is assigned to all genes contained in the segment. For all miRNAs found at different loci, the final copy number status is obtained by calculating the sum of the copy number status of the loci.
To study the association between isomiR expression and their copy number status, we used a generalized linear model as implemented in the edgeR R package. We estimated a model for each miRNA where the raw miRNA counts were the independent variable and the copy number values across samples and the RUV factors for the normalizations were used as covariates. Likelihood ratio test was used to test the significance of the parameter associated with the SCNA variable, a significant estimate means a significant association between isomiR expression and its CNV. For this analysis the miRNA expression has been considered the ensemble of its isotypes.
IsomiRs with pvalue < 0.1 and with a positive coefficient were defined as significant and concordant: the higher the CNV the higher the expression.

| Transcription-target (TF)-miRNA interactions
To study the relationship between the expression of TFs and their tar-

| Cohort description
To achieve the aims of the study, a retrospective cohort of 215 snap-  Figure 1A shows that the majority of miRNA modifications (54%) accounted for the addition or deletion of nucleotides at the 3 0 -end (iso_3p) while 12% for the addition or deletion of nucleotides at the   Supplementary Figures S4 and S5).

| isomiRs as histotype specific biomarkers
Although DEMs have significant adjusted P-value, due to sample variability, only few of them can be exploited as subtype specific biomarkers, that is, miRNAs with high predictive potential.
Using a resampling-based strategy (see Section 2) we found 42 histotype-specific miRNAs, nine of which were specific to HGSOC, three to LGSOC, seven to EC and OCCC, and 16 to MOC subtypes ( Figure 2A and Supplementary Table S3).
Reference miR192/194 and miR30a-3p/5p were previously identified as specific biomarkers of MOC and OCCC subtypes in a largely overlapping cohort exploiting microarray data for expression profiling.
It is noteworthy that these miRNAs were confirmed as tissue specific biomarkers in our sequencing-based study, supporting confidence in the robustness of our analysis. However, NGS data analysis revealed that these four reference miRNAs are coexpressed with at least one additional variant. In particular, miR-30a-3p expressed an histotype specific iso_add (hsa-miR-30a-3p;iso_add:T) while the other miRNAs were characterized by histotype specific iso_3p modifications. The complete list of biomarkers with their different histological subtype is reported in Supplementary Table S4.
The high predictive potential of the 42 biomarkers is reflected by the clustering shown in Figure 2B. Except for EC samples that still   LGSOC markers Dots at the extremities of the layer highlight miRNA whose fraction of amplified/deleted samples is above the average value. The second and third layer represent the expression levels of the reference and of the iso_3p isomiRs, respectively. IsomiRs whose expression is significantly and concordantly associated to CNV status are highlighted with a brighter red color. In the fourth layer the ratio between reference and iso_3p expression is represented. In conclusion, our data indicate that biomarkers of low-and highgrade serous tumors are found expressed in healthy fallopian tubes suggesting that, although characterized by different evolutionary paths, as highlighted by the heterogeneous isomiR distribution between low and high grade, the tubal paradigm can explain the common origin of serous subtypes.

| Integration of miRNA expression and copy number alterations
We have recently demonstrated that Stage I EOC is molecularly characterized by three different genome instability patterns (namely stable, S, unstable, U and highly unstable, HU) lagging each histological subtype. 8 As genome structural rearrangements can be, at least in part, a plausible mechanism for miRNA expression alterations, we Briefly, we considered that if the over/underexpression of a miRNA is the result of a genomic event (ie, amplification/deletion) a significant and concordant association would be expected.
Circos plots reported in Figure 4A summarize the results of the integrative analysis obtained on the entire list of expressed miRNAs in HGSOC samples. Figure 4A and its zoom in Figure 4B show  Supplementary Table S5.
Taken together our results suggest that more than a third of the differential expression observed (and half of the biomarkers), is due to heterogeneity in terms of genome instability characterizing different subtypes.

| Transcriptional regulation of isomiRs
Differential TF activity is another molecular mechanism responsible for the altered expression of miRNAs. Since miRNA alterations can be partially explained by CNVs we explored whether these differences could be also ascribed to histotype-specific transcriptional regulation activity.
To this end we exploited the expression of a list of transcription factors (TFs) known to be regulators of miRNAs (TransmiR v2.0 database) available for a subset of our cohort (n = 65) 7 and used a multivariate regression model to infer TF-miRNA interactions (see Section 2 for details).
Our analysis identified 64 DEMs (17%) suggested to be transcriptionally regulated by 29 TFs. Figure 4C shows the list of the top 60% most significant TFs-miRNA relationships for each histotype (see Supplementary Table S6 for the complete list). HGSOC and OCCC are the two subtypes with the highest number of miRNAs with significant TF regulation. Interestingly, Figure 4D highlights Figure 4B), IL6 regulates miR-146b-5p and GLI1 is associated with miR-29a-5p (Supplementary Figure S6E). Interestingly, miR-454-3p, CREB1, ATF4 were already identified as a biomarker of platinum resistance in HGSOC. 45,46 Our results show that miRNA expression is the result of a complex interplay between different forces, among which CNV and TFs play a key role.

| DISCUSSION
It is now widely recognized that miRNA genes express multiple isoforms (isomiR) in a tissue-specific manner in physiological or pathological conditions. Some of the isoforms of a given miRNA may differ by a single nucleotide, so their identification could occasion- HGSOC is characterized by nine markers corresponding to five miRNAs (miR-146b, miR-30 e, miR-15b, miR-19a, miR-29a). Exploring the expression of these biomarkers in a collection of healthy fallopian tubes we observed that miR146b-5p and its isoforms are weakly expressed in healthy tissue, supporting their tumor-cell specificity. All these miRNAs have been found to be deregulated in many cancers. In particular, miR-146b downregulation inhibits cell migration and invasion and increased cell proliferation improving the response of EOC to chemotherapeutic agents. 53 In contrast, miR-19a negatively regulated the expression of PTEN and promoted the growth of ovarian cancer cells. 54 Finally, EC samples are characterized by a high expression of mir-9 and miR-499. Although its role in tumors appears to be controversial, aberrant expression of miR-9 is reported in different tumors, and it was shown to be a prometastasis onco-miR in breast cancer. 55 These isomiR markers tend to be highly tumor specific: MOC markers are highly expressed in COAD samples, OCCC markers in KIRC, EC biomarkers in UCEC.
LGSOC and HGSOC biomarkers were moderately expressed in late-stage ovarian cancer samples.
Looking at the isomiR spatial and temporal expression we observed that, at the relapse, tumor samples tend to maintain a patient specific fingerprint, while, at the diagnosis, tumor samples tend to be more similar within and across patients.
A final consideration regards the mechanism of miRNA deregulation in Stage I EOC. In the present study we have investigated two possible mechanisms, such as structural alterations and TF activity. By integrating SCNA profiles, TF expression and miRNAseq data we found that almost 38% of the expression profile modifications are the result of SCNAs, while 17% are the result of differential TF activity. This result suggests that the differential expression among and within histotypes is mostly due to aberrant genomic copy number status. Among the miRNA markers the expression of which is modified by their SCNA we found miR-9, for EC, miR-30a for OCCC, miR-574-5p for MOC and the six HGSOC markers (miR-15b-3p, miR-454-3p, miR-29a-5p, miR-30 e-5p, miR-19a-3p and miR27a-3p).
Taken together our results indicate a heterogeneous transcriptional picture within and among histotypes. The finding that different isomiRs are generated in different histological subtypes, suggests that the production of multiple highly similar isomiRs is an inevitable consequence of the tumor evolutionary process and that isomiRs can act cooperatively with miRNAs to control functionally related genes in the establishment of different tumor cells.