Exploring regulatory networks of miR-96 in the developing inner ear

Mutations in the microRNA Mir96 cause deafness in mice and humans. In the diminuendo mouse, which carries a single base pair change in the seed region of miR-96, the sensory hair cells crucial for hearing fail to develop fully and retain immature characteristics, suggesting that miR-96 is important for coordinating hair cell maturation. Our previous transcriptional analyses show that many genes are misregulated in the diminuendo inner ear and we report here further misregulated genes. We have chosen three complementary approaches to explore potential networks controlled by miR-96 using these transcriptional data. Firstly, we used regulatory interactions manually curated from the literature to construct a regulatory network incorporating our transcriptional data. Secondly, we built a protein-protein interaction network using the InnateDB database. Thirdly, gene set enrichment analysis was used to identify gene sets in which the misregulated genes are enriched. We have identified several candidates for mediating some of the expression changes caused by the diminuendo mutation, including Fos, Myc, Trp53 and Nr3c1, and confirmed our prediction that Fos is downregulated in diminuendo homozygotes. Understanding the pathways regulated by miR-96 could lead to potential therapeutic targets for treating hearing loss due to perturbation of any component of the network.

Testing microarray and network genes. We had previously confirmed the misregulation of 15 genes from the P4 microarray by qRTPCR 1 ), but prior to building the network, 10 more genes were tested and the misregulation was confirmed in 7 of them ( Fig. 2a; Bhlhe40, Rasd2, Odf3b, Mom5, Dynlbr2, Meig1, Myo3a). We also tested the potential intermediate regulators Nr3c1, Foxo1, Foxo3 and Mitf, which are all confirmed targets of miR-96 in different tissues [13][14][15][16] ), Pou4f3, a known deafness gene which regulates Gfi1 17 , and Thrb which has been shown to regulate Slc26a5 in the outer hair cells 18 . Zic2 was identified previously as a target by miRanda prediction 1 , and preliminary networks created using Ingenuity IPA suggested Htt and Tnc as potential intermediate regulators.
Of those nine potential intermediates, only expression of Zic2 and Pou4f3 were significantly affected when tested by qRTPCR (Fig. 2b, adjusted P < 0.05).
The misregulation of genes by mutant miR-96 can only be relevant if those genes are co-expressed with miR-96 in the hair cells, so we tested 7 of the misregulated genes by immunohistochemistry and confirmed that Anxa4, Meig1, Homer1, Kcna10, Kalrn, Gtf2e2, and Tppp were present in wildtype hair cells ( Supplementary Fig. S2). Nr3c1, Foxo3, Thrb and Pou4f3 are known to be expressed in the hair cells [19][20][21][22] , and Foxo1 is expressed in the otic vesicle early in development 23 . The remaining 4 genes, Mitf, Htt, Zic2 and Tnc, were tested by immunohistochemistry and all found to be expressed in wildtype hair cells ( Supplementary Fig. S2). No difference in staining pattern or intensity was observed in Dmdo/Dmdo homozygote mutant hair cells.
Identifying direct targets of miR-96 using an Ago2 pulldown assay. Identifying the targets of miR-96 is critical for exploring the networks it controls. We looked for direct targets of miR-96 by immunoprecipitating Ago2 (one of the RISC complex proteins to which microRNAs bind and direct to their targets) and extracting the RNA bound to it 24 using RNA from the olfactory bulbs and organs of Corti of six Dmdo/Dmdo homozygotes and six sex-matched wildtype littermates at P4, and analysed the RNA by microarray. We used the olfactory bulb as well as the organ of Corti because miR-96 is known to be expressed there 4 and it offers more material than the organ of Corti. There were no genes reported as significantly misregulated in mutants (adjusted P < 0.05) after p-value adjustment for multiple testing. However, 216 genes from the organ of Corti and 23 from the olfactory bulb were downregulated in mutant samples with an unadjusted P < 0.01, suggesting these were more strongly bound by the wildtype miR-96 seed region. We tested these genes against three microRNA target predictors: miRDB (http:// mirdb.org/cgi-bin/search.cgi) 25 , starBase (http://starbase.sysu.edu.cn/index.php) 26 and Diana-microT (http:// diana.cslab.ece.ntua.gr/microT/) 27 . We also used our target list compiled previously using miRanda 1 and did a text search for the presence of complementary matches to the miR-96 seed region in the 3′ UTR (a less stringent prediction method) (Supplementary Table S2). From the resulting lists, 12 targets were selected based on the strength of their fluorescence in the microarray and on their bioinformatics predictions: Ablim1, Unkl, Cacna1c, Akap7, Tmem97, Gnb4, Mpv17l, Clvs1, 6820408C15Rik, Sox5, Ankrd27 and Zfp251. We checked for the presence of the 12 transcripts in wildtype organ of Corti cDNA by RTPCR and qRTPCR. 6820408C15Rik was not detected, Unkl, Cacna1c and Akap7 qRTPCR probes did not work reliably, suggesting a very low transcript level, and none of the remaining 8 genes were significantly upregulated in diminuendo mutant organ of Corti (Fig. 3a, adjusted P < 0.05), as would be expected of a target of miR-96. We hoped to identify multiple targets through the Ago2 pulldown, but it may be that the small amount of material limited the results.
Compiling the misregulated gene list. We compiled a list of genes and associated expression values, consisting of all the misregulated genes from the P4 and P0 microarrays with adjusted P < 0.1, and those genes with expression changes detected by qRTPCR with adjusted P < 0.05 (Supplementary Table S4). The less stringent P value cutoff was chosen in order to maximise the genes available for network construction. We had several indicators that genes with adjusted P < 0.1 still offer biological relevance in this case. For example, Dynlrb2 has an adjusted P = 0.0522 and was shown to be significantly upregulated by qRTPCR (Fig. 2a, adjusted P = 0.0004). Sdc2 has adjusted P = 0.089 in the P0 microarray, but at P4 it has adjusted P = 0.009 and by qRTPCR, also at P4, it was shown to be upregulated with adjusted P = 0.026 1 . Pnpla8 has adjusted P = 0.075 in the P4 microarray and adjusted P = 0.036 in the P0 microarray. In the case of the p0 microarray, there were 18 genes with adjusted P < 0.05, and raising the p-value cutoff to 0.1 meant including just 6 more. For the p4 microarray, raising the threshold to p < 0.1 meant 46 genes were added to the initial 85 with p < 0.05. If the microarray gave a significant result but qRTPCR testing did not, the microarray expression value was used where quantitative data was required. We also added genes identified by qRTPCR in our previous studies 1, 6 . This misregulated gene list includes a few genes predicted to be direct targets, so there is some overlap between this list (Supplementary Table  S4) and the list of direct targets (Supplementary Table S3).

Construction of a regulatory network.
To create this network, the miR-96 direct targets (Supplementary Table S3; 31 genes) were treated as one separate group and the rest of the misregulated gene list, excluding the predicted targets, were a second separate group (Supplementary Table S4; 188 genes). The two groups were first connected to each other using Ingenuity IPA. The software has the option to connect input genes using its own manually curated database; for the regulatory network we report here, only this database was used to connect the input genes, and only regulatory interactions were permitted. Not every misregulated gene could be accounted for within the network (most notably Ptprq, for which no upstream regulators have been described), but when no further genes could be connected, every link was checked against the original published article and incorrect links were removed (for example, many papers studying Htt carry out experiments using the mutant poly-CAG form, which is not relevant for this network). Each pathway within the network was checked for consistency with respect to the observed gene misregulation, and the intermediate genes along that pathway were annotated with a predicted change of expression/activity (Fig. 4a). Where intermediate genes could be up-or downregulated, and either way could explain a number of observed gene expression changes, a weighting algorithm was used to determine which prediction explained more changes, first taking into account known misregulation, then predicted misregulation, of genes both up-and downstream (Fig. 4b). The resulting network is internally consistent and offers multiple hypotheses for further testing (Fig. 5).
If these genes really do mediate the effects of miR-96, then they offer candidate therapeutic targets for modulating the network as a whole and ultimately the expression of many genes in the hair cells. Trp53, for example, regulates Ppp1r15a, Bhlhe40 and Gfi1 directly, and indirectly regulates Slc26a5, Ocm and Anxa4 via Myc and Fos (Fig. 4a). Trp53 is predicted to be downregulated in diminuendo homozygotes, either by a reduction in mRNA or protein levels, or by decreased activity, and either Sdc2 or Mitf might mediate that downregulation. If Trp53 activity can be somehow increased or Sdc2 activity knocked down in diminuendo homozygotes, it might rescue the misregulation of Ppp1r15a, Bhlhe40, Gfi1, Slc26a5, Ocm and Anxa4.
However, before any conclusions can be drawn, the involvement of these genes with multiple connections (hub genes) has to be checked, for example by examining expression in the hair cells and testing for misregulation of mRNA or protein in diminuendo homozygotes.
From the entire network, we chose interesting nodes (genes) based on the number of their downstream connections, their position within the network, and their potential influence on downstream nodes, particularly Ocm and Slc26a5. The genes those nodes represented were tested by qRTPCR, and Fos was found to be significantly downregulated in diminuendo homozygotes (adjusted P < 0.05), fitting the prediction. None of the other seven tested were significantly affected ( Fig. 3b; Ets1, Tgfb1, Sp1, Myc, Agt, Rhoa, Trp53), but this doesn't rule them out as genuine interactors in the miR-96 regulatory network, because they may have altered protein levels or activity, neither of which would be detected by qRTPCR. Myc, Sp1, Ets1 and Fos were also found to be present in wildtype hair cells by immunohistochemistry ( Supplementary Fig. S2); no obvious difference in expression was observed in diminuendo homozygote hair cells.
Protein-protein interactome analysis. The regulatory network uses only regulatory interactions, and does not take advantage of the extensive protein-protein interaction (PPI) data publicly available. A different approach to network construction is to build a PPI network using the publicly available results from many experiments. This has the advantage of being less constrained by bias towards well-studied "favourite genes" such as Cdc2 29 but it does not result in a directional network, or in predictions of up-or downregulation that can be tested. We used InnateDB, a public PPI database combining data from multiple sources, to carry out over-representation and network analyses. In the first analysis we focussed on transcription factor and pathway over-representation, using the misregulated gene list (Supplementary Table S4) plus Fos, but no transcription factors or pathways were over-represented after adjustment for multiple testing (adjusted P < 0.05, Supplementary  Table S5). We then carried out a network analysis on both the misregulated gene list (Supplementary Table S4 plus Fos) and the miR-96 direct target list (Supplementary Table S3) together, to obtain a PPI network which was analysed in Cytoscape ( Supplementary Fig. S3), using the Network Analyser tool to calculate node degree, betweenness centrality and closeness centrality (Supplementary Figs S3, S4). The node degree is the number of neighbours a node has; nodes with a high degree are hub nodes. Betweenness centrality is a measure of how important a node is for connecting distant parts of the network, and closeness centrality measures how important a node is within its immediate vicinity.
Eighteen nodes had a high degree ( > 75, Table 1), but only Ywhae also scored highly on both centrality measures (Table 1, Supplementary Fig. S4). Of the high-degree nodes, five are predicted targets of miR-96 (Foxo3, Foxo1, Rad51, Irs1 and Nr3c1), and five are known to be misregulated in diminuendo homozygotes (Gfi1, Nphp1, Parp1, Fos and Fadd). The others are Ywhae, Ywhah, Ywhaz, Pgls, Hsd3b7, IDB-74, Ubc and Cenpc1. Ywhae, Ywhah and Ywhaz are members of the 14-3-3 protein family, which are ubiquitous adapter proteins, and Ubc is part of the ubiquitination system, so they have many known interaction partners. Pgls is a hydrolase involved in the pentose phosphate pathway 30 , Hsd3b7 is an enzyme required for bile acid synthesis 31 , and Cenpc1 is a component of the kinetochore plate 32 . Their high degree in the PPI network may reflect biological relevance, or it may be, like the 14-3-3 proteins, because they have a lot of interaction partners and among those partners are genes misregulated in the diminuendo homozygote organ of Corti. IDB-74 is an InnateDB identifier representing a large protein complex, which is why it has a high degree.
Gene Set Enrichment Analysis. Both the regulatory network and the PPI network were constructed knowing the context of the transcriptome data, that it came from a mouse carrying a mutation in miR-96. For both those networks, we chose to add the predicted miR-96 targets with the aim of connecting as many of the misregulated genes to miR-96 as possible. Gene set enrichment analysis, however, takes as input only the misregulated genes, and looks for collections of genes which occur more often than would be expected by chance. These gene sets are predefined by the user, and for this analysis we used the following gene sets available at MsigDB (http://  Pale green or pink means the node's misregulation has been reported in the microarray, and intense green or red means the node's misregulation has been shown by qRTPCR. miR-96 is shown in yellow. Arrows indicate positive regulation (activation, increased transcription), and bars show negative regulation (repression, reduction in activity); solid lines indicate direct regulation (for example, direct binding of protein to promoter) and dotted lines indirect regulation (where other genes may mediate the observed effect of one gene upon another). www.broadinstitute.org/gsea/msigdb/index.jsp): the "Hallmark" gene sets which summarise biological states or processes, gene sets from Gene Ontology, KEGG and Reactome, and transcription factor and microRNA target genes (accessed July 30 th , 2015). We compared the most upregulated to the most downregulated genes, and also the most affected to the least affected genes, regardless of the direction of regulation. The input data consisted of the genes with a fold change greater than 10% from the P4 and P0 microarray data and the qRTPCR with adjusted P < 0.05. Where multiple probes were present for the same gene, the fold change with the lowest P-value was included, resulting in 2653 genes (Supplementary Table S6). Significantly enriched gene sets (false discovery rate (FDR) < 0.05) are shown in Table 2, as are all the gene sets with nominal P < 0.05.
One of the regulatory factors identified as having targets enriched in the upregulated genes is miR-96, which is a good positive control. Myc is the other factor with a significantly enriched target set among the upregulated genes, and is of particular interest, being downstream of Gfi1 and upstream of Fos, Slc26a5 and Ocm in the regulatory network (Fig. 4a). Among the factors with a nominal P < 0.05 are Egr4, Pou1f1, Tcf3, Cebpb and Vdr, which have been previously linked to inner ear development and/or deafness, as has Myc itself [33][34][35][36][37][38] . These genes are implicated in mediating the pattern of misregulation detected in diminuendo homozygotes and are, therefore, also candidates for further testing, and potential therapeutic targets. Multiple gene sets representing pathways and processes, which can also be targets for manipulation by drugs or small molecules, were identified as enriched in the input data, including the p53 pathway and synaptic transmission, which fits with our previous reports of synaptic defects in diminuendo homozygotes 6 .

Discussion
miR-96 is a master regulator of hair cell differentiation, and controls many genes. Here we have used several different tools to identify genes which play important roles in mediating the action of miR-96.
Our manually created regulatory network demonstrates a novel approach to network creation, and the resulting network has several important features. Firstly, some of the genes are independently known to be involved in hearing, such as Fgf8, Slc26a5 (prestin), Pou4f3 and Gfi1 9,10,39,40 , suggesting that other genes in the network may also underlie deafness. Secondly, several of the genes are known to be involved in cellular stress response  Table 2. Gene Set Enrichment Analysis results showing gene sets with a nominal P < 0.05 from the microarray and qRTPCR data, divided into genes regulated by regulatory factors, genes associated with cellular locations, genes belonging to defined pathways and genes linked to processes. *gene sets with FDR < 0.05.
pathways, such as Trp53, Hif1a and Nfe2l2 (reviewed in Simmons et al. 41 ), suggesting that the network may be perturbed during stress triggered by environmental factors. Thirdly, a number of the network molecules are already known to be targeted by drugs, like Trp53, Nfkb1 and Nr3c1, providing routes to manipulation of the network as a whole. While most of the misregulated genes detected by microarray and qRTPCR are present in the protein-protein interaction network created using InnateDB, approximately 31% are not, including Prestin, Chrna10 and Pou4f3. This may be because there are important targets of miR-96 that we have not discovered, and so are missing from the initial input, or it may be that the targets are correct but there is insufficient data from previous publications or databases on the interactions between the input molecules. Of the nodes identified as important by network analysis in Cytoscape, Parp1, Gfi1, Nphp1, Fos, Fadd, Nr3c1, Irs1, Rad51, Foxo1 and Foxo3 are present in the regulatory network, and Parp1, Gfi1, Nphp1, Fos and Fadd have been shown by microarray or qRTPCR to be misregulated in diminuendo homozygotes.
The gene set enrichment analysis implicates further regulators and regulatory pathways in the diminuendo phenotype, some of which are present in the regulatory network but most of which are not, suggesting that there are more pathways to be discovered. miR-96 targets (defined by MSigDB as genes sharing the motif GTGCCAA, which matches the wildtype seed region, in their 3′ UTRs) are enriched in the upregulated genes, as would be expected. This correlates well with the Sylamer results (Fig. 1). Of the transcription factors whose targets are enriched in the misregulated genes, Egr4, Pou1f1, Tcf3, Myc, Cebpb and Vdr have been previously implicated in inner ear development and/or deafness [33][34][35][36][37][38] . None of the 31 direct targets of miR-96 (Supplementary Table S3) were implicated by GSEA, which could be because gene sets controlled by the direct targets were not available at the time of analysis, or it could mean that none of the 31 direct targets control the cascade of misregulated genes seen in diminuendo homozygote inner ears. Of the genes implicated by GSEA, only Atf2 is predicted by StarBase (http://starbase.sysu.edu.cn/index.php) 26 to be a target of miR-96.
We have tried three methods of network analysis, all of which identified genes for further testing, but none of which offered a universal solution. Indeed, there is no common gene identified by all three methods as a mediator of the effects of miR-96 (Fig. 6). This reflects the different strengths, biases and weaknesses of the methods used to create the networks. The regulatory network is very dependent on what is reported in the literature, and emphasises connecting miR-96 to the misregulated genes observed in the diminuendo mutant, while the PPI network relies on large databases of protein-protein interactions, which do not offer directional, regulatory links between interacting partners. Gene Set Enrichment Analysis can be useful because it does not rely on correctly identifying the targets of miR-96, many of which remain unknown, but it does need comprehensive gene sets, which may not exist for all the genes mediating the effects of miR-96 in the inner ear. More data may help in the construction of more accurate and comprehensive networks. Transcriptome data from microarrays and RNA-seq, which is available from public databases such as ArrayExpress, may prove useful because they combine directional regulatory interactions with a genome-wide scope, hopefully avoiding at least some of the bias of well-studied genes. The networks are also all subject to a common limitation in that they cannot identify novel interactions, relying as they do on previous studies in other tissues or organisms. The lack of inner ear-specific data is also a potential problem; interactions observed in one type of tissue may not apply to the organ of Corti.
Our analyses implicate Myc, Gfi1 and Fos as important candidates; Fos regulates both Slc26a5 and Ocm 42,43 , two of the most downregulated genes in the P4 microarray 1 , and when we tested it we confirmed its predicted downregulation in diminuendo homozygotes. In the regulatory network Fos is controlled by Gfi1 via Myc 44,45 , and Gfi1 is known to be downregulated in diminuendo homozygotes. Myc also regulates Anxa4, which is upregulated in diminuendo homozygotes and has recently been found to be specifically expressed in hair cells, along with Rasd2, another gene upregulated in diminuendo homozygote organ of Corti 46 . Both Gfi1 and Fos were also identified as important genes in the protein-protein interaction network, and Myc's targets were detected by GSEA to be significantly enriched in the upregulated genes. All three have been implicated in hearing and deafness, Gfi1 in particular; the hair cells of mice homozygous for a Gfi1 knockout allele degenerate around P0 10 . Mice homozygous for a null allele of Fos show a reduced startle response at higher frequencies 47 , and Myc has been implicated in protection against noise exposure-induced damage; guinea-pigs inoculated with adenovirus carrying c-myc showed a smaller threshold shift and less cochlear damage after noise exposure than controls inoculated with vector 35 . The other candidate targets are those genes identified by multiple networks (Fig. 6). Further testing in vitro would determine whether small molecule agonists or antagonists can modify the expression of these genes, and whether that has an effect further down the line on genes such as Ocm, Ptprq and Slc26a5, whose misregulation in diminuendo homozygotes appears to result in specific aspects of the observed phenotype 6,11 . miR-96 controls many genes, and it is likely that many of the connections between the microRNA and the misregulated genes still remain to be discovered. However, the potential benefits of finding genes and pathways which can be modified by small molecules or drugs are significant. Altering the expression of important genes and pathways may prevent or delay hearing loss caused by mutations in miR-96, or indeed any other trigger which affects the miR-96 network, whether genetic or environmental. Discovering ways to counteract the effect of dysfunctional miR-96 could ultimately lead to therapeutic treatments to delay or prevent progressive hearing loss with a range of aetiologies in the human population.

Materials and Methods
Ethics statement. Mouse studies were carried out in accordance with UK Home Office regulations and the UK Animals (Scientific Procedures) Act of 1986 (ASPA) under UK Home Office licences, and the study was approved by both the Wellcome Trust Sanger Institute and the King's College London Ethical Review Committees. Mice were culled using methods approved under these licences to minimize any possibility of suffering.
Ago2 pulldown. The organs of Corti and olfactory bulbs of mice at P4 were dissected and immediately frozen in liquid nitrogen. Samples were treated as described previously 24 , with the following modifications; samples were homogenized in 300 ul ice-cold buffer, 30 ul of beads were used per sample, prebound to 8.8 ug anti-mouse monoclonal Ago2 antibody (Wako Pure: 018-22021) according to the Dynabead protocol (Invitrogen: 100.07D immunoprecipitation kit). Samples were incubated with antibody for 90 minutes. After immunoprecipitation, 300 ul TRIzol and 60 ul chloroform was added to extract RNA.
qRTPCR. cDNA was made from normalised organ of Corti RNA using Superscript II Reverse Transcriptase (Invitrogen, cat. no. 11904-018) after treatment with DNAse 1 (Sigma, cat.no: AMP-D1). Quantitative RT-PCR was carried out using probes and reagents from Applied Biosystems (Master Mix: 4364340) and Bio-Rad (SsoFast and SsoAdvanced Master mixes, cat. nos 1725232, 1725281), and the 2 −ΔΔCt equation was used to calculate relative expression levels 53 . Hprt was used as an internal control, and the quantity of sensory tissue present was checked using Jag1 or Ngfr, which are expressed in supporting cells 1,[54][55][56] . Pairs were only used when Jag1/Ngfr levels did not differ more than 30% between wildtype and Dmdo/Dmdo homozygote littermates. At least three technical replicates of each sample were carried out for each reaction, and each primer/probe set was tested in at least five wildtype-homozygote littermate pairs, with the exception of Chrna1 and Clvs1, which due to problems with the probes had only three biological replicates each. Primer details are in Supplementary table S7; standard primers were designed by primer3 57 and qRT PCR primers were purchased from Applied Biosystems (Life Technologies). The Wilcoxon rank sum test was chosen to determine significance as a suitable test for small sample sizes and populations of unknown characteristics 58 . P-values were adjusted using the method described in Benjamini and Hochberg 52 to take multiple testing into account.
Network construction and analysis software. Networks were generated with the aid of IPA (Ingenuity ® Systems, www.ingenuity.com). InnateDB 59 was used for interactome construction and pathway over-representation analysis with Cytoscape 60 for visualisation and further analysis, and the Gene Set Enrichment Analysis (GSEA) software 61 for enrichment analysis. InnateDB and GSEA both provide statistical analyses which are corrected for multiple testing.