Intragenic MicroRNAs Autoregulate Their Host Genes in Both Direct and Indirect Ways—A Cross-Species Analysis

MicroRNAs (miRNAs) function as master switches for post-transcriptional gene expression. Their genes are either located in the extragenic space or within host genes, but these intragenic miRNA::host gene interactions are largely enigmatic. The aim of this study was to investigate the location and co-regulation of all to date available miRNA sequences and their host genes in an unbiased computational approach. The majority of miRNAs were located within intronic regions of protein-coding and non-coding genes. These intragenic miRNAs exhibited both increased target probability as well as higher target prediction scores as compared to a model of randomly permutated genes. This was associated with a higher number of miRNA recognition elements for the hosted miRNAs within their host genes. In addition, strong indirect autoregulation of host genes through modulation of functionally connected gene clusters by intragenic miRNAs was demonstrated. In addition to direct miRNA-to-host gene targeting, intragenic miRNAs also appeared to interact with functionally related genes, thus affecting their host gene function through an indirect autoregulatory mechanism. This strongly argues for the biological relevance of autoregulation not only for the host genes themselves but, more importantly, for the entire gene cluster interacting with the host gene.


Introduction
All higher eukaryotic genomes harbor protein-coding genes that are required for cellular function, but these resemble only about 1% of the entire genome, whereas 99% do not encode for proteins [1]. Within those non-coding regions, non-coding RNAs (ncRNAs) are embedded, which exert distinct biological roles. MicroRNAs (miRNAs) are currently the most intensely investigated members of the ncRNA family and are generally accepted as post-transcriptional regulators of gene expression. As the number of newly discovered miRNAs is constantly increasing, and novel ways of miRNA processing are emerging, our understanding of the importance and the regulation of miRNA expression is becoming increasingly important [2]. miRNA genes are present on all chromosomes, with the exception of the human Y chromosome. In previous studies, it has been reported that more than 50% of miRNA-coding genes are located within the extragenic space, thereby possessing their own regulatory elements. The mechanisms regulating the expression of these extragenic miRNAs remain largely elusive. The remaining miRNA genes reside within host genes, and these intragenic miRNAs and their host mRNAs may be processed from the same RNA substrate [3,4]. Individual intragenic miRNAs are often co-expressed with their host were neither intragenic, antisense nor overlapping were defined as extragenic (Figure 1). For the group of extragenic miRNAs, we, in addition, defined a group of miRNAs that were closely located to genes within 10 kilobases up-or downstream as near-gene miRNAs. Intragenic miRNAs were further classified into intronic, exonic, and mixed intronic-exonic (overlapping between introns and exons, or varying between different transcripts of a gene), depending on their location within genes. In addition to the four species investigated using the miRBase miRNA annotations, miRNA genomic locations from all 189 available species on Ensembl were retrieved and subsequently mapped onto the corresponding gene annotations.

Protein-Protein Interactions, Functional Enrichment and Pathway Analyses of Host Genes
Protein-protein interactions (PPI) for intragenic miRNA host genes were investigated using the STRING database (http://www.string-db.org) v11 [37], which includes direct and indirect protein associations collected from different databases. Interaction networks were prepared using medium confidence scores (0.40) and clustered using the Markov cluster algorithm (MCL; inflation parameter: 3). Disconnected nodes were hidden from the network for easier comprehension.
Functional enrichment and pathway analyses were performed using the g:GOSt gene group functional profiling tool from the g:Profiler web server (http://biit.cs.ut.ee/gprofiler), which is based on Ensembl release 96 and the genome assembly GRCh38 [38].
Classification systems tested were Gene Ontology (biological process, cellular component, and molecular function) and Kyoto Encyclopedia of Genes and Genomes (KEGG) functional annotation spaces, employing the ontology-focused multiple testing correction method g:SCS [39] with a significance threshold of p < 0.05. g:Profiler results were hierarchically clustered and filtered for the Best Per Parent pathway. Pathways for biological process (GO:BP), molecular function (GO:MF), and cellular components (GO:CC) were extracted and graphically illustrated.

Intragenic miRNA Target Prediction
To determine possible target relationships between intragenic, antisense, and overlapping miRNAs and host genes, the Diana Tools REST API (http://diana.imis.athenainnovation.gr/DianaTools) was accessed by applying custom-written Python (https://www.python.org) scripts. For miRNA target predictions, the microT-CDS v5 algorithm [40] was applied without setting a detection threshold, and miTG target prediction score and number of predicted binding sites were extracted for miRNA::host gene interactions. Genes that were not detected by microT-CDS were excluded from the analysis.

Protein-Protein Interactions, Functional Enrichment and Pathway Analyses of Host Genes
Protein-protein interactions (PPI) for intragenic miRNA host genes were investigated using the STRING database (http://www.string-db.org) v11 [37], which includes direct and indirect protein associations collected from different databases. Interaction networks were prepared using medium confidence scores (0.40) and clustered using the Markov cluster algorithm (MCL; inflation parameter: 3). Disconnected nodes were hidden from the network for easier comprehension.
Functional enrichment and pathway analyses were performed using the g:GOSt gene group functional profiling tool from the g:Profiler web server (http://biit.cs.ut.ee/gprofiler), which is based on Ensembl release 96 and the genome assembly GRCh38 [38].
Classification systems tested were Gene Ontology (biological process, cellular component, and molecular function) and Kyoto Encyclopedia of Genes and Genomes (KEGG) functional annotation spaces, employing the ontology-focused multiple testing correction method g:SCS [39] with a significance threshold of p < 0.05. g:Profiler results were hierarchically clustered and filtered for the Best Per Parent pathway. Pathways for biological process (GO:BP), molecular function (GO:MF), and cellular components (GO:CC) were extracted and graphically illustrated.

Intragenic miRNA Target Prediction
To determine possible target relationships between intragenic, antisense, and overlapping miRNAs and host genes, the Diana Tools REST API (http://diana.imis.athena-innovation.gr/DianaTools) was accessed by applying custom-written Python (https://www.python.org) scripts. For miRNA target predictions, the microT-CDS v5 algorithm [40] was applied without setting a detection threshold, and miTG target prediction score and number of predicted binding sites were extracted for miRNA::host gene interactions. Genes that were not detected by microT-CDS were excluded from the analysis.
As a significant number of host genes were predicted to be targeted by both mature variants (i.e., -5p and -3p) of a given miRNA, a modified version of the DIANA miTG target prediction score (mDPS) was calculated. This mDPS includes the separate prediction scores for both mature variants, where the prediction score of the miRNA variant of higher target probability (i.e., -Xp) is combined Cells 2020, 9, 232 4 of 22 with the squared product of the prediction score of higher probability with the prediction score of lower probability (i.e., -Yp): mDPS = prediction-score(-Xp) + (prediction-score(-Xp) × prediction-score(-Yp)) 2 (1) This allowed for a conservative incorporation of both separate prediction scores to reflect the increased probability of regulation of the host genes.
It was shown that enrichment analysis produced different results when applied to predicted targets or validated targets, and computational predictions may not overestimate the number and extent of meaningful miRNA-regulated processes [41,42]. To further increase the predictive validity of miRNA::host gene interactions, information about experimentally validated miRNA::gene targets was retrieved from the databases DIANA TarBase v8 [43], miRTarBase v7 [44] and starBase v2 [45]. Since the DIANA TarBase v8 also contains negative validations, only validations marked as positive were used for further analysis.

Iterative Randomized Model (IRM)
The specificity of miRNA::host gene pairs was assessed applying an iterative randomized model (IRM). Two separate IRM approaches were employed: IRM1 tested the list of intragenic, antisense, overlapping or near-gene miRNAs against a randomly sampled list of protein-coding genes from all protein-coding genes to elucidate specificity of miRNAs; IRM2 tested a randomly sampled list of non-intragenic miRNAs against the list of host genes for intragenic miRNAs to test for specificity of host gene targets. IRM sampling was performed over 100 iterations, and for each run, the mean mDPS and the mean target frequency were determined. Subsequently, Grubb's outlier test was applied to identify significant divergence from the randomized sampled distribution.

Essentiality of Host Genes
Host genes from intragenic miRNAs per species were evaluated for their critical implications in cell survival and viability. The Online Gene Essentiality Database (OGEE, http://ogee.medgenius.info) was used to assign the essentiality status of each miRNA host gene for the separate species [46]. Only genes explicitly annotated as significantly essential in the OGEE database were considered, and the remaining genes were annotated as non-essential genes.

Community Detection
Indirect autoregulation of host genes by intragenic miRNAs was assessed by performing community detection for all genes currently enlisted in the StringDB database and determining the community affiliation of the host genes for both HSA and MMU. The PPI confidence score was set to the highest confidence (≥0.9).
Community detection was performed using the best-partition algorithm of the Python package "Louvain Community Detection" with the resolution parameter set to default. A community was defined as harboring at least 5 genes and 5 intragenic miRNAs. The effect of the intragenic miRNAs within a community was assessed by the number of validated targets found in the mirTarbase, Tarbase and Starbase database and normalized to the number of genes ("gene ratio"), the number of miRNAs ("miRNA ratio") and the overall number of possible miRNA::gene interactions ("interaction ratio") in the community.
An interaction network of the influence of intragenic miRNAs between communities was constructed applying the MultiDiGraph method from the Python package NetworkX. In-and out-degrees for each community were determined based on the interaction ratio: Cells 2020, 9, 232 5 of 22 where v is the number of validated miRNA::gene interactions found in the community, c g the overall number of genes per community, and c m the overall number of intragenic miRNAs per community.

Data Analysis and Graphical Representation
For statistical analysis, GraphPad Prism v8 (Graphpad Software, San Diego, CA, USA) (Student's t-test, Grubb's outlier test, Kruskal-Wallis H-tests, Chi-squared test with Yates correction) and the Python v3.7 (Python Software Foundation, Beaverton, OR, USA) packages Matplotlib, Numpy, SciKit, Pandas, Requests, and Statsmodel were used as appropriate. The level of statistical significance was predefined at p < 0.05 for all statistical tests and corrected for multiple comparisons where appropriate.
For a graphical representation of the data, GraphPad Prism v8 and the Python 3.7 packages Seaborn, Matplotlib, Numpy, NetworkX, and Pandas were applied. Graphical post-processing was performed using CorelDraw X8 (Corel Corporation, Ottawa, ON, Canada).
Interestingly, the distribution of miRNA locations in zebrafish significantly differed from human, mouse, and fly with only~24% and~35% of miRNAs located in protein-coding or all genes, respectively. To further investigate the number of intragenic miRNAs across different species and test if there were prominent species differences, we performed the same analysis with the entire Ensembl catalog of 189 available species. The number of different miRNA groups per species was determined, and the species clustered according to the respective Ensembl "order" assignment (a list of species, miRNA groups and classification can be found in Supplementary Table S1). The absolute number of miRNAs between the orders of animals was significantly different, with the order "Primates" exhibiting the highest number of miRNAs ( Figure 3A; one-way ANOVA, F (11,177) = 16.29, p < 0.0001). In addition, the average percentage of miRNAs compared to the total number of genes showed significant differences between the orders of animals, with the order "Primates" showing the highest proportion of miRNAs in the genome and the order "Fish" showing the lowest ( Figure 3B; one-way ANOVA, F (11,160) = 12.63, p < 0.0001). When focusing on the average distribution of the different types of miRNAs, in different orders of animals varying proportions of intragenic, antisense, and overlapping miRNAs were detected ( Figure 3C; two-way ANOVA, species F (11,531) = 6.451, p < 0.0001; miRNA types F (2,531) = 246.3, p < 0.001; interaction F (22,531) = 3.104, p < 0.0001). Interestingly, the percentage of intragenic miRNAs from the four species investigated in detail was considerably higher than the average percentage of the respective order ( Figure 3C). These discrepancies between the species are likely due to the higher availability and quality of data sets derived from humans and animal models frequently used in biomedical research vs. the incompleteness of miRNA and gene annotations for other species [27,48].     Interestingly, the distribution of miRNA locations in zebrafish significantly differed from human, mouse, and fly with only ~24% and ~35% of miRNAs located in protein-coding or all genes, respectively. To further investigate the number of intragenic miRNAs across different species and test if there were prominent species differences, we performed the same analysis with the entire Ensembl catalog of 189 available species. The number of different miRNA groups per species was determined, and the species clustered according to the respective Ensembl "order" assignment (a list of species, miRNA groups and classification can be found in Supplementary Table S1). The absolute number of miRNAs between the orders of animals was significantly different, with the order "Primates" exhibiting the highest number of miRNAs ( Figure 3A; one-way ANOVA, F(11,177) = 16.29, p < 0.0001). In addition, the average percentage of miRNAs compared to the total number of genes showed significant differences between the orders of animals, with the order "Primates" showing the highest proportion of miRNAs in the genome and the order "Fish" showing the lowest ( Figure 3B; one-way ANOVA, F(11,160) = 12.63, p < 0.0001). When focusing on the average distribution of the different types of miRNAs, in different orders of animals varying proportions of intragenic, antisense, and overlapping miRNAs were detected ( Figure 3C; two-way ANOVA, species F(11,531) = 6.451, p < 0.0001; miRNA types F(2,531) = 246.3, p < 0.001; interaction F(22,531) = 3.104, p < 0.0001). Interestingly, the percentage of intragenic miRNAs from the four species investigated in detail was considerably higher than the average percentage of the respective order ( Figure 3C). These discrepancies between the species are likely due to the higher availability and quality of data sets derived from humans and animal models frequently used in biomedical research vs. the incompleteness of miRNA and gene annotations for other species [27,48]. All 189 currently annotated species in the Ensembl catalog were subjected to miRNA location analysis and grouped according to the Ensembl species categories. Entire miRNA occurrence for all species categories in absolute numbers (A), as well as normalized to the total number of all-genes (B) compared between the groups. Percentage of intragenic, antisense, and overlapping miRNAs was determined for each species group and compared between the groups (C). Interspecies group comparisons were performed using a one-way ANOVA with a p-value threshold set to <0.05.

miRNA Host Genes Show Functional Clusters and Pathway Enrichment
To test whether protein-coding genes, hosting intragenic miRNAs, were functionally related, the lists of host genes per species were subjected to protein-protein interaction (PPI) and pathway enrichment analyses. We first performed a PPI analysis using the STRING database, followed by Markov chain network clustering. This revealed a significant PPI enrichment for all four species (see Supplementary Table S2 for the network statistics per species). Cluster analysis revealed several functionally connected clusters of host genes with at least five nodes for HSA (35 clusters; Supplementary Figure S2), MMU (19 clusters, Supplementary Figure S4) and DME (2 clusters; Supplementary Figure S6), whereas no functional clusters of the respective complexity were found for DRE. The detected clusters were further analyzed for pathway enrichment, which revealed specific functions of the individual host gene clusters (see Supplementary Figure S3 for HSA, Supplementary Figure S5 for MMU, Supplementary Figure S7 for DME). Of those, 19 GO and 11 Figure 3. All 189 currently annotated species in the Ensembl catalog were subjected to miRNA location analysis and grouped according to the Ensembl species categories. Entire miRNA occurrence for all species categories in absolute numbers (A), as well as normalized to the total number of all-genes (B) compared between the groups. Percentage of intragenic, antisense, and overlapping miRNAs was determined for each species group and compared between the groups (C). Interspecies group comparisons were performed using a one-way ANOVA with a p-value threshold set to <0.05.

miRNA Host Genes Show Functional Clusters and Pathway Enrichment
To test whether protein-coding genes, hosting intragenic miRNAs, were functionally related, the lists of host genes per species were subjected to protein-protein interaction (PPI) and pathway enrichment analyses. We first performed a PPI analysis using the STRING database, followed by Markov chain network clustering. This revealed a significant PPI enrichment for all four species (see Supplementary Table S2 for the network statistics per species). Cluster analysis revealed several functionally connected clusters of host genes with at least five nodes for HSA ( The conservation of specific pathways is further supported by the observation that 143 of the protein-coding genes were found in both mouse and human in a direct literal comparison of gene names, indicating that a large proportion of important host genes appear to be conserved between HSA and MMU. Assignment of the homologous genes to the respective clusters revealed that HSA cluster 2 (Supplementary Figure S2) and MMU cluster 1 (Supplementary Figure S4) shared four genes (KLHL3, RNF130, HUWE1, and CCNF), whereas HSA cluster 4 and MMU cluster 6 shared three genes (COPZ1, COPZ2, and R3HDM1). Furthermore, the pathway analysis for the respective clusters showed that both HSA cluster 2 and MMU cluster 1 showed functional enrichment of ubiquitin-related pathways (e.g., GO:0000209 protein polyubiquitination, GO:0061630 ubiquitin-protein ligase activity), whereas both HSA cluster 4 and MMU cluster 6 showed functional enrichment of Golgi-related pathways (e.g., GO:0006890 retrograde vesicle-mediated transport, Golgi to endoplasmic reticulum).

Intragenic miRNAs Target Their Host Genes
Considering the small proportion of protein-coding DNA regions in the genome, intragenic miRNA location may represent an evolutionary advantage. Thus, we hypothesized that intragenic miRNAs could function as autoregulatory switches, either on a host gene or functional level.
To elucidate a potential miRNA::host gene autoregulatory mechanism, we applied miRNA target predictions for the respective host genes using the microT-CDS algorithm. As both mature miRNA strands (i.e., -5p and -3p) could possibly target their host gene, we calculated a modified target prediction score (mDPS), in which the target probability for both strands was included in a weighted manner (see Methods section). As a second measure indicative of miRNA::host gene targeting, we calculated the target probability of all miRNAs with their respective host genes as average per species ( Figure 5A-C). The respective average mDPS values and target probabilities are provided in Table 3. To test whether miRNA::host gene target probabilities and mDPS values were specific or occurred by chance due to overly permissive target predictions, we generated iterative randomized models (IRMs) for each species, by pairing the lists of each type of miRNAs with Figure 4. Percentage of essential intragenic, antisense, and overlapping miRNA host genes is shown for HSA, MMU, and DME. Frequency of essential intragenic, antisense, and overlapping host genes was compared intraspecies and interspecies using a Chi-square test with a p-value threshold set to <0.05 (*), 0.001 (***).

Intragenic miRNAs Target Their Host Genes
Considering the small proportion of protein-coding DNA regions in the genome, intragenic miRNA location may represent an evolutionary advantage. Thus, we hypothesized that intragenic miRNAs could function as autoregulatory switches, either on a host gene or functional level.
To elucidate a potential miRNA::host gene autoregulatory mechanism, we applied miRNA target predictions for the respective host genes using the microT-CDS algorithm. As both mature miRNA strands (i.e., -5p and -3p) could possibly target their host gene, we calculated a modified target prediction score (mDPS), in which the target probability for both strands was included in a weighted manner (see Methods section). As a second measure indicative of miRNA::host gene targeting, we calculated the target probability of all miRNAs with their respective host genes as average per species ( Figure 5A-C). The respective average mDPS values and target probabilities are provided in Table 3. To test whether miRNA::host gene target probabilities and mDPS values were specific or occurred by chance due to overly permissive target predictions, we generated iterative randomized models (IRMs) for each species, by pairing the lists of each type of miRNAs with randomly assigned protein-coding genes (IRM1; 100 iterations). Outlier testing revealed that both the target probability, as well as the average mDPS of intragenic miRNAs for HSA, MMU, and DME, was significantly different from the IRM1 models ( Figure 5A; Grubb's test, p < 0.05), suggesting an evolutionary conserved autoregulatory pathway of miRNA::host gene regulation. Surprisingly, the antisense miRNA target probability and average mDPS for MMU and DME, but not HSA, was also significantly different from the respective IRM1 models (Figure 5B), suggesting a regulatory role not only of the miRNAs located on the sense strand but also of miRNAs located on the antisense strand of a gene. In contrast, overlapping miRNAs did not show specific targeting of their partially hosting genes ( Figure 5C). Cells 2020, 9, x; doi: FOR PEER REVIEW www.mdpi.com/journal/cells suggesting an evolutionary conserved autoregulatory pathway of miRNA::host gene regulation. Surprisingly, the antisense miRNA target probability and average mDPS for MMU and DME, but not HSA, was also significantly different from the respective IRM1 models (Figure 5B), suggesting a regulatory role not only of the miRNAs located on the sense strand but also of miRNAs located on the antisense strand of a gene. In contrast, overlapping miRNAs did not show specific targeting of their partially hosting genes ( Figure 5C). . mDPS and target frequency was determined for the native groups (red, green, and blue dots) and compared to an iterative randomized model (IRM1), were each intragenic miRNA was randomly sampled against host genes 100 times (grey dots). For each iteration, the mean mDPS and the mean target frequency was determined, and Grubb's outlier test was used to identify statistically significant allocation of the native group to the randomized model (A-C). The threshold for statistical significance was set to an α error of 0.05. . mDPS and target frequency was determined for the native groups (red, green, and blue dots) and compared to an iterative randomized model (IRM1), were each intragenic miRNA was randomly sampled against host genes 100 times (grey dots). For each iteration, the mean mDPS and the mean target frequency was determined, and Grubb's outlier test was used to identify statistically significant allocation of the native group to the randomized model (A-C). The threshold for statistical significance was set to an α error of 0.05. A possible explanation for the increased target probability and mDPS values for intragenic miRNAs and their host genes could be an increase in the number of miRNA recognition elements (MREs) on the host gene mRNA specific for the respective intragenic miRNAs. We, therefore, evaluated the number of host gene MREs for the respective intragenic miRNAs, as well as from the randomly sampled IRM1 models. The average number of MREs for intragenic miRNAs on their host genes was significantly higher than on random genes for HSA, MMU, and DME ( Figure 6A; average number of MREs, HSA-2.05, MMU-2.18, DME-1.30). To ensure that the analysis of MREs was not confounded by variance differences between the individual iterations and the native intragenic miRNA::host gene values, we determined the standard deviation (SD) for the iterations and the native values and found no significant difference for any of the species ( Figure 6B). A possible explanation for the increased target probability and mDPS values for intragenic miRNAs and their host genes could be an increase in the number of miRNA recognition elements (MREs) on the host gene mRNA specific for the respective intragenic miRNAs. We, therefore, evaluated the number of host gene MREs for the respective intragenic miRNAs, as well as from the randomly sampled IRM1 models. The average number of MREs for intragenic miRNAs on their host genes was significantly higher than on random genes for HSA, MMU, and DME ( Figure 6A; average number of MREs, HSA-2.05, MMU-2.18, DME-1.30). To ensure that the analysis of MREs was not confounded by variance differences between the individual iterations and the native intragenic miRNA::host gene values, we determined the standard deviation (SD) for the iterations and the native values and found no significant difference for any of the species ( Figure 6B). Although both the intragenic miRNA::host gene target probability and mDPS values, as well as the number of MREs specific for the respective intragenic miRNA was increased, a strong limitation of our model of an autoregulatory role of intragenic miRNAs would occur if the respective host genes were generally targeted by more miRNAs, independent of their type. Thus, to determine the specificity of host genes for their respective intragenic miRNAs, we generated an additional IRM that paired the host gene with randomly assigned non-intragenic miRNAs (IRM2; 100 iterations). The average target probability and mDPS values for individual iterations were determined and compared to IRM1 as well as the intragenic miRNA::host gene pairs. For both HSA Although both the intragenic miRNA::host gene target probability and mDPS values, as well as the number of MREs specific for the respective intragenic miRNA was increased, a strong limitation of our model of an autoregulatory role of intragenic miRNAs would occur if the respective host genes were generally targeted by more miRNAs, independent of their type. Thus, to determine the specificity of host genes for their respective intragenic miRNAs, we generated an additional IRM that paired the host gene with randomly assigned non-intragenic miRNAs (IRM2; 100 iterations). The average target probability and mDPS values for individual iterations were determined and compared to IRM1 as well as the intragenic miRNA::host gene pairs. For both HSA and MMU, but not DME, the intragenic miRNA::host gene pair was significantly different from the IRM2 models ( Figure 7A-C), suggesting high-level specificity of host genes for their intragenic miRNAs. Interestingly, there seems to be a separation between IRM1 and IRM2 for DME, which is slightly less obvious in MMU, and absent in HSA ( Figure 7A-C). These results thus are consistent with an evolutionary pressure on bidirectional intragenic miRNA::host gene specificity.
Finally, experimentally validated intragenic miRNA::host gene interactions for HSA and MMU were retrieved from TarBase, miRTarBase, and starBase and compared to the host gene target predictions. This revealed that a respectable number of miRNA::host gene interactions were already experimentally validated (107 validations for HSA and 44 validations for MMU), which further strengthens the concept of a specific autoregulatory interaction between miRNAs and their host genes.
Cells 2020, 9, x; doi: FOR PEER REVIEW www.mdpi.com/journal/cells and MMU, but not DME, the intragenic miRNA::host gene pair was significantly different from the IRM2 models ( Figure 7A-C), suggesting high-level specificity of host genes for their intragenic miRNAs. Interestingly, there seems to be a separation between IRM1 and IRM2 for DME, which is slightly less obvious in MMU, and absent in HSA ( Figure 7A-C). These results thus are consistent with an evolutionary pressure on bidirectional intragenic miRNA::host gene specificity. Finally, experimentally validated intragenic miRNA::host gene interactions for HSA and MMU were retrieved from TarBase, miRTarBase, and starBase and compared to the host gene target predictions. This revealed that a respectable number of miRNA::host gene interactions were already experimentally validated (107 validations for HSA and 44 validations for MMU), which further strengthens the concept of a specific autoregulatory interaction between miRNAs and their host genes.

Relation between Proximity to Gene and Target Probability
We could so far show that intragenic miRNAs show an increased probability for targeting their own host genes, higher target prediction scores, and that host genes harbor more binding sites specific for the hosted miRNA. To evaluate whether this evolutionarily conserved interaction of miRNAs with their host genes is based on the placement of a miRNA within a gene, or if the simple distance of a miRNA to a gene is a determinant for autoregulation, we extended our analysis to miRNAs located in the vicinity of genes, specifically up to 10 kilobases up-and downstream of a gene (i.e., near-gene miRNAs; see Figure 1). These near-gene miRNAs underwent the same analysis as intragenic, antisense, and overlapping miRNAs. Near-gene miRNAs exhibited a significantly lower target probability for genes in their direct vicinity as compared to intragenic miRNAs for their host genes in HSA, MMU, and DME ( Figure 8). Using IRM modeling, we found that for neargene miRNAs, the probability of targeting genes in their vicinity or randomly selected genes was similar across all three species ( Figure 8A; Grubb's test p > 0.05). In addition, there was no significant correlation between the miRNA::gene distance and the mDPS suggesting no direct relationship between the miRNA::gene distance and the probability of mRNA targeting in HSA, MMU, and DME ( Figure 8B). These data support the hypothesis that intragenic miRNAs possess the specific capability to directly autoregulate their host genes. Figure 7. Host gene specificity was assessed by a second IRM, where each host gene was randomly sampled against a non-intragenic miRNA for 100 iterations. For each iteration, the mDPS and the mean target frequency were determined (sky blue dots) and compared to the native groups (red dots) and the IRM1 (grey dot cloud) for HSA (A), MMU (B) and DME (C). Grubb's outlier test was used to assess the statistical allocation of the native group to the IRM2 + IRM1. The threshold of statistical significance was set to <0.05.

Relation between Proximity to Gene and Target Probability
We could so far show that intragenic miRNAs show an increased probability for targeting their own host genes, higher target prediction scores, and that host genes harbor more binding sites specific for the hosted miRNA. To evaluate whether this evolutionarily conserved interaction of miRNAs with their host genes is based on the placement of a miRNA within a gene, or if the simple distance of a miRNA to a gene is a determinant for autoregulation, we extended our analysis to miRNAs located in the vicinity of genes, specifically up to 10 kilobases up-and downstream of a gene (i.e., near-gene miRNAs; see Figure 1). These near-gene miRNAs underwent the same analysis as intragenic, antisense, and overlapping miRNAs. Near-gene miRNAs exhibited a significantly lower target probability for genes in their direct vicinity as compared to intragenic miRNAs for their host genes in HSA, MMU, and DME ( Figure 8). Using IRM modeling, we found that for near-gene miRNAs, the probability of targeting genes in their vicinity or randomly selected genes was similar across all three species ( Figure 8A; Grubb's test p > 0.05). In addition, there was no significant correlation between the miRNA::gene distance and the mDPS suggesting no direct relationship between the miRNA::gene distance and the probability of mRNA targeting in HSA, MMU, and DME ( Figure 8B). These data support the hypothesis that intragenic miRNAs possess the specific capability to directly autoregulate their host genes.

Indirect Host Gene Regulation by Modulation of Functional Pathways
In general, miRNAs target multiple different mRNAs, and host gene autoregulation might play an important but from a functional perspective minor role. Therefore, we investigated whether intragenic miRNAs are able to target other host genes within the networks, which were functionally linked to their host genes. Target gene analysis was performed for the entire set of intragenic miRNAs against all genes hosting intragenic miRNAs. This revealed a set of highly interactive intragenic miRNAs for both HSA and MMU that were predicted to target more than 75% of all host genes and, therefore, might play an important functional role in pathway regulation (Table 4). When comparing predicted miRNA::host gene interactions to the combined list of experimentally validated interactions, we found that in total 44.90% interactions for HSA (36,  determined and compared to an IRM, where each near-gene miRNA was randomly sampled against a protein-coding gene for 100 iterations (grey dot cloud). Grubb's outlier test was used, and no significant allocation of the native group was identified. (B) Linear relationship between the mDPS and the distance of the miRNA to its closest gene was assessed, and the Pearson correlation coefficient, as well as the p-value, was determined for HSA, MMU, and DME.

Indirect Host Gene Regulation by Modulation of Functional Pathways
In general, miRNAs target multiple different mRNAs, and host gene autoregulation might play an important but from a functional perspective minor role. Therefore, we investigated whether intragenic miRNAs are able to target other host genes within the networks, which were functionally linked to their host genes. Target gene analysis was performed for the entire set of intragenic miRNAs against all genes hosting intragenic miRNAs. This revealed a set of highly interactive intragenic miRNAs for both HSA and MMU that were predicted to target more than 75% of all host genes and, therefore, might play an important functional role in pathway regulation (Table 4). When comparing predicted miRNA::host gene interactions to the combined list of experimentally validated interactions, we found that in total 44.90% interactions for HSA (36,040 from 80,276 predicted, plus 8309 non-predicted) and 20.29% for MMU (17,024 of 83,887 predicted, plus 4362 non-predicted) have already been validated previously, however, not all intragenic miRNAs are represented in the above-used databases (HSA, 882 of 992 intragenic miRNAs; MMU, 278 of 705).   To investigate a possible indirect regulation of functional pathways, host gene networks from the previous PPI analysis were extended by intragenic miRNAs hosted by the respective cluster genes and targeting either their own host or other host genes within the same network. These connections were additionally weighted by the individual target prediction scores. Degree centrality for both intragenic miRNA network nodes (miRNA degree), as well as host gene nodes (gene degree), was calculated by summing the number of incoming network edges per node, averaged and normalized per cluster based on the number of miRNA nodes (for miRNA degree) or the number of total nodes (miRNA nodes and gene nodes for gene degree). Plotting the normalized miRNA degree as a function of the normalized gene degree, followed by linear regression analysis, resulted in a separation into clusters with high miRNA regulatory strength and clusters with low miRNA regulatory strength (Figure 9). The clusters with the highest proposed regulatory strength (HSA: Cluster 2; MMU: Cluster 1; see Supplementary  Figures S3 and S5) were both associated with ubiquitin-related pathways, suggesting that the function of the genes related to ubiquitin-related clusters is highly regulated by miRNAs co-expressed with the respective genes and this regulation appears to be conserved at least between mammals. To investigate a possible indirect regulation of functional pathways, host gene networks from the previous PPI analysis were extended by intragenic miRNAs hosted by the respective cluster genes and targeting either their own host or other host genes within the same network. These connections were additionally weighted by the individual target prediction scores. Degree centrality for both intragenic miRNA network nodes (miRNA degree), as well as host gene nodes (gene degree), was calculated by summing the number of incoming network edges per node, averaged and normalized per cluster based on the number of miRNA nodes (for miRNA degree) or the number of total nodes (miRNA nodes and gene nodes for gene degree). Plotting the normalized miRNA degree as a function of the normalized gene degree, followed by linear regression analysis, resulted in a separation into clusters with high miRNA regulatory strength and clusters with low miRNA regulatory strength (Figure 9). The clusters with the highest proposed regulatory strength (HSA: Cluster 2; MMU: Cluster 1; see Supplementary Figures S3 and S5) were both associated with ubiquitin-related pathways, suggesting that the function of the genes related to ubiquitin-related clusters is highly regulated by miRNAs co-expressed with the respective genes and this regulation appears to be conserved at least between mammals. Figure 9. Indirect autoregulation of intragenic miRNAs for each intragenic host gene cluster in HSA (A) and MMU (B) was assessed. miTG prediction score for all possible intragenic miRNA::host gene interactions was determined and a combined network of host gene::host gene interaction and intragenic miRNA::host gene interaction was composed. Network analysis was performed, determining the degree centrality for each miRNA normalized to the total number of miRNA nodes and for each gene normalized to the total number of nodes in the network. A linear relationship between the mean normalized miRNA degree and the mean normalized gene degree for each cluster was assessed, and the summed target prediction score normalized to the total number of miRNA::gene interactions in each cluster is depicted as the dot-size for each cluster.
To further test this indirect regulatory mechanism not only on the level of host genes but on the entire interactome, we performed community detection on the complete StringDB database for Figure 9. Indirect autoregulation of intragenic miRNAs for each intragenic host gene cluster in HSA (A) and MMU (B) was assessed. miTG prediction score for all possible intragenic miRNA::host gene interactions was determined and a combined network of host gene::host gene interaction and intragenic miRNA::host gene interaction was composed. Network analysis was performed, determining the degree centrality for each miRNA normalized to the total number of miRNA nodes and for each gene normalized to the total number of nodes in the network. A linear relationship between the mean normalized miRNA degree and the mean normalized gene degree for each cluster was assessed, and the summed target prediction score normalized to the total number of miRNA::gene interactions in each cluster is depicted as the dot-size for each cluster.
To further test this indirect regulatory mechanism not only on the level of host genes but on the entire interactome, we performed community detection on the complete StringDB database for human and mouse and selected the ones containing at least five genes and five intragenic miRNAs. This approach identified 26 communities for HSA, and 28 communities for MMU, which contained host genes. Subsequently, each gene per community was tested for experimentally validated intragenic miRNA targeting that was harbored within the same community. The number of validations was then normalized to the total number of genes found in the community, the total number of miRNAs in the community, as well as the overall number of all possible interactions in the community (Supplementary  Tables S4 and S5).
In total, 24 of 26 (HSA) and 26 of 28 (MMU) communities were influenced by intragenic miRNAs hosted within the same community (interaction ratio > 0). Influenced communities were ranked based on the interaction ratio, to identify communities with a high indirect autoregulatory potential by intragenic miRNAs. This revealed 12 (HSA) and 10 communities (MMU) with an interaction ratio ≥ 1. Functionality of these communities was further assessed by pathway enrichment analysis, showing that for HSA biological processes, such as the ribosomal large subunit assembly (community 11), regulation of alternative mRNA splicing (community 8), Rab protein signal transduction (community 4), positive regulation of I-kappaB kinase/NF-kappaB signaling (community 7) and ubiquitin-related pathways (community 12), for MMU biological processes, such as regulation of translation (community 66), positive regulation of cAMP-mediated signaling (community 1) and ubiquitin-related pathways (community 5), were significantly enriched (Supplementary Tables S6 and  S7). Additionally, comparing the enrichments for all communities between HSA and MMU revealed that 72.81% of the enrichments were identical, suggesting conservation of indirect autoregulated pathways.
Since miRNAs are in general predicted to target a large number of mRNAs, cross-community regulation by miRNAs was assessed by investigating the number of experimentally validated intragenic miRNA::gene interactions between the different communities. Network analysis was performed to determine the in-and out-degree for each community based on the overall number of incoming and outgoing validated miRNA::gene interactions, as well as the weighted strength (interaction ratio). This revealed communities whose intragenic miRNAs showed high regulatory potency without being strongly regulated by other communities (Figure 10A Tables S4 and S5). In total, 24 of 26 (HSA) and 26 of 28 (MMU) communities were influenced by intragenic miRNAs hosted within the same community (interaction ratio > 0). Influenced communities were ranked based on the interaction ratio, to identify communities with a high indirect autoregulatory potential by intragenic miRNAs. This revealed 12 (HSA) and 10 communities (MMU) with an interaction ratio ≥ 1. Functionality of these communities was further assessed by pathway enrichment analysis, showing that for HSA biological processes, such as the ribosomal large subunit assembly (community 11), regulation of alternative mRNA splicing (community 8), Rab protein signal transduction (community 4), positive regulation of I-kappaB kinase/NF-kappaB signaling (community 7) and ubiquitin-related pathways (community 12), for MMU biological processes, such as regulation of translation (community 66), positive regulation of cAMP-mediated signaling (community 1) and ubiquitin-related pathways (community 5), were significantly enriched (Supplementary Tables S6 and S7). Additionally, comparing the enrichments for all communities between HSA and MMU revealed that 72.81% of the enrichments were identical, suggesting conservation of indirect autoregulated pathways.
Since miRNAs are in general predicted to target a large number of mRNAs, cross-community regulation by miRNAs was assessed by investigating the number of experimentally validated intragenic miRNA::gene interactions between the different communities. Network analysis was performed to determine the in-and out-degree for each community based on the overall number of incoming and outgoing validated miRNA::gene interactions, as well as the weighted strength (interaction ratio). This revealed communities whose intragenic miRNAs showed high regulatory potency without being strongly regulated by other communities ( Figure 10A

Discussion
This study provides new knowledge on the localization of miRNAs in relation to protein-coding and non-coding genes, specifically to their host genes, and highlights miRNA::host gene interactions as a specific and evolutionarily conserved autoregulatory mechanism. The results support the concept that a given intragenic miRNA not only interacts with its host gene but also has a high probability of influencing other members of the host gene's interactome, which places miRNAs in the role of a master switch, capable of regulating their host genes entire protein interaction cluster.

Distribution of Intragenic miRNAs Across Different Species
The percentage of intragenic and intronic miRNAs based on the location within protein-coding genes was slightly higher than in previous reports, where an increased percentage of known miRNA genes located within annotated protein-coding regions from 42% to 57% was observed with an increasing number of datasets available and progressing miRBase releases [4,47]. Within the population of human intragenic miRNAs, however, the percentage of intronic, exonic, and mixed miRNAs was the same as described in a previous study, although at that time, only 221 human miRNAs were known [2]. Thus, the extended data set, including all currently known miRNAs across four species, further supports the concept that many intragenic miRNAs and their host genes resemble single transcription units. However, biogenesis of intragenic RNAs might well occur independently from their host gene [49]. Not all types of intragenic miRNAs may have the same co-expression bias. The probability of co-expression of intragenic miRNA with their host genes is strongly related to their evolutionary conservation degree, and evolutionarily conserved intragenic miRNAs have a much higher rate of co-expression with host genes than poorly conserved ones [26]. In line with the concept that intragenic miRNAs provide an evolutionary advantage, the values were quite similar for human and mouse which have the most complete sets of annotated coding genes and miRNAs, however, diverging for other species, such as drosophila melanogaster or zebrafish. The differences, however, may also be at least partially due to the fact that for less frequently studied species, lower numbers of stringent and complete data sets are available.

miRNA Host Genes Show Functional Clusters and Pathway Enrichment
Up till now, miRNAs targeting specific gene clusters are largely exploited from a biomarker angle, i.e., as predictors for disease or responsiveness to certain therapies. In contrast, miRNA host genes as components of enriched pathways and the mutual interactions of proteins encoded by mRNAs harboring miRNAs have so far received only a little attention from a global point of view. Significant network communities are enriched with target genes of miRNA clusters, as suggested for a network community of ten proteins, which is co-regulated by four miRNA families in the mir-379 cluster [50].
Both host genes and intragenic miRNAs are subject to different regulatory mechanisms, and tight regulatory control of these genes is critical for specific molecular pathways and cellular functions. Therefore, potential feedback loops between intragenic miRNAs and their host genes, when targeted by the hosted miRNA, may be critically important. To this end, we performed genome-wide analyses and found strong interaction patterns observed between the targeting of host genes by intragenic miRNAs, but not extragenic or overlapping miRNAs. In a previous study, 20% of intragenic miRNAs were predicted to target their host mRNA transcript, and 22 of the 74 pathways yield overrepresentation of proteins encoded by mRNA targets of associated intragenic miRNAs [47]. In our in silico paradigm, intragenic miRNAs showed increased probability for targeting their host genes and higher target prediction scores. Together, this provides novel unbiased evidence for autoregulatory miRNA::host gene interactions as a general principle that fine-tunes host gene expression. Experimentally validated intragenic miRNA::host gene interactions for HSA and MMU were retrieved and revealed that 107 of HSA and 44 of MMU miRNA::host gene interactions are already experimentally validated. For miRNAs regulating their host genes interactomes, even 44.90% of interactions for HSA and 20.29% for MMU are validated. This further strengthens the concept of a specific autoregulatory interaction between miRNAs and their host genes with significant functional relevance.
Our current analysis supports a regulatory role not only of the miRNAs located on the sense strand but also of miRNAs located on the antisense strand of a gene. In contrast, overlapping miRNAs did not show specific targeting of their partially hosting genes. As a possible explanation for the increased target probability and mDPS values for intragenic miRNAs and their host genes, we found an increase in the number of miRNA recognition elements (MREs) on the host gene mRNA specific for the respective intragenic miRNAs. While intragenic and overlapping miRNAs showed a clearly significant probability to target their host genes, near-gene miRNAs exhibited a significantly lower target probability for genes in their direct vicinity as compared to intragenic miRNAs for their host genes. Therefore, the proximity of a miRNA to a gene seems not to be sufficient to warrant a high probability for a given miRNA to target a near gene.
Interestingly, not only miRNAs are located in genomic regions but also other non-coding RNAs (e.g., snoRNAs, lncRNAs, pseudogenes, circRNAs, etc.) are embedded in host genes, which are adopting the same strategy as intragenic miRNAs and have been reported to interfere with host gene expression [67][68][69][70].
A particularly relevant subgroup of intragenic miRNA are alternatively spliced miRNAs located at exon-intron junctions due to their specific inhibitory effects on host gene expression [71][72][73][74][75]. This respective "mixed" miRNA subpopulation suitable for alternative splicing was in the same range in a previous study reporting a similarly lower subpopulation of these miRNAs in human compared to mouse [76]. Furthermore, it has been shown that transcription of intronic miRNAs, as well as splicing of the host genes, can bidirectionally regulate each other's transcription by a counteracting mechanism, where transcription of the miRNA reduces the splicing of the host gene and vice versa [77,78].

Indirect Host Gene Regulation by Modulation of Functional Pathways
While previous studies mainly focused on conserved miRNA expression from an evolutionary angle, we here emphasize the central functional role of conserved miRNAs for the regulation of their host genes together with other genes within the host gene's interactome by providing evidence that intragenic miRNAs can target other genes within the networks functionally linked to their host genes. Intragenic miRNAs are particularly frequent in genes encoding proteins involved in pathways that are relevant for highly specialized organs, such as the frontal cortex [24]. In general, it is suggested that target genes regulated by an individual miRNA interact through a network of functionally associated genes [79,80], and miRNA interactome analysis has been introduced as a tool to identify miRNA functional relevance [81,82]. Using the network analysis toolkit ANAT [83], Melamed, Levy (76) generated a network connecting the most abundant AGO2-mRNA targets, in which six of ten candidate targets are part of a potential biological network. Enrichment analysis of the reconstructed network showed that both potential miR-412 targets, as well as their associated intermediate node genes, were overrepresented in programmed cell death processes [76]. Furthermore, miRNA::target gene interaction sites contain both interactor and structural elements in various combinations. Not considering spatially defined structural elements, such as precursor miRNAs and the interactor mRNAs or long ncRNAs, our approach based on sequence complementarity may not consider all relevant types of miRNA interactions with their host genes as relevant targets. This may explain why performance of bioinformatics studies based on nucleotide sequences cannot be sufficiently precise on its own. Additional parameters, such as three-dimensional structures and binding energy of specific miRNA motifs, need to be taken into account as they become available for a full understanding of miRNA::host gene autoregulation [84]. However, and in line with studies reporting that individual miRNAs target disease-relevant molecular pathways [60,64,66] our unbiased genome-wide study provides important evidence that network-wide targeting is not the result of random associations but more likely resembles a generic mechanism regulating entire molecular pathways [85].

Conclusions
We demonstrate that intragenic miRNAs show a strong probability of interacting with their own host genes. This did not only involve direct miRNA/host gene targeting but also interaction with functionally related genes via an indirect autoregulatory mechanism. This strongly argues for the biological relevance of a potential autoregulation not only for the host genes itself but, more importantly, for the entire cluster of genes coding for proteins it interacts with.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4409/9/1/232/s1, Figure S1: Flow chart of data acquisition and processing. Validation databases used: starBase, miRTarBase, TarBase (yellow hexagons), Figure S2: StringDB protein-protein Markov chain cluster analysis of HSA intragenic miRNA host genes. 35 cluster were defined with a number of connected nodes >5, Figure S3: All HSA network-cluster were subject to a g:Profiler enrichment analysis for GO:Biological Process (A), GO:Cellulare Components (B), GO:Molecular Function (C) and KEGG (D), Figure S4: StringDB protein-protein Markov chain cluster analysis of MMU intragenic miRNA host genes, Figure S5: All MMU network-cluster were subject to a g:Profiler enrichment analysis for GO:Biological Process (A), GO:Cellulare Components (B), GO:Molecular Function (C) and KEGG (D), Figure S6: StringDB protein-protein Markov-chain cluster analysis on DME intragenic miRNA host genes, Figure S7: All DME network-cluster were subject to a g:Profiler enrichment analysis for GO:Biological Process (A) and GO:Cellulare Components (B), Table S1: List of species miRNA groups and classifications from Ensembl genome catalog, Tables S2: Network statistics for protein-protein interaction enrichments, Tables S3. Essential vs. non-essential genes for the different miRNA types and species, Tables S4: Indirect autoregulation of host gene related protein-protein networks by HSA intragenic miRNAs, Tables S5: Indirect autoregulation of host gene related protein-protein networks by MMU intragenic miRNAs, Tables S6: Enrichment Table for each separate community in HSA, ranked by the p-value, Tables S7: Enrichment Table for each separate community in MMU, ranked by the p-value, Table S8: In-and out-degree for each community in the HSA network, Table S9. Inand out-degree for each community in the MMU network.
Author Contributions: K.K.K. and M.Z. designed the study. K.K.K. and M.Z. wrote the scripts and performed the data analysis. K.K.K., M.Z., and M.K. interpreted the data and wrote the manuscript. A.H. critically reviewed the contents of the paper and suggested substantial improvements. All authors have read and agreed to the published version of the manuscript.

Funding:
The study was supported by the Austrian Science Fund (FWF) grants P30809 to K.K.K. and DK-SPIN W1206-06, P25345, and P28611 to M.K.