Identification and characterization of microRNAs expressed in the African malaria vector Anopheles funestus life stages using high throughput sequencing

Over the past several years, thousands of microRNAs (miRNAs) have been identified in the genomes of various insects through cloning and sequencing or even by computational prediction. However, the number of miRNAs identified in anopheline species is low and little is known about their role. The mosquito Anopheles funestus is one of the dominant malaria vectors in Africa, which infects and kills millions of people every year. Therefore, small RNA molecules isolated from the four life stages (eggs, larvae, pupae and unfed adult females) of An. funestus were sequenced using next generation sequencing technology. High throughput sequencing of four replicates in combination with computational analysis identified 107 mature miRNA sequences expressed in the An. funestus mosquito. These include 20 novel miRNAs without sequence identity in any organism and eight miRNAs not previously reported in the Anopheles genus but are known in non-anopheles mosquitoes. Finally, the changes in the expression of miRNAs during the mosquito development were determined and the analysis showed that many miRNAs have stage-specific expression, and are co-transcribed and co-regulated during development. This study presents the first direct experimental evidence of miRNAs in An. funestus and the first profiling study of miRNA associated with the maturation in this mosquito. Overall, the results indicate that miRNAs play important roles during the growth and development. Silencing such molecules in a specific life stage could decrease the vector population and therefore interrupt malaria transmission.

Although there are over 30 species of Anopheles which transmit malaria in the world [51], miRNAs have so far only been experimentally identified in three malaria vectors, the African vector (Anopheles gambiae) and in two Asian vectors (Anopheles stephensi and Anopheles anthropophagus) [7,52,53]. Among the African vectors, Anopheles funestus is one of the most proficient malaria vectors, mainly because of its remarkable ability to populate a wide range of ecological settings across Africa [54][55][56][57][58]. Therefore, experimental identification of miRNAs controlling key genes required for mosquitoes to complete their life-cycle will not only help to better understand the vector biology, but it may uncover novel approaches to control this mosquito. In this context, the main objective of this study was to identify miRNAs expressed in the four main life stages (eggs, larvae, pupae and unfed adult females) of the mosquito An. funestus using high throughput sequencing technology and bioinformatics approaches.

Mosquito strain and rearing condition
The experimental work was performed using a colony of An. funestus (FUMOZ) that originates from southern Mozambique [59]. The mosquitoes were reared in the insectary of the Vector Control Reference Laborator at the National Institute for Communicable Diseases (NICD), Johannesburg, South Africa since 2000. The insectary is kept at 25 °C, 80% relative humidity with a 12-h day/night lighting regime and 45-min dusk/dawn cycles.

RNA extraction and small RNA sequencing
Total RNA was isolated separately from the four different life cycle stages of An. funestus (eggs, larvae, pupae and unfed adult females) using TRIzol reagent (Invitrogen, USA) according to the manufacturer's protocol. Four different biological replicates for each life stage for RNA extraction were prepared. In order to obtain a large and broad miRNA transcriptome data set, RNA was extracted from 5 egg patches, 100 fourth instar larvae, 100 pupae, and 100 unfed adult females for each single batch. The quality and quantity of resulting RNA was assessed using a spectrophotometer (Nanodrop Technologies), and quality assessment determined by using Agilent 2100 Bioanalyzer RNA Nano 6000 kit. No rRNA depletion was performed. The RNA extracts from the four life stages were sent to Macrogen Inc (South Korea) for small RNA sequencing. Sequencing libraries were generated from 1 μg each of the total RNA samples by ligation of adapter RNAs at 5′ and 3′ ends, followed by reverse transcription and PCR using Illumina TruSeq Small RNA Sample Preparation protocol (Illumina Inc., USA). The libraries were size-selected for sequencing of RNA fragments of 18-30 nucleotides. Sequencing was performed on an Illumina HiSeq 2000 platform to obtain single-end reads of 50 bases. Each batch sequenced independently.

Sequencing data processing and analysis Read quality check and filtering
Following sequencing, the quality of the four sequenced libraries were checked using FastQC [60]. Later, all sequencing reads with low quality tags and shorter than 18 nucleotides were removed using FASTX-Toolkit [61]. For all following analysis, all reads from each stage were combined into a single input.

Mapping the reads to the reference genome
All reads were mapped to the An. funestus strain FUMOZ genome [GenBank: KB668221]. The sequence reads were mapped to the genome using miRDeep2 mapper module [62].

Small non-coding RNAs detection
All reads were aligned to the RNA families database (Rfam) version 11 [63] using the BLASTn algorithm allowing for a 2 nucleotide mismatch and e-value lower than 0.01 in order to annotate the small non-coding RNAs present in the libraries.

Identification of miRNA sequences
For identification of miRNA sequences present in the four life stage datasets, the miRDeep2 package was employed [62]. The package scripts detect known or novel miRNAs from deep sequencing data. It looks for the pattern that the miRNA processing machinery leaves in the sequencing data. The most important pattern that miRDeep2 considers are clusters of reads that align along the reference genome that is compatible with the mature miRNA sequence, the loop sequence, and the star sequence structure of the miRNA precursor molecule. If such a pattern is found, miRDeep2 cuts out the potential miRNA precursor sequence from the reference genome and utilizes an RNA folding algorithm (randfold) from the Vienna package [64] to assess if the sequence can be folded into a hairpin structure. Furthermore, the prediction software searches for potential cleavage sites of Drosha and Dicer. The phylogenetic conservation and filtering of other known small non-coding RNA species can be optionally used to improve the predictions. All identified miRNAs are named according to their most similar miRNAs in the miRNA database (miRBase) release 21 (June 2014) [18][19][20][21] match.

Differential expression of the miRNAs
Differentially expressed miRNAs between two sequential life stages (egg-larva, larva-pupa, pupa-unfed adult female) were determined by log 2 fold change >2 for the normalized reads +1. This analysis was computed using gtools package for the R programming language.

Sequence quality of the four libraries
Small RNAs were sequenced in quadruplicate using Illumina sequencing from eggs, larvae, pupae and unfed adult females batches to identify An. funestus miRNAs expressed during development. More than 150 million raw reads were produced from each life stage (Table 1). The length distribution of the reads was between 18 and 30 nucleotides (data not shown). After filtering the impurities and reads of length smaller than 18 nucleotides, 75.92, 71.52, 80.85 and 75.39%, high-quality reads had Phred quality values (PQV) of 20 obtained from the eggs, larvae, pupae, and unfed adult females libraries, respectively ( Table 1). The PQV has previously been reported to be an indicator of base call accuracy and therefore sequence quality [65].

Mapping reads from the four libraries
Mapping reads over the unmasked genome represents an unbiased option, allowing the detection of known and still undiscovered miRNAs. The total number of the reads mapped to An. funestus genome constitutes only 62.61, 63.90, 73.85 and 59.42% of the total high-quality reads from eggs, larvae, pupae, and unfed adult females libraries, respectively ( Table 1).

Annotation of small non-coding RNAs in the four libraries
In order to annotate other small non-coding RNAs in the four libraries, all clean reads were aligned against Rfam version 11. As shown in Fig. 1, the most abundant class of small non-coding RNAs in the eggs library were miR-NAs. However, rRNAs were most abundant in the larvae and pupae, and tRNAs in the unfed adult females library.

Identification of miRNAs in the four life stages libraries of An. funestus
Of the 65 previously known An. gambiae mature miR-NAs in miRBase release 21 (June 2014), 64 mature miR-NAs were detected in the sequenced short reads of An. funestus ( Table 2). The two miR-276 precursors in An. gambiae generate two different mature miRNAs whereas in An. funestus they produce the same mature miRNA. These include all the 46 miRNA sequences computationally predicted in the An. funestus strain FUMOZ genome hosted by the invertebrate vectors of human pathogens database (VectorBase) [66]. The full precursor structure (mature, loop and star sequence) or parts of it were detected. Only star sequences were found for miR-12, miR-306, miR-965, miR-989, miR-1174 and miR-1889 and the precursor locations for these miRNAs were not determined. Reads that do not match any known An. gambiae miRNA sequences from miRBase and mapped to the An. funestus genome were screened for precursor or hairpin structures. The miRDeep2 algorithm was employed to investigate if non-annotated sequences mapping to the An. funestus genome demonstrated folding properties of precursors or hairpins. A total of 43 mature miRNA candidates expressed from 42 precursor candidates were identified (Table 3). Among these miR-NAs, 15 miRNAs have been described in the Anopheles genus (An. gambiae and/or An. anthropophagus) but not registered in miRBase database, eight have not been previously reported in the Anopheles genus but are known in other mosquitoes such as Aedes aegypti and/ or Culex quinquefasciatus, and the remaining 20 miR-NAs have not been described before in any species. For all the mature miRNA candidates, the full or parts of the precursor were detected in the four life stage libraries except for miR-2943 that was detected only in the eggs library. MiRNAs frequently exhibit sequence differences when compared to a reference mature sequence, generating multiple variations that are known as isoforms. New isoforms were identified for miR-927 (miR-927-3p) and miR-980 (miR-980-5p). The new isoforms of miR-980  (59,42) were expressed only in the eggs library. Our analysis uncovered a third stem-loop precursor for miR-2.

Expression profiling of miRNAs during An. funestus development
To obtain insight into possible stage dependent roles of miRNAs in An. funestus, the expression patterns of miR-NAs in different life stages were examined based on the number of reads obtained. The heatmaps (Fig. 2) summarize the expression of the miRNAs identified in the four life stage libraries. The majority of miRNAs were sequenced between 1-6 6 times. The expression profiles of miRNAs varied from highly specific to ubiquitous during the four stages. Among the known miRNAs, miR-263a was the most frequently expressed miRNA in the eggs, the larvae, and the pupae libraries (>1 million reads), whereas miR-8 was the dominant miRNA in the unfed adult females library (>300 thousand reads    Six miRNAs were up-regulated and another 13 downregulated between the larva and pupa stage. However, seven miRNAs were up-regulated and four down-regulated between the pupa and the unfed adult female.

Discussion
Although thousands of small RNAs have been identified [18][19][20][21], the challenge remains to fully identify all small RNAs, especially very low abundance miRNAs and to determine their individual functions. The majority of known miRNAs have been identified through the traditional cloning method [67]. The Illumina sequencing approach is one of high throughput technologies by which miRNAs can be detected in any organism without prior sequence or secondary structure information sequence or secondary structure information. Here, 107 miRNA sequences expressed in the four life stages of the African malaria vector An. funestus were identify and characterize using the deep sequencing approach. This method is more powerful than other conventional technologies previously used in mosquitoes [7,11,52], as it The three columns for each life stage correspond to detection of the full precursor structure namely mature, loop and star sequence, respectively. √ √ √ indicates that miRNA full precursor structure was detected in the library where × × × indicates the the miRNA was not detected in the library     [1]. Using similar deep sequencing techniques, other studies on insects show small RNA sequence size distributions with a peak at 22 nucleotides [11,34,68,69]. All the high-quality reads were mapped on to the An. funestus genome with low percentage (average of 65%) identity. Such a mapping bias has been reported in several studies [21,[70][71][72]. This approach has the weakness that it might favour alignment ambiguities due to the limited alignment specificity given by the small length of mature miRNAs (18-25 nucleotides) detected by the short reads, and to the size and high complexity of an unmasked reference genome [70].
As expected, most of the known miRNAs identified in An. funestus were highly conserved across diverse animal   are more abundant than loop and star sequences. Additionally, miR-309 and miR-988 were not detected in the pupae and unfed adult females libraries, miR-iab-4 was not found in eggs and pupae libraries and miR-308 was not identified from the larvae library. These results indicate that the expression of some miRNAs are probably stage specific. It can be speculated that a miRNA may be involved in regulation of function and dysfunction, differentiation, growth and development of a specific stage [73]. The identification of novel miRNAs is an eminent and challenging problem for the understanding of post-transcriptional gene regulation. The characteristic hairpin structure of miRNA precursors can be used to predict novel miRNAs. With this feature and using miRDeep2, novel miRNAs were predicted by exploring the secondary structure, the Dicer cleavage site, and the minimum free energy of the unannotated small RNA tags that could be mapped to the An. funestus genome. Based on the analysis, 43 mature sequences were identified as novel miRNA candidates because they were not captured in any RNA database. The full precursor structure for some of these novel miRNAs was identified. Detection of the miRNA star is a strong clue, albeit not infallible, for the formation of precursor hairpin structures. This adds weight to the authenticity of the predicted candidates [74,75]. However, the evolution and function of the star miRNAs remains unclear. Two studies proposed that these star miRNAs might differ from their sense partners by acting on different mRNA targets [76,77].
The miRBase databases release 21 (June 2014) was searched for homologs to determine whether these novel miRNAs are conserved among other animal species. Some candidates are conserved in other insect species but not in the Anopheles genus, suggesting that these are insect-specific miRNAs. Among the newly identified Anopheles miRNAs, four mosquito specific miRNAs (miR-2940, miR-2941, miR-2942 and miR-2943) were detected [68]. This is the first description of miR-2941 and miR-2943 in the anopheline species using the highthroughput sequencing technology.
The NGS technology was sensitive enough to detect a new variant for miR-927 and miR-980, miRNAs found only in insects. These results are congruent with the existence of multiple variants for both miRNAs in other insects such as Drosophila [18]. A third stem-loop precursor for miR-2 was also detected which produces different mature miR-2. The miR-2 family is widespread in invertebrates, and it is the largest family of miRNAs in the model species Drosophila melanogaster (8 family members), Capitella teleta (7 family members), Apis mellifera (6 family members), Bombyx mori (5 members) [20].
Counting redundant miRNA reads revealed that expression varies significantly among different miRNAs. In the three pre-adult stages, insect-specific miRNA (miR-263a) was found to be the most abundant miRNA. However, miR-8 was the common miRNA in the unfed adult females library. Furthermore, the four libraries shared five out of the top ten most frequently occurring miRNAs: miR-263a, miR-10, miR-184, bantam and miR-8. In Drosophila, miRNA miR-263a confers robustness during development by protecting nascent sense organs from apoptosis [9]. Moreover, a recent study on Nilaparvata lugens reported that miR-263a was found as high abundant miRNAs in the last instar female nymph females [78]. Both miR-8 and miR-184 were reported in the embryos of Drosophila [79,80], mosquitoes [68,81], silk worm [82], Schistosoma japonicum [83], zebrafish [84], Asian seabass [85], mouse [86][87][88] and humans [15,89]. This suggest a conservative developmental function for these two miRNAs across different animals. High number of reads for miR-2941, miR-2943 miR-2944a, and miR-2944b in the eggs library were also observed, which indicate an embryonic roles for these insect specific miR-NAs. High expression for miR-2941 and miR-2943 in the embryonic stage was reported previously [53,90].
The expression of miRNAs varies across different developmental stages [82,91]. In this study, bantam, miR-8, miR-31 and miR-278 were among the up-regulated miR-NAs between the egg and larva stage. In D. melanogaster, overexpression of bantam induces tissue overgrowth. Related to growth, and also in D. melanogaster, miR-8 and miR-278 have been implicated in insulin receptor signaling , thus contributing to regulation of the energy balance [6]. Embryos depleted for miR-31 can complete development, but were affected by severe segmentation defects [92]. In the gregarious phase of locust, canonical miRNAs were expressed at levels between 1.5-and 2-fold higher than in the solitary phase. The most prominent differences were found in miR-276, miR-125, and miR-315 [34]. Interestingly, the same change in these miRNAs between the egg and the larva stage except for miR-315 were observed. As is known, the genes that encode for miRNA are distributed across the chromosome either individually, or in clusters in which two or more miRNA genes are located within a short distance on the same segment of a chromosome [93,94]. Therefore, it is assumed that miRNA genes located in a gene cluster are first transcribed as a single primary transcript that is subsequently processed to generate the individual miRNAs [7]. In the larvae library, very high relative expression levels of the let-7-complex locus (miR-125, let-7 and miR-100), the cluster of the two mosquito-specifc miRNAs; miR-1174 and miR-1175, miR-278 and miR-307, miR-305 and miR-275, miR-210 and miR-927 were observed. Down-regulation of the miR-92 cluster (miR-92a and miR-92b) and miR-309 cluster (miR-309 and miR-286) were also observed. Similarly in Drosophila, miR-309 and miR-286 (which was processed from a single 1.5 kb primary transcript) displayed a dynamic pattern of expression in the early embryo [95].
Between the larva and the pupa, miR-193, miR-277 and miR-1890 were significantly up-regulated. miR-34, miR-308, miR-375, miR-1175 and and miR-2942 were down-regulated in the pupa after increasing in the larva stage. Again, down-regulation of the miR-92a cluster was noticed. The expression of the members of this cluster has been related to the embryonic development in Ae. aegypti [68] and B. mori [69].
Among the up-regulated miRNA between the pupa and the unfed adult female is miR-1891 the mosquito-specific miRNA which displayed changes in its expression levels in the unfed adult females library, suggesting a significant role for this miRNA during development beside its function in the host response to infection [14]. In the unfed adult females library, up-regulation of miR-34 was noticed compared to significant down-regulation in the pupae library, and a major change in miR-989 expression for the first time. The expression of these miRNAs has been studied in different fly species, and results have shown that miR-989 expression is restricted to females, and predominantly to the ovary in An. anthropophagus, An. stephensi, Ae. aegypti and Drosophila [52,53,68,96], although it was later detected in the midgut of An. gambiae [7]. Both miRNAs (miR-34 and miR-989) with two other miRNAs displayed changes in the expression levels during Plasmodium parasite invasion [7]. For unknown reasons, some miRNAs such as miR-87 and the arthropod-specific miR-929 showed low read counts in the four life stage libraries, suggesting that miRNAs might be involved in other process but not development. Further studies are needed to reveal the function of these miRNAs.
The predicted novel miRNAs exhibited much lower expression levels, consistent with the evidence that nonconserved miRNAs are often expressed at a lower level than conserved miRNAs [97][98][99].
It is important to note that the expression profile analysis in this study are based only on the sequence read counts and required further experimental validation. However, this result is a key step towards improving our understanding of the complexity and regulation mode of miRNAs in mosquitoes. Changes in the expression profiles were noted in all stages indicating a role for these small RNAs in the mosquito maturation. Knockdown or blocking the biogenesis pathway of one of these miRNAs may limit the mosquito's development at a crucial stage thereby leading to novel approaches to combat this mosquito in the early development stages.

Conclusion
In this study, 107 mature miRNA sequences in the four developing stages (eggs, larvae, pupae, and unfed adult females) of An. funestus were identified, one of the most prevalent malaria vector on the African continent using the high throughput sequencing and described their expression patterns during development.