SiMiSnoRNA: Collection of siRNA, miRNA, and snoRNA database for RNA interference

Objective: The discovery of sequence-specific gene silencing triggered by double-stranded RNAs has had considerable impact on biology, revealing a previously unknown level of regulation of gene expression. This process, known as RNA interference (RNAi) or RNA silencing, is one in which RNA molecules inhibit gene expression, typically by causing the destruction of specific mRNA molecules. Two types of small RNA molecules, small interfering RNA (siRNA) and microRNA (miRNA), are central to RNA interference. The SiMiSnoRNA database has therefore been developed for efficient storage of RNAi-related small RNA sequences and qualitative analysis of their structures across a variety of organisms. Methods: The SiMiSnoRNA database was developed using WAMP server, which implements Apache, MySQL, and PHP as principal components. Results: A flexible web-based search engine was developed to provide fast access to specific small RNA sequence information. Conclusion: BLAST search analysis within SiMiSnoRNA enables users to compare their own query sequence data with the SiMiSnoRNA database and retrieve related information. To facilitate data consistency, publicly available information from NCBI has been integrated into the database, which can be conveniently used for research in experimental and molecular biology.


Introduction
Currently, there are few categorized databases covering available RNAi data. One such database, RNAiDB [1], was designed to provide access to results from RNAi studies in C. elegans. For siRNAs, two databases are available: HuSiDa (which serves sequences of published functional siRNA molecules targeting human genes) [2] and the MIT/ICBP siRNA database (a collection of siRNAs obtained from published data for human and mouse) [3]. Two significant miRNA data resources, miRDB (for human, mouse, rat, dog and chicken) [4] and miRBase (containing information about 15,172 microRNAs) [5], were designed as collections of published miRNA sequences and annotation. The snoRNA databases Sno/scaRNAbase (consisting of 1979 sno/scaRNA records) [6], snoRNA-LBME-db (human H/ACA and C/D box snoRNAs) [7], and the yeast snoRNA database (snoRNAs from the yeast Saccharomyces cerevisiae) [8] were developed to provide snoRNA-associated data. However, to the best of our knowledge, there is no specialized and dedicated database that covers all three important small RNA molecules (siRNAs, miRNAs, and snoRNAs) for a diverse set of organisms on a single internet platform.
The significance of this database lies in the extensive use of small RNAs as gene knock-down tools in functional genomics, molecular genetics, and drug discovery. Despite numerous efforts, however, the design of potent siRNAs (or other small RNAs) remains inadequate. Design rules resulting from different studies often disagree with each other and are frequently insufficient [9]. Typically, only about 75-80% of siRNA constructs designed according to current rules achieve >50% knock-down efficacy. Currently, siRNA (24 organisms and 2480 sequences), miRNA (101 organisms, 1935 mature miRNA sequences, and 448447 target sequences), and snoRNA (20 organisms and 2690 sequence entries) records originating from different studies are stored in our database. A link to PubMed or PubMed Central with a pre-formulated small RNA query is also provided so that the user can easily check the literature for articles relating to the gene of interest. This work describes the development of an integrated database, SiMiSnoRNA, designed to assist experimentalists in determining which small RNA can be used to inhibit their gene of interest. Moreover, these data will be increasingly useful for developing small RNA (siRNA) design tools. To our current knowledge, no other internet platform includes these different types of small RNA molecules, so this database will be useful to users interested in more than one type of small RNA data. We intend to maintain the database as new data become available, and one future plan is to include the remaining small RNA molecules (sRNAs, piRNAs, snRNAs, scaRNAs, and ncRNAs) [9].
Combined analysis of small RNA molecules (siRNAs, miRNAs, and snoRNAs) scattered across diverse data resources is valuable, but it is demanding for experimental biologists with limited bioinformatics experience. There is therefore a need for a new user-friendly platform that makes efficient use of such dispersed data resources, and the SiMiSnoRNA database was developed to provide it.

Materials and Methods
The SiMiSnoRNA platform was designed to store data on three essential small RNA molecules (siRNA, miRNA, and snoRNA) and to provide online access for searching or retrieving most of the small RNA sequences related to RNAi. The current version of the SiMiSnoRNA database provides user-friendly access to the small RNA sequence data (sequence entries: siRNA=2480, miRNA=1935, and snoRNA=2690), which could benefit a broad range of experimental biologists. The database lets users browse data by selecting an organism name. Additionally, the homepage of the database (Figure 1) [10] provides advanced search options for specific siRNA (by gene symbol, accession no., gene name, siRNA sequence), miRNA (by miRNA name, evidence, miRNA sequence, stem-loop sequence, genome context), and snoRNA (by snoRNA name, box type, target RNA, organization, locus) records, which could benefit a wide range of researchers working in molecular biology, from genes to specific RNAi regions. To carry out an advanced search, the user selects an organism name in the first pull-down menu, specifies a search option in the second pull-down menu, and enters a keyword (related to the specified search option) in the text box to search for a specific small RNA in the database (Figure 2) [10]. A BLAST (Basic Local Alignment Search Tool) [11,12] search option is also included, with which users can easily compare a small RNA sequence (siRNA, miRNA, or snoRNA) of interest against the database and retrieve related information if it is available.
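The advanced search described above (organism, then search option, then keyword) can be illustrated with a minimal sketch. It is not the SiMiSnoRNA implementation: Python's built-in sqlite3 module stands in for the MySQL back-end so that the example is self-contained, and the table name, column names, and demonstration rows are all hypothetical.

```python
import sqlite3

# Hypothetical mapping from the snoRNA search options named in the text to
# column names. The actual SiMiSnoRNA schema is not described in the paper.
SNORNA_FIELDS = {
    "snoRNA name": "name",
    "box type": "box_type",
    "target RNA": "target_rna",
    "organization": "organization",
    "locus": "locus",
}


def advanced_search(conn, organism, option, keyword):
    """Return snoRNA rows for one organism whose chosen field contains keyword."""
    # Only whitelisted column names are ever placed into the SQL text;
    # an unsupported option raises KeyError.
    column = SNORNA_FIELDS[option]
    sql = (
        "SELECT name, box_type, target_rna FROM snorna "
        f"WHERE organism = ? AND {column} LIKE ? ORDER BY name"
    )
    # User-supplied values are passed as bound parameters, never interpolated.
    return conn.execute(sql, (organism, f"%{keyword}%")).fetchall()


def demo_connection():
    """Build a tiny in-memory table with made-up rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE snorna (organism TEXT, name TEXT, box_type TEXT, "
        "target_rna TEXT, organization TEXT, locus TEXT, sequence TEXT)"
    )
    rows = [
        ("Homo sapiens", "demoA", "C/D", "28S rRNA", "intronic", "chr1", "AUGAUGA"),
        ("Homo sapiens", "demoB", "H/ACA", "18S rRNA", "intronic", "chr2", "ACAUGCA"),
        ("Saccharomyces cerevisiae", "demoC", "C/D", "25S rRNA", "independent", "chrIV", "UGAUGAC"),
    ]
    conn.executemany("INSERT INTO snorna VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    for row in advanced_search(conn, "Homo sapiens", "box type", "C/D"):
        print(row)
```

Restricting the search option to a fixed whitelist mirrors the second pull-down menu: the user chooses among predefined fields, and only the keyword is free text.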

Database construction and content organization
The database contains information on small RNAs (siRNAs, miRNAs, and snoRNAs) that has already been published in the literature. We began by searching for miRNA data in RefSeq (a subset of the NCBI database) [13] using appropriate keywords such as "miRNA", "microRNA", and "miRNA and specific organism". In the next step, we chose the "results by taxon" option and selected the organisms one by one. The resulting page lists each miRNA and the information available for it in the NCBI Nucleotide database. For each entry obtained, research articles pertaining to the specific miRNA were collected via the given PubMed ID [14]. Among the peer-reviewed publications screened, those containing supporting experimental data were selected for entry into the database. For the collected miRNAs, a further search by miRNA name was made in the existing databases (miRDB and miRBase) to retrieve additional important data where available. The keywords used to retrieve siRNA and snoRNA data from RefSeq [13] were "siRNA", "small interfering RNA", "siRNA and specific organism", "snoRNA", "small nucleolar RNA", and "snoRNA and specific organism". HuSiDa, the MIT/ICBP siRNA database, Sno/scaRNAbase, snoRNA-LBME-db, and the yeast snoRNA database were used to compare and validate the data collected from the literature. For data not found in the existing databases for a specific RNA type, we searched further in the Rfam [15] and Ensembl [16] databases according to the sequence annotation. For data that we did not find in any database, we have provided a link to the PubMed or PubMed Central ID [14], which allows the user to easily check or verify the data in its source literature. Otherwise, the same procedure described for miRNA was followed to collect data for siRNA and snoRNA.
Experimental identification of miRNA targets is currently a time-consuming process, and as a result most researchers rely on computational tools to identify a set of candidate targets for further experimental characterization. Three computational tools, TargetScan [17], RNAhybrid [18], and miRanda [19], were used to identify energetically favored hybridization (miRNA target) sites of small RNAs within the conserved regions of the 3'-UTRs of genes in 10 metazoan genomes. The conserved regions were extracted from the UCSC Genome Browser [20]. The predictive parameters in all three tools were left at their default values. Each miRNA target prediction tool yields a set of miRNA target sites, some of which may be false positive predictions. To eliminate such candidates, only targets predicted by at least two tools were selected and stored in the SiMiSnoRNA database. Secondary structures were predicted for all sequences using the RNAstructure tool (web servers for RNA secondary structure prediction) with default parameter values [21].
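The rule of keeping only targets predicted by at least two tools can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline actually used: the function name and the representation of a site as a (miRNA name, transcript ID, start, end) tuple are hypothetical, and sites are matched by exact identity, whereas real tool outputs may differ slightly in their reported boundaries and could require overlap-based matching.

```python
from collections import defaultdict


def consensus_targets(predictions, min_tools=2):
    """Keep target sites reported by at least `min_tools` distinct tools.

    `predictions` maps a tool name to an iterable of hashable site keys,
    here assumed to be (miRNA name, transcript ID, start, end) tuples.
    Returns a dict mapping each retained site to the sorted list of tools
    that predicted it.
    """
    support = defaultdict(set)
    for tool, sites in predictions.items():
        for site in sites:
            # A set ensures each tool is counted once per site.
            support[site].add(tool)
    return {
        site: sorted(tools)
        for site, tools in support.items()
        if len(tools) >= min_tools
    }


if __name__ == "__main__":
    # Made-up predictions for illustration only.
    demo = {
        "TargetScan": [("miR-x", "T1", 10, 31), ("miR-x", "T2", 5, 26)],
        "RNAhybrid": [("miR-x", "T1", 10, 31)],
        "miRanda": [("miR-x", "T2", 5, 26), ("miR-x", "T3", 1, 22)],
    }
    for site, tools in consensus_targets(demo).items():
        print(site, tools)
```

Recording which tools support each site, rather than only a count, also makes it possible to report the predicting tool names alongside each stored target.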
The current version of SiMiSnoRNA consists of three main small RNA molecule types: siRNA (24 organisms and 2480 sequences), miRNA (101 organisms, 1935 mature miRNA sequences, and 448447 target sequences), and snoRNA (20 organisms and 2690 sequence entries) (Table 1). The data content of the SiMiSnoRNA database comprises information (genes, small RNA sequences, evidence, secondary structure, genome context, target sequences, and others) obtained from public resources (Figure 5) [10]. SiMiSnoRNA supports a basic search (by organism name) and an advanced search for a specific small RNA molecule of interest through a flexible query engine. All miRNA target sequences link directly to the external Ensembl genome database for further analysis. BLAST [12] parameters are implemented for comparative studies and further sequence analysis (Figure 5) [10]. Moreover, secondary structures and related information can be viewed through the visualization modules and easily downloaded (Figure 6) [10].

System design and implementation
SiMiSnoRNA was built on WAMP server, which implements Apache (https://httpd.apache.org), MySQL (http://www.mysql.com), and PHP (http://php.net) as principal components. SiMiSnoRNA was constructed from four major software components: PHP scripts and MySQL at the back-end, and HTML and CSS at the front-end. The web services were developed using WAMP server (http://www.wampserver.com), which automates the mapping between MySQL databases and objects in the PHP scripting language, both of which contribute to the performance and stability of the web services. WAMP, Apache, MySQL, and PHP were chosen because they are open source software and are platform independent.

Database use and access
The current version of the database supports searching RNAi data from different organisms across three small RNA types (siRNA, miRNA, and snoRNA). SiMiSnoRNA provides a simple and user-friendly interface with three main access methods for querying a particular small RNA molecule: 'quick search', 'advanced search', and 'BLAST search' (Figure 3) [10]. The 'quick search' page lists all data stored in the database for the selected organism and the corresponding type of small RNA molecule, whereas 'advanced search' allows a dynamic query combined with the organism selection (i.e. for siRNA: gene name, symbol, accession, siRNA sequence; for miRNA: name, evidence, miRNA sequence; for snoRNA: name, box, target RNA, organization, locus). The search options also include a BLAST tool, which searches a user-provided query sequence against the sequences available in the database and is useful for characterizing unknown sequences by identifying homologous sequences in the database. SiMiSnoRNA thus offers the opportunity to search for information on an unknown sequence and compare it with three different types of small RNAs. Each database entry also provides direct links to valuable external resources that give users extended information. Predicted secondary structures are provided for all sequences; they were generated using the RNAstructure program, which uses thermodynamics and several algorithms for secondary structure prediction (prediction of base pair probabilities, bimolecular structure prediction, and prediction of a structure common to two sequences) [21]. Analysis tools to explore sequence analysis and alignment: sequence comparison and alignment are very important during RNAi analysis. The BLAST tool is implemented to compare sequences against those present in the database and retrieve important details for specific RNAi sequences.
The BLAST output page provides information on the best hits obtained from the SiMiSnoRNA database: score (in bits), E-value, matrix details, and the alignment to the corresponding sequences. Secondary structures are also provided to help understand the folding of these sequences, and predicted target sequences are provided for most of the organisms. Most of the biological details presented on the output page are cross-linked to other public repositories such as the NCBI Gene database, the Nucleotide database, PubMed/PubMed Central, and Ensembl.
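For readers interpreting the score (in bits) and E-value on the output page, the standard Karlin-Altschul relations used by BLAST can be sketched as below. The bit score is S' = (λS - ln K) / ln 2 for a raw score S, and the expected number of chance hits is E = m·n·2^(-S'), where m and n are the query and database lengths. The function names are ours, the λ and K values in the demonstration are illustrative assumptions only (BLAST prints the values it actually used in its output), and BLAST itself uses effective rather than raw lengths.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score to a bit score: (lam*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits, query_len, db_len):
    """Expected number of chance hits with at least this bit score: m*n*2^-S'."""
    return query_len * db_len * 2.0 ** (-bits)


if __name__ == "__main__":
    # Illustrative parameters (assumptions), not values taken from SiMiSnoRNA.
    lam, k = 1.374, 0.711
    # A 21-nt query scored against a hypothetical database of 150,000 nt.
    bits = bit_score(raw_score=21, lam=lam, k=k)
    print(f"bit score = {bits:.1f}")
    print(f"E-value   = {e_value(bits, query_len=21, db_len=150_000):.2e}")
```

Because the E-value scales with the search space m·n, the same alignment is more significant in a small, specialized database than in a large general one, and each additional bit of score halves the E-value.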

Detailed SiMiSnoRNA output page
The information presented on the output page is specific to each of the three types of small RNA covered: siRNA, miRNA, and snoRNA. For siRNAs: general information on the output page includes organism name, gene accession, gene symbol, gene name, and siRNA sequence. The secondary structure can be viewed and downloaded for further structure analysis. A reference link to PubMed/PubMed Central is given so that supporting evidence for specific siRNAs can be found in the literature.
For miRNAs: miRNA information is divided into three sections, for animals, plants, and viruses. The output page for miRNAs in animals includes organism name, miRNA name, sequence, sequence length, genomic location, location on gene, evidence, Sanger accession, miRNA targets, stem-loop sequence, comments, secondary structure, and references (PubMed ID links). Details for miRNAs in plants include species name, miRNA name, evidence, miRNA sequence, target sequence, stem-loop sequence, secondary structure, and references (PubMed ID links). The output page for miRNAs in viruses contains organism name, miRNA name, evidence, miRNA sequence, target sequence, stem-loop sequence, comments, genome context, secondary structure of the miRNA sequence, minor miR* sequence, and references (PubMed ID links). The miRNA target page for animals provides the target name, Ensembl transcript ID, target start and end, target sequences in the 3'-5' and 5'-3' orientations, and the name of the tool used for prediction.
For snoRNAs: the page of retrieved information for snoRNAs consists of organism name, snoRNA name, accession no., box type, target RNA, organization, locus, snoRNA sequence, secondary structure, and a PubMed link to the research publication.
An online submission system has been prepared to improve and update the database. The web page includes an interface for entering data, which are then added to the SiMiSnoRNA database by a PHP script. Data from researchers will help improve the content, quality, and scope of the database. An online feedback form is also provided, and users are encouraged to send error reports and requests through it. A help page is provided to make SiMiSnoRNA user-friendly for new users.
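A submission interface of this kind would typically check a record before it is stored. The following is a minimal sketch of such a check for a snoRNA record, written in Python rather than PHP for consistency with the other examples. The paper does not describe any validation step, so the choice of required fields, the field names, and the rule of accepting only A, C, G, U (with T treated as U) are all assumptions.

```python
# Hypothetical required fields, loosely following the snoRNA output page.
REQUIRED_FIELDS = ("organism", "name", "box_type", "sequence")
RNA_ALPHABET = set("ACGU")


def validate_submission(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not str(record.get(field, "")).strip():
            problems.append(f"missing field: {field}")
    # Normalize case and accept DNA-style input by converting T to U.
    seq = str(record.get("sequence", "")).strip().upper().replace("T", "U")
    bad = sorted(set(seq) - RNA_ALPHABET)
    if bad:
        problems.append("invalid characters in sequence: " + "".join(bad))
    return problems


if __name__ == "__main__":
    # Made-up record for illustration only.
    record = {
        "organism": "Homo sapiens",
        "name": "demo",
        "box_type": "C/D",
        "sequence": "augcTTga",
    }
    print(validate_submission(record))
```

Returning all problems at once, instead of stopping at the first, lets a submitter correct a record in a single pass.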

Future perspectives
SiMiSnoRNA provides a large-scale data set for three different types of small RNA molecules (siRNA, miRNA, and snoRNA) together with related biological information. The current release of SiMiSnoRNA is the first version of our database. Efforts will be made to regularly update it (with new records for species already present or even new species), to improve the data stored in SiMiSnoRNA, and to improve its functionality. To make it an even more powerful resource, we aim to incorporate bioinformatics tools that facilitate data analysis and comparison. Moreover, relevant disease-related information for specific RNAi will be integrated as additional information in the near future. We encourage and invite members of the scientific community to submit data on small RNA molecules to keep SiMiSnoRNA up to date. Additionally, efforts will be made to include data on the remaining small RNA molecules (sRNAs, piRNAs, snRNAs, scaRNAs, and ncRNAs) from different organisms in our database. Continuous efforts to improve the BLAST search and to find new data will be made to keep SiMiSnoRNA updated and comprehensive.

Figure 6: Secondary structure of a small RNA sequence, generated for all small RNA sequences stored in the database using the RNAstructure tool (web servers for RNA secondary structure prediction).