Optimized RNA structure alignment algorithm based on longest arc-preserving common subsequence

Ribonucleic acid (RNA) structure alignment is an important problem in computational biology for identifying structural similarity between RNAs. Obtaining an efficient method for this problem is challenging because of the high computational time needed for an optimal solution and the low accuracy of heuristic solutions. In this paper, an efficient algorithm is proposed based on a mathematical model called the longest arc-preserving common subsequence. The proposed algorithm uses a heuristic technique and high-performance computing to optimize the solution of RNA structure alignment in terms of both the running time and the accuracy of the output. Extensive experimental studies on a multicore system are conducted to show the effectiveness of the proposed algorithm on two types of data. The first is simulated data consisting of 450 comparisons of RNA structures, while the second is real biological data consisting of 357 comparisons of RNA structures. The results show that the proposed algorithm outperforms the best-known heuristic algorithm in terms of execution time, with an average improvement of 71%, and produces a longer output (i.e., higher accuracy) in approximately 45% of the studied cases. Finally, future approaches are discussed.


Introduction
The field of optimization plays an important role in formulating many real-life problems in science, engineering, medicine, and biology. The main goal of optimization is to find the best possible (optimal) solution, by maximization or minimization, among all possible solutions under certain constraints.
Many techniques have been developed to solve optimization problems, such as dynamic programming, greedy techniques, integer programming, and metaheuristics. The major challenges in solving many optimization problems are (1) the high computation time required to find an optimal solution, and (2) the low accuracy of approximate solutions for large problem instances. Bioinformatics is a research field with many problems that can be formulated as optimization tasks. Examples of optimization problems in bioinformatics are molecular docking [1], deoxyribonucleic acid (DNA) motifs [2-4], ribonucleic acid (RNA) structure comparison [5,6], and RNA structure prediction [7].
We focus on the RNA structure comparison problem. The study of molecular similarity permits the classification of molecules into groups, the estimation of their evolutionary history, the identification of functional motifs, and thus the prediction of their biological function [8].
RNA is a single-stranded polymer that consists of four different nucleotides: adenine (A), guanine (G), cytosine (C), and uracil (U). These nucleotides are connected by phosphodiester bonds. The shape of the RNA structure is determined when the RNA folds back on itself [1]. In this case, a hydrogen bond exists between two interacting nucleotides and forms Watson-Crick (G-C and A-U) and wobble (G-U) base pairs [9].
In general, various computational models have been used to formulate the problem of RNA structure comparison, such as the tree [10-12], arc-annotated sequence (AAS) [13], and alignment-free [12,14] models. This paper focuses on algorithms that solve RNA structure comparison based on AAS. In this case, the RNA structure comparison problem is defined as follows [13]: Given two RNAs as AASs, the goal is to determine the longest common subsequence of the two AASs under the condition that all arcs connecting the nucleotides of the subsequence are preserved. This problem is named the longest arc-preserving common subsequence (LAPCS).
There are two special cases of LAPCS [15]: c-fragment LAPCS and c-diagonal LAPCS. A c-fragment LAPCS is a LAPCS in which each sequence is divided into fragments of size $c$ (except possibly the last one), where $c \geq 1$, and the bases of a fragment in the first sequence are only permitted to match bases of the fragment at the same location in the second sequence. A c-diagonal LAPCS is a LAPCS in which the base $a_i$ is only permitted to match a base in the range $b_{i-c}, \ldots, b_{i+c}$, where $c \geq 0$.
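The two restrictions can be illustrated with a small Python sketch. It assumes 1-based positions and assumes that fragments are consecutive blocks of size c starting at position 1; the function names are hypothetical and serve only as illustration.

```python
def allowed_c_diagonal(i: int, j: int, c: int) -> bool:
    """c-diagonal rule: base i of the first sequence may match base j
    of the second sequence only if j lies in i-c .. i+c (c >= 0)."""
    return abs(i - j) <= c


def allowed_c_fragment(i: int, j: int, c: int) -> bool:
    """c-fragment rule: positions i and j (1-based) may match only if they
    fall into the fragment with the same index (assumed blocks of size c)."""
    return (i - 1) // c == (j - 1) // c


print(allowed_c_diagonal(5, 7, 2))  # position 7 lies in the range 3..7
print(allowed_c_fragment(3, 4, 3))  # positions 3 and 4 lie in different fragments
```

Both checks run in constant time, which is why these special cases restrict the search space so strongly compared with the general LAPCS problem.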
For AAS, many algorithms have been proposed based on different approaches [4,5,13,16-19]: (1) Type of proposed algorithm: exact or approximation. (2) Type or level of RNA structure: crossing, nested, chain, and plain (see Figure 1). (3) Type of platform used: sequential or high-performance systems. Table 1 summarizes the different algorithms proposed to solve RNA structure alignment based on the LAPCS model. From Table 1, the following points are observed: (1) The running time of exact algorithms is non-polynomial in the general case of RNA structure level, the crossing type. (2) There are no experimental studies on finding the optimal solution in the general case. (3) Heuristic solutions require high computational time in the general case. (4) The best-known heuristic algorithm for RNA structure comparison in the general case is the algorithm proposed by Blum and Blesa [5], named the BB algorithm. (5) All proposed algorithms are designed and implemented on a computer with a single processor, and there is no parallel implementation of any algorithm based on LAPCS.
The novelty of this paper lies in designing a new parallel heuristic algorithm with three goals. The first is reducing the running time of the best-known heuristic algorithm, BB. The second is increasing the accuracy of the output of the BB algorithm. The third is to provide what is, to the best of our knowledge, the first parallel algorithm based on LAPCS. To achieve these goals, we used the high-performance computing methodology to design a parallel algorithm for solving the RNA structure alignment problem. The developed algorithm is based on two levels of parallelism and is implemented on a multicore system of 16 threads. The results show that the proposed parallel algorithm outperforms the BB algorithm in terms of both running time and accuracy.
The paper is organized as an introduction, four sections, and a conclusion. The mathematical and computational background of the RNA structure comparison problem is given in Section 2. The details of the proposed parallel algorithm are introduced in Section 3. In Section 4, the experimental configurations used in the experiments, including simulated and real data, are given. The results of the experiments are discussed and analyzed in Section 5. Finally, the conclusion on using high-performance systems in RNA comparison is given.

Mathematical background
In this section, a set of definitions and mathematical formulas related to the LAPCS are given as follows.
Assume that the set of nucleotides of RNA is Σ = {A, C, G, U} and that the set of RNA structure types is {Crossing, Nested, Chain, Plain}.
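A subsequence $A' = a'_1 a'_2 \ldots a'_k$ of a string $A = a_1 a_2 \ldots a_{n_A}$ is generated by deleting $n_A - k$ symbols from $A$.

A common subsequence $C$ of two strings $A = a_1 a_2 \ldots a_{n_A}$ and $B = b_1 b_2 \ldots b_{n_B}$ is a subsequence that appears in both strings and is formally represented as:

$$C = c_1 c_2 \ldots c_k, \quad c_t = a_{i_t} = b_{j_t}, \quad 1 \leq i_1 < \cdots < i_k \leq n_A, \quad 1 \leq j_1 < \cdots < j_k \leq n_B. \tag{1}$$

A longest common subsequence (LCS) $C$ of two strings $A$ and $B$ is a common subsequence of $A$ and $B$ that has maximal length. Formally,

$$\mathrm{LCS}(A, B) = \arg\max \{\, |C| : C \text{ is a common subsequence of } A \text{ and } B \,\}. \tag{2}$$

The length $L(i, j)$ of the LCS of the two substrings $a_1 a_2 \ldots a_i$ and $b_1 b_2 \ldots b_j$ is computed by the dynamic programming (DP) technique using the following equation.

$$L(i, j) = \begin{cases} 0, & i = 0 \text{ or } j = 0, \\ L(i-1, j-1) + 1, & a_i = b_j, \\ \max\{L(i-1, j),\, L(i, j-1)\}, & a_i \neq b_j. \end{cases} \tag{3}$$

An arc-annotated sequence (AAS) is a pair $(A, P_A)$, where (1) $A$ is a string (sequence) of length $n_A$ over Σ, $A = a_1 a_2 \ldots a_{n_A}$. (2) $P_A$ is the set of arcs, where each arc represents an unordered pair of two complementary nucleotides. Formally, the set $P_A$ is defined as follows:

$$P_A = \{(i, j) : 1 \leq i < j \leq n_A, \text{ and } a_i \text{ and } a_j \text{ are complementary}\}. \tag{4}$$

A common subsequence $C$ of two arc-annotated sequences, $(A, P_A)$ and $(B, P_B)$, is named an arc-preserving common subsequence (APCS) if $C$ is a subsequence of $A$ and $B$ and preserves all the arcs that link the nucleotides of the subsequence. Formally, $C$ satisfies Eq (1) and

$$(i_s, i_t) \in P_A \iff (j_s, j_t) \in P_B, \quad 1 \leq s < t \leq k. \tag{5}$$

The APCS with maximal length is named the longest arc-preserving common subsequence, LAPCS. Formally,

$$\mathrm{LAPCS}\big((A, P_A), (B, P_B)\big) = \arg\max \{\, |C| : C \text{ is an APCS of } (A, P_A) \text{ and } (B, P_B) \,\}. \tag{6}$$

The relation between two arcs, which determines the structure level (plain, chain, nested, or crossing; see Figure 1), is described in [20].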
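The two building blocks of this section, the DP computation of the LCS length and the arc-preserving condition, can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation; it assumes 1-based positions for arcs and matched pairs, and the function names and the toy sequences are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of a longest common subsequence of a and b (standard DP table)."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


def is_arc_preserving(matches, arcs_a, arcs_b) -> bool:
    """matches: list of (i, j) pairs, strictly increasing in both coordinates.
    arcs_a, arcs_b: sets of (i, j) arcs with i < j.
    Two matched positions must be linked by an arc in A exactly when the
    corresponding positions are linked by an arc in B."""
    for s in range(len(matches)):
        for t in range(s + 1, len(matches)):
            (i1, j1), (i2, j2) = matches[s], matches[t]
            if ((i1, i2) in arcs_a) != ((j1, j2) in arcs_b):
                return False
    return True


# Toy example (hypothetical data): the matching spells the subsequence "ACU".
matches = [(1, 1), (2, 3), (4, 4)]
print(lcs_length("ACGU", "AGCU"))
print(is_arc_preserving(matches, {(1, 4)}, {(1, 4)}))
print(is_arc_preserving(matches, {(1, 4)}, set()))
```

In the last call the arc (1, 4) exists only in the first sequence, so the same matching is a common subsequence but not an APCS.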

The proposed method
The main purpose of this section is to describe in detail how parallelism is used to develop an efficient parallel algorithm based on the best-known algorithm for RNA comparison, the BB algorithm. The proposed algorithm is named PBB.
The BB algorithm consists of many steps, of which the two main steps are as follows. The first step generates a number, $n_{sol}$, of common subsequences, $S$, from the two strings $A$ and $B$. The second step uses the CPLEX tool [21] to find the APCS for the output of the first step based on the concept of the maximum independent set (MIS). The MIS is determined by constructing a graph $G = (V, E)$, where (1) $V$ is the set of matched pairs that are built from the common subsequences, $V = \{v_{i,j} = (i, j) : (i, j) \in S\}$, and (2) $E = \{(v_{i,j}, v_{i',j'}) : (i, j) \text{ and } (i', j') \text{ violate the arc-preserving conditions},\ (i, j), (i', j') \in S\}$. Then, using an integer linear programming model, the problem is formulated as the selection of a maximal number of non-conflicting binary variables [6]. The two main steps are repeated for a time limit based on the length of the input string $A$, namely $|A|$ seconds. In each iteration, the algorithm updates the set of solutions and determines the best solution by comparing the best previous solution with the current solution. Figure 2 shows an example of executing the BB algorithm.
The BB algorithm uses four parameters [5]: (1) $n_{sol}$, the number of APCS constructions per iteration; (2) a time limit, the maximum time allowed to find the MIS using the CPLEX tool when the number of common subsequences is greater than or equal to $n_{sol}$; and (3) two parameters used in the generation of a random common subsequence.
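The conflict-graph idea can be illustrated with a small Python sketch. Note the assumptions: the BB algorithm solves the selection problem with an integer linear program in CPLEX, whereas this sketch swaps in an exhaustive search that is only practical for a handful of matched pairs; the ordering test inside `conflict` is an added assumption (two matched pairs must be usable in the same subsequence), and all names and data are hypothetical.

```python
from itertools import combinations


def conflict(p, q, arcs_a, arcs_b) -> bool:
    """True if matched pairs p and q cannot appear in the same APCS."""
    (i1, j1), (i2, j2) = sorted((p, q))
    if i1 == i2 or j1 >= j2:
        return True  # assumption: pairs must be strictly increasing in both strings
    # arc-preserving condition: an arc on one side requires an arc on the other
    return ((i1, i2) in arcs_a) != ((j1, j2) in arcs_b)


def max_independent_set(pairs, arcs_a, arcs_b):
    """Largest conflict-free subset of matched pairs (brute force, tiny inputs only)."""
    pairs = sorted(set(pairs))
    for size in range(len(pairs), 0, -1):
        for subset in combinations(pairs, size):
            if not any(conflict(p, q, arcs_a, arcs_b)
                       for p, q in combinations(subset, 2)):
                return list(subset)
    return []


pairs = [(1, 1), (2, 3), (3, 2), (4, 4)]
best = max_independent_set(pairs, {(1, 4)}, {(1, 4)})
print(best, len(best))
```

Vertices are the matched pairs and an edge joins every conflicting pair, so an independent set of the graph corresponds to a set of matches that can be used together; the exhaustive search is exponential and is shown only to make the model concrete.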
The proposed algorithm is based on two levels of parallelism as follows. Assume that the number of threads is $k$, with $k = k_1 \times k_2$. The number of threads is written as a product of two integers so that the parallelism can be organized in two levels. The first level parallelizes the generation of common subsequences from $A$ and $B$. This means that, instead of generating only $n_{sol}$ common subsequences as in the BB algorithm, the proposed algorithm generates $k_1 \times n_{sol}$ common subsequences such that each thread generates $n_{sol}$ common subsequences. The second level uses parallelization to find the APCS using the MIS method.
Additionally, in the BB algorithm, the two main steps are repeated for a time that is based on the length of string $A$. The proposed parallel algorithm reduces this repetition time to approximately $|A|/k_1$ seconds. The proposed PBB algorithm also uses some parallel subroutines, such as the parallel maximal independent set (PMIS) algorithm [21], the parallel binary tree technique [22,23], and the parallel LCS (PLCS) algorithm [24,25].
The complete steps of the proposed PBB algorithm are as follows.
Step 1: Set the values of the four parameters of the BB algorithm, in constant time, based on the length of A as suggested in [5], where A  B, see [5, Table 4].
Step 2: Generate the set of matched pairs between $A$ and $B$ using $k$ threads. This step can be performed by dividing $B$ into $k$ substrings, $B_i$, of approximately equal size, $|B|/k$. Then each thread finds the set of matched pairs, $R_i$, between $A$ and $B_i$. Finally, $R$ is the union of all sets of matched pairs generated by the $k$ threads, $R = \bigcup_{i=1}^{k} R_i$, which can be computed by the parallel binary tree (PBT) paradigm.
Step 3: Find the LCS of $A$ and $B$ by applying the parallel LCS (PLCS) algorithm using $k$ threads. The list $L$ represents the set of matched pairs of the LCS of $A$ and $B$.
Step 4: Apply the parallel MIS (PMIS) algorithm to the list $L$ using the CPLEX tool and store the result as the current solution $S_{best}$.
Step 5.2: Repeat the following steps until the elapsed time is greater than or equal to $|A|/k_1$ seconds:
Step 5.2.1: Repeat the following for $n_{sol}$ iterations:
Step 5.2.1.1: Generate a random common subsequence from the list $R$ as in [5].
Step 6: ... using $k$ threads and the PBT paradigm.
Step 7: Apply the PMIS algorithm using $k$ threads within the time limit and obtain the resulting list.
Output: LAPCS.
Note that Steps 2 and 6 use the PBT paradigm with $k$ threads. If the value of $k_1$ is small, then the proposed algorithm performs these steps sequentially. Figure 3 shows the flow chart of the proposed parallel algorithm.
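Step 2 above can be sketched in Python as follows. The paper's implementation uses Java threads; this sketch uses Python's `concurrent.futures` only to show the decomposition. It assumes that a matched pair (i, j) means that the i-th symbol of A equals the j-th symbol of B (1-based), and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def matched_pairs(a: str, b_chunk: str, offset: int) -> set:
    """Matched pairs between a and one substring of b; offset restores the
    original (1-based) positions of the substring inside b."""
    return {(i + 1, offset + j + 1)
            for i, x in enumerate(a)
            for j, y in enumerate(b_chunk) if x == y}


def tree_union(sets) -> set:
    """Union by pairwise (binary-tree) reduction. Each level is computed
    sequentially here; the merges within one level are independent and
    could be assigned to different threads."""
    sets = list(sets)
    while len(sets) > 1:
        merged = [sets[t] | sets[t + 1] for t in range(0, len(sets) - 1, 2)]
        if len(sets) % 2:
            merged.append(sets[-1])
        sets = merged
    return sets[0] if sets else set()


def parallel_matched_pairs(a: str, b: str, k: int) -> set:
    """Split b into about k substrings of nearly equal size, one per thread."""
    size = max(1, -(-len(b) // k))  # ceiling of |B| / k
    chunks = [(b[s:s + size], s) for s in range(0, len(b), size)]
    with ThreadPoolExecutor(max_workers=k) as pool:
        parts = list(pool.map(lambda c: matched_pairs(a, c[0], c[1]), chunks))
    return tree_union(parts)


print(sorted(parallel_matched_pairs("ACGU", "AGCUAC", 3)))
```

Because every substring of B is processed independently, the result does not depend on k; only the work per thread changes.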

Experimental configuration
In this section, we describe the experimental setup used to implement the proposed parallel algorithm on a multicore system. The experimental setup includes the platform used in the implementation, the data generation used in the comparison, and the number of threads used in each level of parallelism.
For the platform used in the implementation, the experimental studies are based on a multicore system that can execute 16 threads in parallel, using a processor with a speed of 2.4 GHz and a memory capacity of 24 GB. The system runs under the Linux operating system. All algorithms were programmed in the Java programming language, and Java threads were used to implement the parallel regions. Additionally, all compared algorithms used IBM ILOG CPLEX v12.8 [21] as the tool to find a good heuristic solution for the MIS problem in both the sequential and parallel cases.
For data generation, experimental comparisons between the algorithms are performed using two different datasets. The first dataset is artificial data that is used for two purposes: (1) evaluating the proposed algorithm compared to the best-known algorithm for RNA structure comparison, the BB algorithm; and (2) determining the best way to assign the 16 threads to the two levels of parallelism. The second dataset is real biological data for RNA structures that is used to ensure that the proposed parallel algorithm is also efficient on real data in terms of time and accuracy.
For artificial data, two parameters affect the generation of the RNA sequence: the sequence length, n, and the number of arcs, m. For fixed values of n and m, the system generates an RNA sequence of length n containing m arcs such that the type of the RNA structure is crossing. The set of values of n is {100, 200, 300, 400, 500}, while the values of m depend on n and are equal to n/2, n/5, and n/10. The sequence is generated randomly, and each letter appears in the sequence with probability 1/4.
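A generator of this kind could be sketched in Python as follows. The paper does not describe its generator in detail, so the choices here are assumptions: arcs are drawn as m disjoint random position pairs (complementarity of the paired letters is not enforced), and the draw is repeated until at least two arcs cross. The function names are hypothetical.

```python
import random


def has_crossing(arcs) -> bool:
    """True if at least two arcs (i1, j1), (i2, j2) satisfy i1 < i2 < j1 < j2."""
    return any(i1 < i2 < j1 < j2 for (i1, j1) in arcs for (i2, j2) in arcs)


def random_rna(n: int, m: int, seed=None):
    """Random sequence of length n over {A, C, G, U} with m disjoint arcs."""
    if 2 * m > n:
        raise ValueError("m arcs need 2*m distinct positions")
    rng = random.Random(seed)
    while True:
        # each letter is chosen uniformly, i.e., with probability 1/4
        seq = "".join(rng.choice("ACGU") for _ in range(n))
        positions = rng.sample(range(1, n + 1), 2 * m)
        arcs = {tuple(sorted(positions[t:t + 2])) for t in range(0, 2 * m, 2)}
        if m < 2 or has_crossing(arcs):  # a crossing needs at least two arcs
            return seq, arcs


seq, arcs = random_rna(100, 50, seed=1)
print(len(seq), len(arcs), has_crossing(arcs))
```

With m = n/2 every position is an arc endpoint, which corresponds to the densest setting used in the experiments.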
For fixed values of n and m, the running time of the compared algorithms is measured as the average over 30 instances. Therefore, for a fixed value of n, the running time of the algorithm is the average over 90 instances because there are three values of m. In total, there are 5 × 90 = 450 comparisons of RNA structures. Additionally, the length of the APCS is computed for each m to study the effect of the number of arcs on the performance of each algorithm.
To implement the proposed parallel algorithm, PBB, on a multicore system with 16 threads, the number of threads, $k = k_1 \times k_2$, can be represented in different ways as follows: (1) 16 = 2 × 8, (2) 16 = 4 × 4, and (3) 16 = 8 × 2. In general, $k_1$ and $k_2$ are used for the first and second levels of parallelism, respectively. Therefore, the parallel algorithm can be represented as three parallel versions, PBB1, PBB2, and PBB3, for 16 = 2 × 8, 16 = 4 × 4, and 16 = 8 × 2, respectively. Thus, the PBB1 algorithm uses 2 threads in the first level of parallelism and 8 threads in the second level of parallelism.
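The three configurations are exactly the ways of writing 16 as a product of two factors greater than 1, which the following Python sketch enumerates. The helper name is hypothetical, and excluding the trivial splits 1 × 16 and 16 × 1 is an assumption drawn from the three versions listed above.

```python
def thread_splits(k: int):
    """All (k1, k2) with k1 * k2 == k and both levels using more than one thread."""
    return [(k1, k // k1) for k1 in range(2, k) if k % k1 == 0]


for version, (k1, k2) in enumerate(thread_splits(16), start=1):
    print(f"PBB{version}: k1 = {k1} (first level), k2 = {k2} (second level)")
```

A larger first factor means more common subsequences generated per iteration, while a larger second factor gives more threads to the independent-set computation.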
For real data of RNA structures, the experimental studies focus on different real datasets, as shown in Table 2. The Ribonuclease P RNA database contains a large number of RNAs; therefore, we selected only 12 RNAs. For Group I introns (Groups A, B, and E), we selected all RNAs in the dataset. The details of each selected RNA, such as name, length, and number of arcs, are provided in Appendix A.
For each RNA group that consists of $N$ RNAs, the total number of possible comparisons between pairs is given by $N(N-1)/2$. Therefore, the methodology used in the experimental study applies the two algorithms, sequential and parallel, to all pairs of RNAs. For example, the RNA database named "Group I introns (Group E)" contains 6 RNAs (M.anisopliae.4 and five others; see Appendix A), which gives 6 × 5 / 2 = 15 comparisons. In general, the experimental studies on real datasets include 357 comparisons to find the LAPCS. For each pair of RNAs in the real dataset, two measurements are calculated. The first is the length of the APCS, while the second is the running time of the algorithm.
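The number of comparisons per group is the number of unordered pairs of RNAs, which can be checked with a few lines of Python. The group sizes 12 and 6 are the ones stated in the text; the function name is hypothetical.

```python
from itertools import combinations


def num_comparisons(n_rnas: int) -> int:
    """Number of unordered pairs among n_rnas RNAs."""
    return n_rnas * (n_rnas - 1) // 2


print(num_comparisons(12))  # Ribonuclease P RNA: 12 selected RNAs
print(num_comparisons(6))   # Group I introns (Group E): 6 RNAs

# the same count obtained by listing the pairs explicitly
print(len(list(combinations(range(6), 2))))
```

Iterating over `combinations(rnas, 2)` is also a convenient way to drive the all-pairs experiment itself.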

Results and discussion
The results and analysis of the experimental studies on artificial and real data are discussed in the following two subsections.

Results of comparison on artificial data
The results of comparing the four algorithms, one sequential and three parallel, are shown in Figure 4 and Table 3. The analysis of the results in the figure and table indicates the following.
From the running time perspective, the results in Figure 4 lead to the following observations. First, the running time of every parallel algorithm is less than the running time of the sequential algorithm. For example, the running time of the BB algorithm is 335.6 seconds when n = 300, whereas the running times of the three parallel algorithms, PBB1, PBB2, and PBB3, are 186.3, 111.5, and 82.0 seconds, respectively. Second, the PBB3 algorithm is faster than the PBB2 algorithm, and the PBB2 algorithm is faster than the PBB1 algorithm. For example, the running time of the PBB3 algorithm is 127.4 seconds when n = 500, while the running times of the two other algorithms, PBB2 and PBB1, are 184.2 and 308.3 seconds, respectively. Third, the average percentages of improvement for the three parallel algorithms, PBB1, PBB2, and PBB3, are 44.78%, 66.97%, and 77.60%, respectively, where the percentage of improvement is measured by 1 - Tpar/Tseq. Fourth, the average speedups of the parallel algorithms, PBB1, PBB2, and PBB3, are 1.8, 3, and 4.5, respectively, where the speedup is equal to Tseq/Tpar.
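As a worked example of the two measures, the following Python sketch applies them to the running times reported above for n = 300. The function names are hypothetical; the numbers are taken from the text.

```python
def improvement(t_seq: float, t_par: float) -> float:
    """Percentage of improvement as a fraction: 1 - Tpar / Tseq."""
    return 1 - t_par / t_seq


def speedup(t_seq: float, t_par: float) -> float:
    """Speedup: Tseq / Tpar."""
    return t_seq / t_par


t_bb = 335.6  # BB running time in seconds for n = 300
for name, t_par in [("PBB1", 186.3), ("PBB2", 111.5), ("PBB3", 82.0)]:
    print(f"{name}: improvement {improvement(t_bb, t_par):.1%}, "
          f"speedup {speedup(t_bb, t_par):.2f}")
```

For this single value of n the improvements are about 44.5%, 66.8%, and 75.6%, close to the reported averages over all values of n. The two measures are linked by speedup = 1 / (1 - improvement).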
From the viewpoint of the length of the output, the results in Table 3 lead to the following observations. First, the length of the APCS generated by the parallel algorithms is approximately equal to the length of the APCS generated by the sequential algorithm when n ≤ 200. For example, the average lengths of the APCS, over 30 instances, generated by the four algorithms, BB, PBB1, PBB2, and PBB3, are 46.20, 46.53, 46.60, and 46.77, respectively, when n = 100 and m = n/2. Second, the length of the APCS generated by the parallel algorithms is greater than the length of the APCS generated by the sequential algorithm when n > 200. For example, the average lengths of the APCS, over 30 instances, generated by the four algorithms, BB, PBB1, PBB2, and PBB3, are 166.9, 170.9, 169.6, and 168.2, respectively, when n = 400 and m = n/2. Third, the difference between the length of the APCS generated by the parallel algorithms and that generated by the BB algorithm increases with increasing values of n. For example, the average difference between the lengths of the APCS generated by the BB algorithm and the PBB1 algorithm is 2.5 when n = 400, whereas the difference is equal to 5.9 when n = 500. Fourth, the length of the APCS generated by the PBB1 and PBB2 algorithms is, in most cases, greater than that generated by the PBB3 algorithm.
A non-parametric statistical test known as the Wilcoxon signed-rank test [27] was employed to ascertain whether there are statistically significant differences in the length of the output of the four algorithms, BB, PBB1, PBB2, and PBB3. The significance level used in the test is 0.05. The results of applying the test to each of the six pairs of algorithms show the following observations (see additional file "math-09-05-550-supplementary"). (1) There is a significant difference between every parallel algorithm, PBB1, PBB2, and PBB3, and the sequential algorithm, BB, except in one case: there is no significant difference between BB and PBB1 when n = 200. (2) For the two algorithms PBB1 and PBB2, the PBB2 algorithm is better than the PBB1 algorithm when n = 200 and 300, while the PBB1 algorithm is better than the PBB2 algorithm when n = 400. Otherwise, there is no significant difference between the two algorithms. (3) For the two algorithms PBB1 and PBB3, the PBB3 algorithm is better than the PBB1 algorithm when n = 200, while the PBB1 algorithm is better than the PBB3 algorithm when n = 400 and 500. Otherwise, there is no significant difference between the two algorithms. (4) For the two algorithms PBB2 and PBB3, the PBB3 algorithm is better than the PBB2 algorithm when n = 200, while the PBB2 algorithm is better than the PBB3 algorithm when n = 400 and 500. Otherwise, there is no significant difference between the two algorithms.
From the viewpoint of the memory required by each algorithm, Table 4 gives the memory usage in GB. The results show the following observations. First, the memory required by the BB algorithm is less than that required by any of the parallel algorithms, PBB1, PBB2, and PBB3. Second, the memory required by the PBB1 algorithm is less than that required by the PBB2 algorithm, and the memory required by the PBB2 algorithm is less than that required by the PBB3 algorithm. The memory required by the PBB3 algorithm is high compared to the other parallel algorithms because it manipulates 8 APCS simultaneously using the CPLEX tool, while the two other algorithms manipulate 4 and 2 APCS.
As a result of the previous analysis, the parallel algorithms PBB1 and PBB2 perform well in terms of the length of the output compared to the other algorithms. Additionally, the PBB2 algorithm performs better than the PBB1 algorithm in terms of running time, which is more important than memory because the amount of storage is not high for any of the parallel algorithms. Therefore, the parallel algorithm PBB2 was selected to evaluate the parallelization of the BB algorithm on the real dataset in the next subsection.
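The mechanics of the Wilcoxon signed-rank test can be sketched in pure Python as follows. This is a simplified illustration, not the procedure used to produce the reported results: it assumes the normal approximation without tie or continuity correction, which is rough for small samples, and the two lists of APCS lengths are hypothetical values made up for the example.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (W, two-sided p-value) for paired samples x and y.
    Zero differences are discarded; tied absolute differences share
    their average rank; p uses the normal approximation."""
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    order = sorted(range(n), key=lambda t: abs(d[t]))
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        while end + 1 < n and abs(d[order[end + 1]]) == abs(d[order[pos]]):
            end += 1
        average_rank = (pos + end) / 2 + 1
        for t in range(pos, end + 1):
            ranks[order[t]] = average_rank
        pos = end + 1
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    w_minus = sum(r for r, v in zip(ranks, d) if v < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd if sd > 0 else 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w, p_value


# Hypothetical APCS lengths for ten instances (not data from the paper).
bb = [160, 165, 158, 170, 162, 167, 159, 164, 161, 166]
pbb = [163, 169, 160, 175, 163, 173, 158, 171, 163, 169]
w, p = wilcoxon_signed_rank(pbb, bb)
print(w, p < 0.05)
```

A p-value below the significance level of 0.05 would be read as a significant difference between the two algorithms on the paired instances.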

Results of comparison on real data
In this subsection, a comparison between the sequential algorithm and the selected parallel algorithm, PBB2, is performed to verify that the parallelism improves on the sequential algorithm in terms of both the length of the output and the running time. Table 5 shows the results of two measurements, time and length of the LAPCS, for the two algorithms, BB and PBB2, on four datasets of real RNAs.
For the running time measurement, Table 5 shows the running time of both algorithms and the percentage of improvement in the running time of the proposed parallel algorithm PBB2 compared to the BB algorithm. For example, the two algorithms, BB and PBB2, were run on the Ribonuclease P RNA dataset and obtained the following results. (1) For the 66 cases, the average running times of the BB and PBB2 algorithms are 398.1 and 107.5 seconds, respectively. (2) The PBB2 algorithm outperforms the BB algorithm with a percentage of improvement of 73%. Additionally, the running times of the BB algorithm on the two datasets Group A and Group B are higher than on the other datasets because these two datasets contain RNAs with lengths greater than 1000 (see Appendix A).
For the length of the APCS, Table 5 displays the range of the difference between the outputs of the PBB2 and BB algorithms, as well as the percentage of cases in which the LAPCS generated by the PBB2 algorithm is longer than (or equal to) the LAPCS generated by the BB algorithm. For example, the two algorithms, BB and PBB2, were run on the Group I introns (Group A) dataset and obtained the following results: (1) The length of the LAPCS generated by the PBB2 algorithm is greater than or equal to the output of the BB algorithm, with a difference from 0 to 7. (2) In 45.4% of the comparison cases, both algorithms generate LAPCS of the same length. In the remaining 54.6% of the cases, the PBB2 algorithm generates a LAPCS that is longer than that generated by the BB algorithm. Additionally, the difference between the PBB2 and BB algorithms is sometimes large, such as in the Group I introns (Group B) dataset, where the maximum difference is 10. (3) The coefficients of variation (CV) of the two algorithms for the length of the LAPCS are almost equal. (4) The PBB2 algorithm differs significantly from the BB algorithm according to the Wilcoxon signed-rank test in all cases except one, the Group I introns (Group E) dataset.
On average, over all cases, the PBB2 algorithm outperforms the BB algorithm in terms of running time, with an improvement of approximately 71%. Additionally, the PBB2 algorithm generates a LAPCS that is longer than that generated by the BB algorithm, by at least 1, in 45.8% of the cases.

Conclusions
Identifying the structural similarity between two RNAs is challenging in bioinformatics due to the high computational time required to find an optimal solution. In this paper, the RNA structure comparison problem is represented by the longest arc-preserving common subsequence model. Then high-

Figure 1. RNA structure levels. (a) Plain: No arc in the sequence. (b) Chain: No two arcs are nested or crossed. (c) Nested: At least two arcs are nested, and no two arcs are crossed. (d) Crossing: At least two arcs are crossed.

Figure 2. An example of executing the BB algorithm.

Figure 4. Running time for compared algorithms on simulated data.

Table 1. RNA structure comparison algorithms based on LAPCS*.

Table 2. Real dataset used in the experiments.

Table 3. Average length of APCS for the compared algorithms using different n and m.

Table 4. Comparison between different algorithms based on memory requirements in GB.

Table 5. Comparison between the BB and PBB2 algorithms for a real dataset.