A new fast algorithm for solving the minimum spanning tree problem based on DNA molecules computation

The minimum spanning tree (MST) problem is to find a minimum-weight connected subset of edges containing all the vertices of a given undirected graph. It is a vitally important NP-complete problem in graph theory and applied mathematics, with numerous real-life applications. In previous studies, DNA molecular operations were usually used to solve NP-complete head-to-tail path search problems, and rarely used for NP-hard problems whose solutions are multi-branch paths, such as the minimum spanning tree problem. In this paper, we present a new fast DNA algorithm for solving the MST problem using DNA molecular operations. For an undirected graph with n vertices and m edges, we design flexible-length DNA strands representing the vertices and edges, take appropriate steps, and obtain the solutions of the MST problem within a bounded length range and in O(3m + n) time complexity. We extend the application of DNA molecular operations and simultaneously simplify the complexity of the computation. Results of computer-simulated experiments show that the proposed method updates some of the best known values in very short time and provides better solution accuracy than existing algorithms.


Introduction
DNA computing is a newly emerging interdisciplinary science that uses DNA molecular biotechnologies to solve hard problems of computer science and computational mathematics. Adleman (1994) showed that DNA molecules can be used to solve the directed Hamiltonian path problem of size n in O(n) steps, and also demonstrated the potential parallel power of DNA computation. This advantage implied that we can utilize DNA molecules to solve harder, larger problems such as NP-complete problems in linearly increasing time, in contrast to the exponentially increasing time required by an electronic computer. Lipton (1995) demonstrated that Adleman's experiment could be used to solve the NP-complete satisfiability (SAT) problem. In recent years, DNA computation has received considerable interest from researchers. DNA computing has three major advantages: massive parallelism, enormous memory storage and very low energy consumption. Some typical DNA computing models, such as the Adleman-Lipton model (Adleman, 1994; Lipton, 1995), the sticker model (Roweis et al., 1998), the restriction enzyme model (Ouyang et al., 1997), the self-assembly model, the hairpin model (Sakamoto et al., 2000) and the surface-based model (Smith et al., 1998), have already been established. Based on these models, many papers have appeared that design DNA procedures and algorithms to solve various NP-complete problems (Xiao et al., 2006; Wang et al., 2008, 2012; Lee et al., 2004; Guo et al., 2005; Chang, 2007; Chang et al., 2008, 2012; Han, 2008; Liu et al., 2010; Narayanan et al., 1998). In order to fully understand the power of biological computation, it is worthwhile to try to solve more kinds of computationally intractable problems with the aid of DNA operations.
Moreover, many previous research works concern optimal path search problems or set division problems (Xiao et al., 2006; Wang et al., 2008, 2012; Lee et al., 2004; Guo et al., 2005; Chang et al., 2008, 2012; Chang, 2007; Han, 2008; Liu et al., 2005, 2010; Narayanan et al., 1998; Garey and Johnson, 1979; Jonoskas, 1998; Zimmermann et al., 2008; Han et al., 2008; Braich et al., 2001, 2002; Zhang and Liu, 2011; Majid, 2011; Alberto et al., 2009; Bakar et al., 2008; Bondy, 1976; Yao et al., 2008; Chen and Zhang, 2000; Han and Zhu, 2006; Yamamura et al., 2002). For example, Lee et al. (2004) first design strands of different lengths representing path values and cities, apply molecular operations to generate strands standing for all possible paths, and then use biochemical techniques, such as denaturation temperature gradient polymerase chain reaction and temperature gradient gel electrophoresis, to obtain the optimum solutions of the traveling salesman problem. To solve the shortest path problem, Narayanan et al. (1998) carry out DNA reactions to obtain the strands for a list of paths, then choose the shortest strands as the solution through DNA biotechnologies. The previous research has two shortcomings. One is that the strands for the possible paths are usually very long, and overly long DNA strands make the annealing and separation procedures of modern biotechniques error-prone. The other is that many of the NP-complete problems previously studied with DNA computation are path search problems whose optimum solutions are head-to-tail vertex paths, such as the Traveling Salesman Problem, the Shortest Route Problem, the Hamiltonian Path Problem and so on. In the solution space of the minimum spanning tree problem, by contrast, one vertex may point not just to one vertex but to several.
Expressing one-to-many vertex paths with DNA strands is therefore an important step towards extending the capability of DNA computing to many more optimization problems.
The minimum spanning tree problem can be described as follows. Given an undirected, loop-free graph $G = (V, E, C)$ with vertex set $V = \{v_1, v_2, \ldots, v_n\}$ and edge set $E = \{e_{i,j} \mid 1 \le i, j \le n\}$, let $|V| = n$ and $|E| = m$, and let $c_{i,j} \in C$ be the weight of edge $e_{i,j}$. A subgraph $T = (V, E')$ with $E' \subseteq E$ is a spanning tree if and only if $T$ is a connected graph containing all the vertices of $G$ and exactly $(n - 1)$ edges; the total weight of $E'$ is $\sum_{e_{i,j} \in E'} c_{i,j}$. The minimum spanning tree problem is to find a spanning tree $T$ of graph $G$ such that $T$ has the minimum weight. For instance, the undirected graph $G$ in Fig. 1 defines such a problem. It is not difficult to find that the edge subset $\{e_{1,2}, e_{2,4}, e_{2,5}, e_{3,4}\}$ is the solution to the minimum spanning tree problem for graph $G$ in Fig. 1. Garey and Johnson (1979) have shown that the minimum spanning tree problem is NP-complete. The minimum spanning tree problem is of central importance in graph theory and the computational sciences and also plays an important role in parallel processing. As a result, various heuristic algorithms have been devised for the MST problem. It can now be solved by Prim's algorithm in $O(n^2)$ time and by Kruskal's algorithm in $O(m \log_2 m)$ time, but as the scale of the graph grows it becomes intractable to solve.
Liu et al. (2005) use 3-dimensional (3D) DNA structures to represent the different vertices of the graph, and design the number of hydrogen bonds in the edge strands according to the edge weights. The melting temperature of a DNA sequence is influenced by its G/C content (the higher the G/C content of a DNA sequence, the higher its melting temperature), and the melting temperature in turn influences the Polymerase Chain Reaction (PCR). They therefore relate the G/C content of the DNA sequences to the edge weights (edges with small weights have more G/C content). Among the sequences of the solution space, the DNA sequence with the maximum total G/C content is the result of the minimum spanning tree problem.
The method has two disadvantages. One is that it cannot be used for complex graphs with vertex degree greater than 3, because 3D DNA structures are unstable and not easy to generate (Jonoskas, 1998). The other is that it is hard to make the weight value correspond strictly to the number of hydrogen bonds in the experimental strands. Taking Table 4 in Liu et al. (2005) as an example, the weight value 20 ($v_0v_3$, $v_3v_0$) is denoted by different numbers of hydrogen bonds (86 and 88), so a small error that affects the accuracy of computing is inevitable. Zimmermann et al. (2008) and Han et al. (2008) attempted to use DNA strands whose lengths equal the actual weights to represent the edge information. But their algorithms ignore that an edge subset containing a loop and having minimum weight sum may be improperly chosen as the optimum solution. For example, the disconnected edge subset $\{e_{1,2}, e_{1,5}, e_{2,5}, e_{3,4}\}$ in Fig. 1 of this paper is an incorrect choice because it is not a spanning tree, but it cannot be distinguished by the above algorithms because they lack a check for loops. In this paper, a DNA algorithm based on a combination of the Adleman-Lipton model and the DNA sticker model is introduced for finding solutions of the minimum spanning tree problem.
The rest of this paper is organized as follows. In Section 2, the Adleman-Lipton model is introduced in detail. Section 3 presents a DNA molecular algorithm for solving the minimum spanning tree problem. Section 4 proves the complexity and feasibility of the DNA algorithm. In Section 5, we use a computer to simulate the DNA experiment and obtain the correct solution for the graph in Fig. 1; furthermore, a relatively complex example of the minimum spanning tree problem is given and the corresponding simulation results are described. We draw conclusions in Section 6.

The Adleman-Lipton model
Deoxyribonucleic acid or DNA plays the role of memory in nature. DNA is the genetic material containing the whole information of an organism to be copied into the next generation of the species. DNA-based computing, or more generally molecular computing, is a computational paradigm that uses synthetic DNA molecules as information storage media. Bio-molecular computers work at the molecular level. Because biological and mathematical operations have some similarities, DNA, the genetic material that encodes for living organisms, is stable and predictable in its reactions and can be used to encode information for mathematical systems.
DNA is a long polymer formed by units called nucleotides, each of which carries one of four different types of bases: adenine (A), cytosine (C), guanine (G) and thymine (T). In the context of this work, a nucleotide and its corresponding base are considered as the same element. To form DNA sequences, the nucleotides are joined by phosphate groups (bonds) that are asymmetric with respect to each other's geometry, and the two ends of a strand are referred to as the 3′ and 5′ ends. The DNA double helix structure results from the annealing of complementary bases (A with T and C with G). The reverse process (melting) separates the double helix into two base sequences. Genes, moreover, are made up of sequences of pieces of DNA. For example, the single strands 5′-CTGCAGTACACC-3′ and 3′-GACGTCATGTGG-5′ can form a double strand. We also call the strand 3′-GACGTCATGTGG-5′ the complementary strand of 5′-CTGCAGTACACC-3′ and simply denote 3′-GACGTCATGTGG-5′ by $\overline{5'\text{-CTGCAGTACACC-}3'}$. The length of a single-stranded DNA is the number of nucleotides comprising the single strand. Thus, if a single-stranded DNA includes 15 nucleotides, it is called a 15 mer. The length of a double-stranded DNA is counted in the number of base pairs. Thus, if we make a double-stranded DNA from a single-stranded 15 mer, then the length of the double-stranded DNA is 15 base pairs, also written as 15 bp.
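The base-pairing rule above can be checked mechanically. The following is a minimal Python sketch; the names `PAIR`, `complement` and `length_mer` are our own illustrative choices and do not come from the cited models. It maps each base of a strand written 5′ to 3′ to its Watson-Crick partner, which yields the complementary strand written 3′ to 5′.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-wise complement of `strand`.

    If `strand` is written 5'->3', the result is the partner strand
    written 3'->5', matching the notation used in the text.
    """
    return "".join(PAIR[base] for base in strand)


def length_mer(strand: str) -> int:
    """Length of a single strand, counted in nucleotides (mer)."""
    return len(strand)


# The example pair from the text.
partner = complement("CTGCAGTACACC")
```

Applied to the example strand, `complement` reproduces the partner strand quoted in the text, and `length_mer` reports it as a 12 mer.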
The DNA operations proposed by Adleman (1994) and Lipton (1995) are described below. These operations will be used to find solutions of the minimum spanning tree problem in this paper. In the Adleman-Lipton model, a (test) tube is a set of molecules of DNA (i.e., a multi-set of finite strings over the alphabet {A, C, G, T}). Given a tube, one can perform the following operations:
(1) Merge($T_1$, $T_2$): for two given test tubes $T_1$, $T_2$, it stores the union $T_1 \cup T_2$ in $T_1$ and leaves $T_2$ empty;
(2) Copy($T_1$, $T_2$): for a given test tube $T_1$, it produces a test tube $T_2$ with the same contents as $T_1$;
(3) Detect($T$): given a test tube $T$, it outputs "yes" if $T$ contains at least one strand, otherwise it outputs "no";
(4) Separation($T_1$, $X$, $T_2$): for a given test tube $T_1$ and a given set of strings $X$, it removes all single strands containing a string in $X$ from $T_1$, and produces a test tube $T_2$ with the removed strands;
(5) Selection($T_1$, $L$, $T_2$): for a given test tube $T_1$ and a given integer $L$, it removes all strands with length $L$ from $T_1$, and produces a test tube $T_2$ with the removed strands;
(6) Sort($T_1$, $T_2$, $T_3$): for a given test tube $T_1$, it places the shortest strands in the tube $T_2$ and the longest strands in $T_3$, and leaves the remaining strands in $T_1$;
(7) Cleavage($T$, $\sigma_0\sigma_1$): for a given test tube $T$ and a string of two (specified) symbols $\sigma_0\sigma_1$, it cuts each strand containing $[\sigma_0\sigma_1]$ in $T$ into different strands at the junction between $\sigma_0$ and $\sigma_1$;
(8) Annealing($T$): for a given test tube $T$, it produces all feasible double strands in $T$. The produced double strands are still stored in $T$ after annealing;
(9) Denaturation($T$): for a given test tube $T$, it dissociates each double strand in $T$ into two single strands;
(10) Ligation($T$): for a given tube $T$, the operation is used to ligate together the strands in $T$;
(11) Discard($T$): for a given test tube $T$, it discards the tube $T$;
(12) Read($T$): for a given tube $T$, the operation is used to describe a single molecule contained in the tube $T$. Even if $T$ contains many different molecules, each encoding a different set of bases, the operation can give an explicit description of exactly one of them;
(13) Append-head($T$, $Z$): for a given test tube $T$ and a given single DNA strand $Z$, it appends $Z$ onto the head of every strand in the tube $T$;
(14) Append-tail($T$, $Z$): for a given test tube $T$ and a given single DNA strand $Z$, it appends $Z$ onto the end of every strand in the tube $T$.
Since these fourteen manipulations are implemented with a constant number of biological steps for DNA strands (Shin et al., 1999), we assume that the complexity of each manipulation is in O(1) time steps.
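To make the tube operations concrete, the following is a minimal in-silico sketch of a few of them in Python. It is only an illustration under our own assumptions: a tube is modelled as a `collections.Counter` of strings, the function names mirror the operation names but are hypothetical, and none of the biochemistry (annealing errors, yield, strand orientation) is modelled.

```python
from collections import Counter

# A tube is modelled as a multiset (Counter) of strands; a strand is a string.


def merge(t1: Counter, t2: Counter) -> None:
    """Merge(T1, T2): store the union in T1 and leave T2 empty."""
    t1.update(t2)
    t2.clear()


def copy(t1: Counter) -> Counter:
    """Copy(T1, T2): return a new tube T2 with the same contents as T1."""
    return Counter(t1)


def detect(t: Counter) -> bool:
    """Detect(T): True ("yes") if T holds at least one strand."""
    return sum(t.values()) > 0


def separation(t1: Counter, patterns: set) -> Counter:
    """Separation(T1, X, T2): move strands containing any string of X into T2."""
    t2 = Counter()
    for strand in list(t1):
        if any(p in strand for p in patterns):
            t2[strand] = t1.pop(strand)
    return t2


def selection(t1: Counter, length: int) -> Counter:
    """Selection(T1, L, T2): move all strands of length L into T2."""
    t2 = Counter()
    for strand in list(t1):
        if len(strand) == length:
            t2[strand] = t1.pop(strand)
    return t2


def append_tail(t: Counter, z: str) -> None:
    """Append-tail(T, Z): append Z to the end of every strand in T."""
    appended = Counter({strand + z: count for strand, count in t.items()})
    t.clear()
    t.update(appended)


# Small demonstration on made-up strands.
tube = Counter(["ACCT", "TTG"])
tail = copy(tube)
append_tail(tail, "CA")
merge(tube, tail)                    # tube now holds four distinct strands
with_cc = separation(tube, {"CC"})   # strands containing "CC" leave the tube
```

Each function performs one pass over the tube, whereas the model above counts every operation as a single O(1) biological step; the sketch is meant to clarify the semantics, not the cost.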

DNA algorithm for the minimum spanning tree problem
For a given undirected graph $G = (V, E)$, $V = \{v_k \mid k = 1, 2, \ldots, n\}$ is the vertex set, $E = \{e_{i,j} \mid 1 \le i, j \le n\}$ is the edge set, and $|E| = m$. Vertices $v_i$ and $v_j$ may be connected by the edge $e_{i,j}$ in graph $G$ with the positive integer weight $c_{i,j}$. The graphs processed in this paper have no self-loops.
In the following, the symbols $s$, $e$, $A_k$ ($k = 1, 2, \ldots, n$) denote distinct single DNA strands of the same length, say $t$ mer ($t$ is a positive integer). The length $t$ depends on the size of the problem involved, since it must be large enough to distinguish all of the above symbols (Zimmermann et al., 2008). We use the symbols $w_{i,j}$, $w_{j,i}$ to denote the edge $e_{i,j}$, with $\|w_{i,j}\| = c_{i,j}$. In the operations below, we use the distinct single-strand symbols $sA_iA_je$, $sA_jA_ie$ ($1 \le i, j \le n$) to denote the edge $e_{i,j}$ without weight information; the symbols $s$ and $e$ mark the boundaries between different edges. Let

R = {∅}
For a graph with $n$ vertices and $m$ edges, every possible subset of the edge set $E$ can be expressed by a DNA strand. A strand that contains $sA_iA_je$ or $sA_jA_ie$ represents a subset that includes the edge $e_{i,j}$, and a strand that contains neither $sA_iA_je$ nor $sA_jA_ie$ represents a subset that excludes it. For example, in Fig. 1, the edge subset $\{e_{1,2}, e_{2,4}, e_{3,4}, e_{4,5}\}$ can be expressed by the DNA strand $sA_1A_2esA_2A_4esA_3A_4esA_4A_5e$. In this way, we transform all possible edge subsets of $E$ into different DNA strands. We call this the data pool.
(1) We generate all possible edge subsets of graph $G$.
For k = 1 to k = |E| = m (where $e_{i,j}$ is the k-th edge)
    (1-1) Copy($R$, $T_1$);
    (1-2) Append-tail($T_1$, $sA_iA_je$);
    (1-3) Merge($R$, $T_1$);
    (1-4) Discard($T_1$).
End for
After the above manipulations, the single strands in tube $R$ encode all possible subsets of edges. For example, for the graph in Fig. 1, we have the single strand $sA_1A_2esA_1A_5esA_2A_3esA_2A_4esA_4A_5e$, which denotes the edge subset $\{e_{1,2}, e_{1,5}, e_{2,3}, e_{2,4}, e_{4,5}\}$. The number of edges in the graph is $m$, so this step can be finished in $O(m)$ time steps since each manipulation above works in $O(1)$ steps.
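The loop of step (1) can be traced with a short Python sketch. Here the strands are symbolic strings such as `sA1A2e` rather than real base sequences, the tube is a plain list, and the edge list is the one implied by the weight symbols listed in Table 1; the function names are hypothetical.

```python
# Edges of the Fig. 1 graph, as read off the weight symbols listed in Table 1.
EDGES = [(1, 2), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (4, 5)]


def edge_word(i: int, j: int) -> str:
    """Symbolic strand s A_i A_j e for edge e_{i,j} (no weight information)."""
    return f"sA{i}A{j}e"


def build_data_pool(edges):
    """Simulate step (1): double the pool once per edge."""
    pool = [""]                                    # R = {empty strand}
    for i, j in edges:                             # For k = 1 .. m
        t1 = list(pool)                            # (1-1) Copy(R, T1)
        t1 = [strand + edge_word(i, j) for strand in t1]  # (1-2) Append-tail
        pool.extend(t1)                            # (1-3) Merge(R, T1)
        # (1-4) Discard(T1): t1 is simply dropped at the next iteration
    return pool


pool = build_data_pool(EDGES)
```

After the loop the pool holds one strand per edge subset, that is 2^m strands for m edges. The loop runs m times, which matches the O(m) step count above, although a conventional computer needs exponential memory to hold what the tube stores in parallel.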
(2) Each single strand in tube $R$ denotes one possible edge subset. A solution of the minimum spanning tree problem must first of all include every vertex of the graph in its edge subset, so we check whether each edge subset satisfies this condition. For each vertex $v_k \in V$, we discard the strands that do not contain the symbol $A_k$. For example, in Fig. 1, the single strand $sA_1A_2esA_1A_5esA_2A_3esA_2A_5e$ (representing the edge subset $\{e_{1,2}, e_{1,5}, e_{2,3}, e_{2,5}\}$) should be discarded because it does not include the vertex $v_4$ of graph $G$. We choose all possible subset strands as below:
For k = 1 to k = n
    (2-1) Separation($R$, $A_k$, $T_2$);
    (2-2) Discard($R$);
    (2-3) Copy($T_2$, $R$);
    (2-4) Discard($T_2$).
End for
After the above operations, the single strands in tube $R$ are edge subsets containing all the vertices of the graph. We use one "For" clause, so this operation can be finished in $O(n)$ time steps since each single manipulation above works in $O(1)$ steps.
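A small Python sketch of the vertex filter of step (2) follows. As a simplifying assumption, each strand is represented by the tuple of edges it encodes rather than by a base sequence, so "contains the symbol A_k" becomes "some edge touches vertex k"; the helper names are hypothetical.

```python
from itertools import combinations

# Edges of the Fig. 1 graph, as read off the weight symbols listed in Table 1.
EDGES = [(1, 2), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (4, 5)]
N = 5  # number of vertices in Fig. 1


def all_edge_subsets(edges):
    """Data pool of step (1): every subset of the edge list, as a tuple of edges."""
    return [subset
            for r in range(len(edges) + 1)
            for subset in combinations(edges, r)]


def keep_vertex_covering(pool, n):
    """Simulate step (2): after round k, only strands mentioning A_k remain."""
    tube_r = list(pool)
    for k in range(1, n + 1):                       # For k = 1 .. n
        # (2-1) Separation(R, A_k, T2): strands that contain A_k go to T2.
        t2 = [s for s in tube_r if any(k in edge for edge in s)]
        # (2-2)..(2-4): discard the rest of R and continue with T2 as R.
        tube_r = t2
    return tube_r


R = keep_vertex_covering(all_edge_subsets(EDGES), N)
```

The subset {e1,2, e1,5, e2,3, e2,5} from the example is removed in the round for k = 4, while the tree {e1,2, e2,4, e2,5, e3,4} survives all five rounds.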
(3) The solution to the minimum spanning tree problem must be an edge-connected subset, so it has at least $(n - 1)$ edges for a graph with $n$ vertices. Meanwhile, a subset with more than $(n - 1)$ edges must contain a circuit and therefore cannot be the optimum solution to the problem. Hence a solution of the minimum spanning tree problem has exactly $(n - 1)$ edges, and we should discard the inappropriate strands. For example, for the graph in Fig. 1, the single strands in $R$ representing the 5 edges $\{e_{1,2}, e_{2,3}, e_{2,5}, e_{3,4}, e_{4,5}\}$ should be discarded for containing a circuit. Because we let $\|s\| = \|e\| = \|A_k\| = t$ mer at the outset, the length of a DNA strand with $(n - 1)$ edges is $(4n - 4)t$, and the strands of this length are the ones selected. This operation can be finished in $O(1)$ time steps since each single manipulation works in $O(1)$ steps.
(4) After the above manipulations, most of the strands denote spanning trees. But there still exists a class of improper DNA strands in tube $R$ that contain all the vertex symbols and have $(n - 1)$ edges. For example, for the graph in Fig. 1, the single strands representing the edge subset $\{e_{1,2}, e_{1,5}, e_{2,5}, e_{3,4}\}$ should be discarded because the edge subset is not connected and contains a loop. Solutions to the minimum spanning tree problem cannot contain an edge loop and must form an edge-connected graph, so we should find and discard the strands containing an edge loop of graph $G$. Previous DNA computing studies (Liu et al., 2005; Jonoskas, 1998) overlooked this situation. If we find a loop in the graph, the longest strand in the loop is not selected as part of the solution to the minimum spanning tree problem. So we automatically generate all possible loop paths with weight-length DNA strands. We let
$P = \{sA_1w_{12}A_2e, sA_2w_{21}A_1e, sA_1w_{15}A_5e, \ldots, sA_4w_{45}A_5e, sA_5w_{54}A_4e\}$,
$Q = \{s, e, w_{i,j}, A_k, A_kesA_k \mid k = 1, 2, \ldots, n\}$,
For k = 1 to k = |E| = m
    (4-1) Merge($P$, $Q$);
    (4-2) Annealing($P$);
    (4-3) Ligation($P$);
    (4-4) Denaturation($P$);
    (4-5) Separation($P$, $\{3'\text{-}sA_iw_{i,j}\}$, $T_4$);
    (4-6) Separation($T_4$, $\{A_ie\text{-}5'\}$, $T_5$);
    (4-7) Separation($T_5$, $\{w_{j,i}A_i\}$, $T_6$);
    (4-8) Discard($T_4$);
    (4-9) Discard($T_6$);
    (4-10) Sort($T_5$, $T_7$, $T_8$);
    (4-11) Cleavage($T_8$, $[es]$, $T_9$);
    (4-12) Sort($T_9$, $T_{10}$, $T_{11}$);
    (4-13) Separation($R$, $\{A_iA_j \mid T_{11} = e_{i,j}\}$, $T_{12}$);
    (4-14) Discard($T_8$);
    (4-15) Discard($T_9$);
    (4-16) Discard($T_{11}$).
End for
In the above operation we use one "For" clause and discard at most $m$ improper strands, so this operation can be finished in at most $O(m)$ time steps since each single manipulation above works in $O(1)$ steps.
(5) The solution of the minimum spanning tree problem should be the edge subset of smallest weight that satisfies the above conditions. So we append the weight strand $w_{i,j}$ at the end of each strand containing the edge $e_{i,j}$. For example, for the graph in Fig. 1, the single strand $sA_1A_2esA_1A_5esA_2A_3esA_4A_5e \in R$ represents the edge subset $\{e_{1,2}, e_{1,5}, e_{2,3}, e_{4,5}\}$; we append the strands $w_{1,2}$, $w_{1,5}$, $w_{2,3}$, $w_{4,5}$ to it to obtain $sA_1A_2esA_1A_5esA_2A_3esA_4A_5ew_{1,2}w_{1,5}w_{2,3}w_{4,5}$. This is done by the following manipulations:
For k = 1 to k = |E| = m
    (5-1) Separation($R$, $A_iA_j$, $T_{13}$);
    (5-2) Append-tail($T_{13}$, $w_{i,j}$);
    (5-3) Merge($R$, $T_{13}$);
    (5-4) Discard($T_{13}$).
End for
In the above operation we use one "For" clause, so this operation can be finished in at most $O(m)$ time steps since each single manipulation above works in $O(1)$ steps.
(6) We take out the single strands in $R$ with the shortest length, which give the solutions to the minimum spanning tree problem. For example, for the graph in Fig. 1, the single strand in $R$ with the shortest length is $sA_1A_2esA_2A_4esA_2A_5esA_3A_4ew_{1,2}w_{2,4}w_{2,5}w_{3,4}$. Therefore, the solution to the minimum spanning tree problem for the graph in Fig. 1 is $\{e_{1,2}, e_{2,4}, e_{2,5}, e_{3,4}\}$ with the weight sum 8.
    (6-1) Sort($R$, $T_{14}$, $T_{15}$);
    (6-2) Read($T_{14}$);
This operation can be finished in $O(1)$ time steps since each single manipulation above works in $O(1)$ steps. Finally, the Read operation is applied to give the exact solutions to the minimum spanning tree problem.
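Steps (3) to (6) can be followed end to end with a set-based Python sketch. Two assumptions are made and should be read as such. First, the edge weights below are hypothetical: the individual weights of Fig. 1 are not reproduced in the text, so we chose values that are merely consistent with the stated answer (tree {e1,2, e2,4, e2,5, e3,4}, weight sum 8). Second, the loop test of step (4) is replaced by a conventional union-find check; it stands in for the loop-strand chemistry and is not a simulation of it.

```python
from itertools import combinations

# HYPOTHETICAL weights for the Fig. 1 edges (not taken from the paper); chosen
# only so that the minimum spanning tree matches the answer stated in the text.
WEIGHTS = {(1, 2): 2, (1, 5): 4, (2, 3): 5, (2, 4): 2,
           (2, 5): 1, (3, 4): 3, (4, 5): 3}
N = 5  # number of vertices


def has_loop(subset, n):
    """Union-find loop test, a stand-in for the loop-strand search of step (4)."""
    parent = list(range(n + 1))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for i, j in subset:
        ri, rj = find(i), find(j)
        if ri == rj:            # both ends already connected: edge closes a loop
            return True
        parent[ri] = rj
    return False


def minimum_spanning_trees(weights, n):
    edges = sorted(weights)
    # Step (3): keep only subsets with exactly n - 1 edges.
    candidates = combinations(edges, n - 1)
    # Step (4): discard subsets with a loop. A loop-free subset of n - 1 edges
    # on n vertices is connected and touches every vertex, so it is a tree.
    trees = [s for s in candidates if not has_loop(s, n)]
    # Step (5): the appended weight strands add sum(c_ij) to the strand length.
    total = {s: sum(weights[e] for e in s) for s in trees}
    # Step (6): the shortest strands are the minimum spanning trees.
    best = min(total.values())
    return [s for s in trees if total[s] == best], best


solutions, best_weight = minimum_spanning_trees(WEIGHTS, N)
```

With these assumed weights the only surviving shortest strand corresponds to {e1,2, e2,4, e2,5, e3,4} with weight sum 8, in agreement with the example. The subset {e1,2, e1,5, e2,5, e3,4} is rejected by `has_loop`, which is exactly the case that a length-only selection would miss.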

The complexity and feasibility of the proposed DNA algorithm
The following theorems show that the algorithm proposed above can indeed obtain solutions of the minimum spanning tree problem in O(3m + n) steps using DNA molecules.
Theorem 1. The solutions of the minimum spanning tree problem for a graph with n vertices and m edges can be obtained by the above DNA operations.

Proof.
We first obtain all combinations of the edges in the data pool after the first step. Because a spanning tree must traverse all the vertices of graph G, we discard the strands that lack some vertex information of the graph at step (2). Since a spanning tree must also have exactly (n − 1) edges in its subset, we select the satisfactory strands at step (3). Furthermore, we find the false spanning tree strands at step (4) and use basic biological operations to remove these illegal solution strands. In order to find the minimum solution, we append the edge weight strands at the end of the previous strands at step (5). The shortest strands in the pool R represent the solution to the minimum spanning tree problem, and we can "read" the answer at the final step.
Theorem 3. The solution strands of the minimum spanning tree problem for a graph with n vertices and m edges can be found within a finite length range.
Proof. After the first four steps, we have discarded the improper DNA strands for the minimum spanning tree problem, and the single strands in tube $R$ denote all possible spanning trees at this moment. Each strand can then be described as
$$sA_{i_1}A_{j_1}e\,sA_{i_2}A_{j_2}e \cdots sA_{i_{n-1}}A_{j_{n-1}}e.$$
At the outset we designed the lengths of $s$, $e$, $A_k$ so that $\|s\| = \|A_k\| = \|e\| = t$ mer. In order to choose the minimum spanning tree, we append the weight strand $w_{i,j}$ at the end of each strand carrying the edge $e_{i,j}$ information, and we let $\|w_{i,j}\| = c_{i,j}$ mer and $\max \|w_{i,j}\| = m$ mer. Then each strand in $R$ can be described as
$$S = sA_{i_1}A_{j_1}e\,sA_{i_2}A_{j_2}e \cdots sA_{i_{n-1}}A_{j_{n-1}}e\,w_{i_1,j_1}w_{i_2,j_2} \cdots w_{i_{n-1},j_{n-1}},$$
so the length of the DNA strands in tube $R$ is
$$\|S\| = \|s\| + \|A_{i_1}\| + \|A_{j_1}\| + \|e\| + \|s\| + \|A_{i_2}\| + \|A_{j_2}\| + \|e\| + \cdots + \|s\| + \|A_{i_{n-1}}\| + \|A_{j_{n-1}}\| + \|e\| + \|w_{i_1,j_1}\| + \|w_{i_2,j_2}\| + \cdots + \|w_{i_{n-1},j_{n-1}}\|.$$
The length of the strands in tube $R$ must therefore lie between $(4n - 4)t$ and $(4n - 4)t + m(n - 1)$. Accordingly, we can obtain the solution at step (6) within an appropriate length range.
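The length bound can be checked numerically. The short Python sketch below evaluates the strand length for the Fig. 1 example with t = 5 (the 5-base symbol length used in the simulation section). The four individual weights are hypothetical values chosen only to add up to the stated weight sum 8, and the function names are our own.

```python
def strand_length(n: int, t: int, tree_weights) -> int:
    """Length of a spanning-tree strand: (n-1) edge words of 4t mer plus weights."""
    assert len(tree_weights) == n - 1, "a spanning tree has n - 1 edges"
    return (4 * n - 4) * t + sum(tree_weights)


def length_range(n: int, t: int, max_weight: int):
    """Bounds from the proof: (4n-4)t up to (4n-4)t + max_weight * (n-1)."""
    base = (4 * n - 4) * t
    return base, base + max_weight * (n - 1)


# Fig. 1 example: n = 5 vertices, t = 5 mer symbols.
# HYPOTHETICAL individual weights; only their sum (8) is stated in the text.
example_weights = [2, 2, 1, 3]
length = strand_length(5, 5, example_weights)
low, high = length_range(5, 5, max(example_weights))
```

Under these assumptions the edge words contribute (4·5 − 4)·5 = 80 mer and the weights 8 mer, so the solution strand is 88 mer long and falls inside the range given by the proof.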

Experimental results of simulated DNA computing
Simple example of the minimum spanning tree problem
DNA-based computing relies on biochemical operations on DNA molecules, and errors may occur when these biochemical operations are applied. Sequence design is therefore an important issue in making DNA-based computing more reliable. To obtain better performance in hybridization reactions, we adapt the sequence design constraints of Braich et al. (2001), for example: library sequences contain only As, Ts, and Cs; no probe sequence has a run of more than 7 matches with any 8-base alignment of any library sequence; and so on.
In this paper, we use BioPython, a Python tool for computational molecular biology, as our development platform for generating good DNA sequences that are suitable for executing our algorithms in the laboratory. Braich's program (Braich et al., 2001) and the other simulations run on a Windows XP machine with an Intel Core-XP CPU and 4 GB of main memory, and the compiler is Visual C++. The coded program is used to generate DNA sequences to solve the minimum spanning tree problem and to construct the DNA sequences for every bit of the library. For the graph in Fig. 1, the program generates random 5-base sequences for s, e and A_k, and checks whether the library strands satisfy the above constraints when the new DNA sequences are added (Braich et al., 2001). If a generated DNA sequence fails to pass any of the constraints, the program regenerates a new DNA sequence. If the constraints are satisfied, the new DNA sequence is accepted. If all the DNA strands satisfy the constraints, the program has succeeded and these sequences are the output.

Table 1 Sequences chosen to represent s, e and A_k (k = 1, 2, . . ., n) in the example for Fig. 1.
s       TTCTT
e       TATCC
A1      CACTC
A2      ACCAT
A3      CTCAA
A4      ACTCC
A5      ATAAT
w1,2    CTAAT
w1,5    TAAAT
w2,3    TATCA
w2,4    TACTC
w2,5    TAACA
w3,4    CCACT
w4,5    TCACT

Table 2 Sequences chosen to represent the edges sA_iA_je and sA_jA_ie in the example for Fig. 1.
sA5A4e  TTCTTATAATACTCCTATCC

Table 3 The energies of binding each probe to its corresponding region on a library strand.
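The generate-and-test procedure described above can be sketched in a few lines of Python. This is not Braich's program: it uses only the standard library, and the single pairwise constraint (no two library sequences may agree on a run of more than `max_match_run` aligned positions) is a simplified stand-in that we assume for illustration. The function names, the seed and the threshold are likewise hypothetical.

```python
import random

ALPHABET = "ACT"  # library sequences avoid G, following the constraint above


def longest_match_run(a: str, b: str) -> int:
    """Longest run of position-wise matches between two equal-length sequences."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


def generate_library(names, length=5, max_match_run=3, seed=0, max_tries=10000):
    """Draw random sequences and accept each one only if it passes the check."""
    rng = random.Random(seed)
    library = {}
    for name in names:
        for _ in range(max_tries):
            candidate = "".join(rng.choice(ALPHABET) for _ in range(length))
            # Simplified constraint: reject near-duplicates of accepted strands.
            if all(longest_match_run(candidate, other) <= max_match_run
                   for other in library.values()):
                library[name] = candidate      # constraint satisfied: accept
                break
            # otherwise: regenerate a new sequence on the next iteration
        else:
            raise RuntimeError(f"no valid sequence found for {name}")
    return library


symbols = ["s", "e"] + [f"A{k}" for k in range(1, 6)]
library = generate_library(symbols)
```

The accepted sequences depend on the random seed, so they will not reproduce Table 1; the sketch only shows the accept/regenerate loop. A production design would add the thermodynamic checks (enthalpy, entropy, free energy) that the text attributes to Braich's program.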
Consider the graph in Fig. 1. The graph includes five vertices: v 1 , v 2 , v 3 , v 4 and v 5 . The DNA vertex sequences generated by the modified Braich program are shown in Table 1 and the edge sequences in Table 2. Braich's program is also used to calculate the enthalpy, entropy, and free energy for the binding of each probe to its corresponding region on a library strand; the energies are shown in Table 3.
Our program also computed the average and standard deviation of the enthalpy, entropy and free energy over all probe/library strand interactions. The energy levels are shown in Table 4.
Table 4 The energies over all probe/library strand interactions.
Table 5 DNA sequences chosen to represent the answer of the minimum spanning tree problem.

Complex example of the minimum spanning tree problem
Consider the more complex example of Fig. 2. The graph includes seven vertices: v 1 , v 2 , v 3 , v 4 , v 5 , v 6 and v 7 . The DNA vertex sequences generated by the modified Braich program are shown in Table 6 and the edge sequences in Table 7. Braich's program is also used to calculate the enthalpy, entropy, and free energy for the binding of each probe to its corresponding region on a library strand; the energies are shown in Table 8.
Our program also computed the average and standard deviation of the enthalpy, entropy and free energy over all probe/library strand interactions. The energy levels are shown in Table 9. Table 10 presents the library strands and the solution {e 1,2 , e 2,4 , e 2,6 , e 3,5 , e 5,6 , e 6,7 } of the minimum spanning tree problem.
Table 7 Sequences chosen to represent the edges sA i A j e and sA j A i e in the example for Fig. 2.
Fig. 2. An undirected graph G with 7 vertices and 12 edges.

Table 8 The energies of binding each probe to its corresponding region on a library strand.

Conclusions
In this paper, we present a DNA algorithm for solving the minimum spanning tree problem based on biological operations in the Adleman-Lipton model. Because electronic computers have obvious limits in storage, speed, intelligence, and miniaturization, the methods of DNA computation have arisen, notable especially for their efficient parallelism. The present algorithm has the following advantages compared with previous algorithms. Firstly, the proposed algorithm has a lower rate of hybridization errors because we developed a computer program to generate good DNA sequences for building the solution space of the minimum spanning tree problem. Secondly, Kruskal's algorithm and Prim's algorithm have been considered efficient methods for the classic minimum spanning tree problem for fifty years (Bondy, 1976), with O(m log₂ m) and O(n²) time complexity, respectively. Meanwhile, Yao et al. (2008) proposed a polynomial time algorithm for the minimum degree spanning tree problem in directed acyclic graphs. That algorithm terminates in O(mn log n) time, where m and n are the number of edges and vertices of the graph, respectively. In addition, Chen and Zhang (2000) gave an O(n²) time algorithm for finding solutions of the minimum spanning tree problem. In our paper, the proposed algorithm requires a time cost that is linearly proportional to the instance size. It finishes in O(3m + n) time for the minimum spanning tree problem of an undirected graph with n vertices and m edges, which is faster and of lower computational complexity than the previous algorithms. Thirdly, at step (4), we automatically generate the possible edge loops, which makes the construction of the solution space easier. In particular, some algorithms ignored instances in which an edge loop coexists with non-connected edges and therefore reached a false conclusion (Zimmermann et al., 2008; Han et al., 2008), as with the edge subset {e 1,2 , e 1,5 , e 2,5 , e 3,4 } in Fig. 1.
Besides, the proposed algorithm can easily be performed in a fully automated manner in a laboratory. Full automation is essential not only for speeding up the computation but also for error-free computation. We also simulated the DNA experiment to solve the minimum spanning tree problem. The ability to perform complex operations in solution might help us learn more about the nature of computation and lead to the development of better DNA-based computation, capable of solving a wide range of complex problems. We hope that, in future studies, more highly effective DNA operations will be exploited to derive a DNA computing model that is time-efficient and complete for NP-hard problems.