DNA computing based RNA genetic algorithm with applications in parameter estimation of chemical engineering processes
Introduction
Since Adleman (1994) first introduced DNA computing by solving a computationally hard instance of the directed Hamiltonian path problem, many groups have worked on different NP-hard problems with small numbers of variables, such as the maximal clique problem (Ouyang et al., 1997), 3-SAT problems (Braich et al., 2002), the DES deciphering problem (Boneh, Dunworth, & Lipton, 1995), and traveling salesman problems (Lee et al., 2004). Conventionally, Adleman-style DNA computing consists of three major steps: (1) generate a data pool of DNA molecules that represent all possible solutions to the studied problem, (2) use a series of biological laboratory techniques to exclude the DNA strands that do not satisfy the logic constraints of the problem, and (3) collect the surviving DNA molecules for the answer readout process. These steps require the size of the initial data pool to increase exponentially with the number of variables, so this kind of DNA computing is a brute-force method. In practice, the difficulty is not the absence of correct strands after computing, but the presence of vast amounts of contaminating DNA. To break the barrier of this brute-force method and implement the DNA operations on an existing digital computer, various improved DNA computing methods and electronic DNA computing algorithms have been studied. Yang and Yang (2005) modified a well-known sticker model to build solution sequences in parts, satisfying one clause per step, and eventually solved the whole Boolean formula after a number of steps. Yamamura et al. (2002) proposed a local search method based on DNA concentration computing to solve the shortest path problem. Because laboratory experiments in DNA computing are difficult, inefficient, unscalable and expensive compared with conventional computing, most improved DNA computing methods have been carried out only theoretically. Hence, Garzon et al. (1999) described an electronic DNA (EDNA) that simulates a virtual test tube on a digital computer and reproduced Adleman's experiment. Hartemink et al. (1999) simulated the biological reactions of DNA computing and implemented a simulator called CYBERCYCLER. Ouyang and co-workers proposed a genetic DNA computing algorithm to solve the maximal clique problem, which could obtain a solution from a very small initial data pool and avoided enumerating all candidate solutions (Li, Fang, & Ouyang, 2004).
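The three Adleman-style steps can be sketched in software as a generate-and-filter search. The following is only an illustrative sketch, not the laboratory protocol or any of the cited simulators: the function name `hamiltonian_paths` and the small example graph are assumptions made for this example.

```python
from itertools import product

def hamiltonian_paths(n, edges, start, end):
    """Brute-force generate-and-filter search for directed Hamiltonian paths."""
    edge_set = set(edges)
    # Step 1: generate the pool of all candidate vertex sequences (n**n of them).
    pool = product(range(n), repeat=n)
    survivors = []
    for path in pool:
        # Step 2: discard candidates that violate a constraint of the problem.
        if path[0] != start or path[-1] != end:
            continue  # wrong start or end vertex
        if len(set(path)) != n:
            continue  # some vertex is visited twice (or not at all)
        if all((a, b) in edge_set for a, b in zip(path, path[1:])):
            survivors.append(path)  # every consecutive pair is an edge
    # Step 3: read out whatever survived the filtering.
    return survivors

# Hypothetical 4-vertex directed graph used only for illustration.
edges = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
paths = hamiltonian_paths(4, edges, start=0, end=3)
```

Even for this 4-vertex graph the pool holds 4**4 = 256 candidates, which illustrates why the pool size becomes the limiting factor as the number of variables grows.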
The genetic algorithm (GA), presented by Holland (1975), is a parallel, global optimization method whose search strategy is partly similar to that of DNA computing. It may offer one way to break the barrier of DNA computing and make it practical as the problem size scales up. However, the double-helix structure of the DNA molecule is not well suited to being combined with the chromosome of GA.
Recently, RNA computing has been developed on the basis of DNA computing. Cukras, Faulhammer, Lipton, and Landweber (1999) developed the theory of RNA computing and proposed a destructive algorithm to solve the knight problem using only biological molecules and enzymes. Lipton suggested that DNA be replaced by RNA in DNA computing (Faulhammer et al., 2000), and Li and Xu (2003b) summarized the possible operations on RNA sequences, such as the elongation, deletion, absent, insertion, translocation, transformation and permutation operations. By introducing the complementary oligonucleotides of DNA molecules, RNA strands obtain DNA genetic information. The single-chain structure and varied operations of RNA strands make them easy to combine with SGA. Furthermore, genealogical processes have been the subject of much research in recent years. Neuhauser and Krone (1997) introduced several models, including DNA sequence models, to study the genealogy of a random sample of genes taken from a large haploid population that evolved according to random reproduction with selection and mutation. Inspired by the DNA sequence model and its distribution rules, a digital RNA-GA is proposed and its convergence is analyzed. The algorithm used in this work is essentially an improvement of SGA. A crossover operator based on RNA operations and a mutation operator based on the DNA sequence model are both introduced into the proposed algorithm, which increases the genetic diversity of the population. Simulation studies on several test functions show the efficiency of the RNA-GA.
Parameter estimation for process modeling is a very important step in the control, diagnosis and optimization of a process system. Parameter estimation for chemical process modeling is especially difficult because such processes are non-linear and complicated. In Song et al. (2003), eight parameters must be estimated in a heavy oil thermal cracking 3-lumping model; traditional parameter estimation methods, such as the least-squares method, cannot be used for this chemical process because of its non-linearity. Similarly, the parameter estimation of an FCCU main fractionator (Zhong & Wang, 1998) with variable coupling is difficult for traditional parameter estimation methods. In this paper, both cases are solved successfully by RNA-GA. Thus, this work focuses on two aspects: (1) the development of the RNA-GA operators and the convergence analysis of RNA-GA and (2) its application to test functions and to parameter estimation of chemical processes.
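To make the RNA operations named above more concrete, the sketch below shows one plausible reading of the translocation, transformation and permutation operations on a digital sequence. The exact definitions used in RNA-GA are not given in this excerpt, so the function names, index conventions and behavior here are assumptions for illustration only.

```python
def translocate(seq, i, j, k):
    """Cut seq[i:j] out and re-insert it at position k of the remainder."""
    piece, rest = seq[i:j], seq[:i] + seq[j:]
    return rest[:k] + piece + rest[k:]

def transform(seq, i, j, k, l):
    """Swap two non-overlapping pieces seq[i:j] and seq[k:l], with i < j <= k < l."""
    return seq[:i] + seq[k:l] + seq[j:k] + seq[i:j] + seq[l:]

def permute(seq, i, donor_piece):
    """Overwrite len(donor_piece) bases starting at index i with donor_piece."""
    return seq[:i] + donor_piece + seq[i + len(donor_piece):]

parent = "AAGGCCUU"
moved = translocate(parent, 0, 2, 4)      # "AA" moves behind "GGCC"
swapped = transform(parent, 0, 2, 4, 6)   # "AA" and "CC" change places
replaced = permute(parent, 2, "UU")       # "GG" is overwritten by "UU"
```

All three operations in this sketch keep the sequence length fixed, so the result can still be decoded as a candidate solution of the same dimension.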
Section snippets
Digital encoding of RNA sequence
The type space for an RNA sequence is E = {A, U, G, C}^L, i.e., sequences of length L, where the four nucleotide bases Adenine (A), Uracil (U), Guanine (G) and Cytosine (C) are used to encode the solution of the given problem in RNA computing. However, such an RNA sequence cannot be processed directly by a digital computer. Since the binary digital coding (00, 01, 10, 11) can represent the characteristics of RNA nucleotide bases, such as structure, functional group, complementary relationship and the number of hydrogen
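A digital encoding of this kind can be sketched as follows. The assignment of bases to the codes 00, 01, 10, 11 and the linear scaling of a decoded sequence into a parameter range are assumptions for illustration; the excerpt does not state which base receives which code.

```python
# Assumed base-to-digit mapping (A=00, U=01, G=10, C=11); the source does not fix it.
BASE_TO_DIGIT = {"A": 0, "U": 1, "G": 2, "C": 3}

def to_bits(seq):
    """Write each base as its two-bit code."""
    return "".join(format(BASE_TO_DIGIT[base], "02b") for base in seq)

def decode(seq, lo, hi):
    """Read seq as a base-4 integer and scale it linearly into [lo, hi]."""
    value = 0
    for base in seq:
        value = value * 4 + BASE_TO_DIGIT[base]
    return lo + (hi - lo) * value / (4 ** len(seq) - 1)

bits = to_bits("AUGC")
low_end = decode("AAAA", -1.0, 1.0)
high_end = decode("CCCC", -1.0, 1.0)
```

With this convention a sequence of length L resolves the interval [lo, hi] into 4^L evenly spaced values.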
Global convergence analysis of RNA-GA
As for the global optimization problem (1) and (2), Li et al. (2002) summarized the conditions that guarantee the convergence of GA with a mutation operator, which are listed as follows.
Assumption 1. At every generation t, if every individual x in the population and a random individual y satisfy x ≠ y, then there exists p(t) > 0, where p(t) is the probability of changing x into y by one mutation operator.
Theorem 1. If GA with elitist strategy satisfies Assumption 1, it will converge in probability to the optimal solution
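Assumption 1 can be checked on a simple example. The sketch below assumes a plain independent per-base mutation, in which each base mutates with probability pm into one of the other three bases chosen uniformly; this is a simplification chosen for illustration and is not necessarily the mutation operator of RNA-GA.

```python
def mutation_probability(x, y, pm):
    """Probability that one pass of independent per-base mutation turns x into y."""
    assert len(x) == len(y)
    p = 1.0
    for a, b in zip(x, y):
        # A base is kept with probability 1 - pm, or changed to one
        # specific other base with probability pm / 3.
        p *= (1.0 - pm) if a == b else pm / 3.0
    return p

p_xy = mutation_probability("AU", "AC", 0.3)
```

For any 0 < pm < 1 every factor is strictly positive, so any sequence can reach any other sequence in a single mutation step with positive probability, which is the property Assumption 1 requires.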
Test functions
In order to test and compare performances of the proposed optimization algorithms, a test environment must be provided in the form of several objective functions. Selecting a group of representative functions is not an easy task, since any particular combination of properties represented by a test function does not allow for generalized performance statements. Table 1 compiles a list of commonly used test functions, which represent a group of landscape classes with various characteristics:
Simulations on parameter estimation
Due to the superior performance of RNA-GA, such a hybrid strategy is applied to model parameter estimation in this section. The following model is considered, where y(t) is the system output, u(t) is the system input vector, θ = [θ1, θ2, …, θk]^T are the parameters to be estimated, and the form of the model g is supposed to be known. The job is to estimate the parameters θ = [θ1, θ2, …, θk]^T according to a certain index that is a function of the true system outputs and the model sample outputs
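The estimation task can be sketched as minimizing an index over θ. In the sketch below, the two-parameter model `g`, the sum-of-squared-errors index and the synthetic data are all assumptions for illustration; the excerpt does not specify the model or the index actually used.

```python
import math

def g(u, theta):
    """Hypothetical nonlinear model with parameters theta = (k, tau)."""
    k, tau = theta
    return k * (1.0 - math.exp(-u / tau))

def sse_index(theta, inputs, outputs):
    """Sum of squared errors between measured outputs and model outputs."""
    return sum((y - g(u, theta)) ** 2 for u, y in zip(inputs, outputs))

# Synthetic, noise-free data generated from known parameters.
true_theta = (2.0, 0.5)
inputs = [0.1 * i for i in range(1, 11)]
outputs = [g(u, true_theta) for u in inputs]

index_at_truth = sse_index(true_theta, inputs, outputs)
index_elsewhere = sse_index((1.5, 0.5), inputs, outputs)
```

An optimizer such as a GA would treat `sse_index` as the fitness to be minimized, decoding each individual into a candidate θ before evaluation.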
Conclusions
By combining RNA operations and DNA sequence model with genetic algorithm, a framework of RNA-GA is proposed for complex function optimization as well as model parameter estimation. Numerical simulation results demonstrate the effectiveness of the hybridization, especially the advantages of RNA-GA in terms of optimization quality, efficiency as well as initial conditions. The superiority of the proposed RNA-GA is the combination of DNA sequence model with variable mutation probability as well
Acknowledgements
This paper has been supported by the National Natural Science Foundation of China under grants 60421002 and 70471052. The authors would also like to thank associate professor J.M. Zhang, L. Xie for discussions about this work, and the anonymous reviewers for their helpful comments.
References (20)
- et al. Chess games: A model for RNA based computation. BioSystems (1999)
- et al. Solving traveling salesman problems with DNA molecules encoding numerical values. BioSystems (2004)
- et al. A DNA solution of SAT problem by a modified sticker model. BioSystems (2005)
- Molecular computation of solutions to combinatorial problems. Science (1994)
- Boneh, D., Dunworth, C., & Lipton, R. (1995). Breaking DES using a Molecular Computer. Technical Report CS-TR-489-95,...
- et al. Solution of a 20-variable 3-SAT problem on a DNA computer. Science (2002)
- et al. Molecular computation: RNA solutions to chess problems. Proceedings of the National Academy of Sciences of the United States of America (2000)
- et al. Soft molecular computing
- et al. Simulating biological reactions: A modular approach
- Adaptation in natural and artificial systems (1975)