DNA computing based RNA genetic algorithm with applications in parameter estimation of chemical engineering processes

https://doi.org/10.1016/j.compchemeng.2007.01.012

Abstract

Based on RNA genetic operations and a DNA sequence model under selection and mutation, an electronic RNA genetic algorithm (RNA-GA) with improved crossover and mutation operators is proposed. The proposed algorithm can be implemented with real biochemical reactions after a simple conversion, so the brute-force limitation of DNA computing can be overcome. The convergence analysis shows that RNA-GA with an elitist strategy converges in probability to the global optimum. Comparisons of RNA-GA with the standard genetic algorithm (SGA) on typical test functions show the advantages and efficiency of the proposed algorithm. As illustrations, RNA-GA is applied to parameter estimation of a heavy oil thermal cracking 3-lumping model and a fluid catalytic cracking unit (FCCU) main fractionator. In both cases, the methodology is shown to be effective for parameter estimation of chemical processes.

Introduction

Since Adleman (1994) first introduced DNA computing by solving a computationally hard problem, the directed Hamiltonian path problem, many groups have worked on different NP-hard problems with few variables, such as the maximal clique problem (Ouyang et al., 1997), 3-SAT problems (Braich et al., 2002), the DES deciphering problem (Boneh, Dunworth, & Lipton, 1995) and traveling salesman problems (Lee et al., 2004). Conventionally, Adleman-style DNA computing consists of three major steps: (1) generate a data pool of DNA molecules that represent all possible solutions to the studied problem, (2) use a series of biology laboratory techniques to exclude the DNA strands that do not match the logic constraints of the problem, (3) collect the surviving DNA molecules for the answer readout process. These steps require the size of the initial data pool to increase exponentially with the number of variables in the calculation, so this kind of DNA computing is a brute-force method. In practice, the difficulty is not the absence of correct strands after computing but the presence of a vast amount of contaminating DNA.

In order to break the barrier of this brute-force method and implement the DNA operations on an existing digital computer, various improved DNA computing methods and electronic DNA computing algorithms have been studied. Yang and Yang (2005) modified a well-known sticker model to build solution sequences in parts, satisfying one clause per step, and eventually solved the whole Boolean formula after a number of steps. Yamamura et al. (2002) proposed a local search method based on DNA concentration computing to solve the shortest path problem. Because laboratory experiments in DNA computing are highly difficult, inefficient, unscalable and expensive compared with conventional computing, most improved DNA computing methods are carried out theoretically. Hence, Garzon et al. (1999) described an electronic DNA (EDNA) to simulate a virtual test tube on a digital computer and reproduced Adleman's experiment. Hartemink et al. (1999) simulated the biological reactions of DNA computing and implemented a simulator called CYBERCYCLER. Ouyang and co-workers proposed a genetic DNA computing algorithm to solve the maximal clique problem, which could obtain a solution from a very small initial data pool and avoided enumerating all candidate solutions (Li, Fang, & Ouyang, 2004).
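
The three Adleman-style steps can be mimicked on a digital computer to make the brute-force cost visible. The following is a minimal sketch, not Adleman's laboratory protocol: the function name `hamiltonian_paths`, the example graph and the use of vertex permutations as the "data pool" are all assumptions made for illustration.

```python
from itertools import permutations

def hamiltonian_paths(n_vertices, edges, start, end):
    """Generate every candidate ordering, filter, then read out survivors."""
    edge_set = set(edges)  # directed edges as (from, to) pairs
    # Step 1: build the full candidate pool (n! vertex orderings).
    pool = permutations(range(n_vertices))
    # Step 2: discard candidates that violate the problem constraints.
    survivors = [
        p for p in pool
        if p[0] == start and p[-1] == end
        and all((a, b) in edge_set for a, b in zip(p, p[1:]))
    ]
    # Step 3: read out whatever survived the filtering.
    return survivors

# Hypothetical 4-vertex directed graph.
edges = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
paths = hamiltonian_paths(4, edges, start=0, end=3)
print(paths)
```

The candidate pool in this sketch grows factorially with the number of vertices, which is the digital counterpart of the rapidly growing initial data pool discussed above.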

The genetic algorithm (GA), presented by Holland (1975), is a parallel, global optimization method whose search strategy is partly similar to that of DNA computing. It may be one possible way to break the barrier of DNA computing and make it practical as the problem size scales up. However, the double-helix structure of the DNA molecule is not well suited to being combined with the chromosome of GA.

Recently, RNA computing has been developed based on DNA computing. Cukras, Faulhammer, Lipton, and Landweber (1999) developed the theory of RNA computing and proposed a destructive algorithm to solve the knight problem using only biological molecules and enzymes. Lipton suggested that DNA be replaced by RNA in DNA computing (Faulhammer et al., 2000), and Li and Xu (2003b) summarized the possible operations on RNA sequences, including the elongation, deletion, absent, insertion, translocation, transformation and permutation operations. By introducing the complementary oligonucleotides of DNA molecules, RNA strands obtain DNA genetic information. The unique single-chain structure and the various operations of RNA strands make them easy to combine with SGA. Furthermore, genealogical processes have been the subject of much research in recent years. Neuhauser and Krone (1997) introduced several models, including DNA sequence models, to study the genealogy of a random sample of genes taken from a large haploid population that evolved according to random reproduction with selection and mutation. Inspired by the DNA sequence model and its distribution rules, a digital RNA-GA is proposed and its convergence is analyzed. The algorithm used in this work is essentially an improvement of SGA. A crossover operator based on RNA operations and a mutation operator based on the DNA sequence model are both introduced into the proposed algorithm, which increases the genetic diversity of the population. Simulation studies on several test functions show the efficiency of the RNA-GA.

Parameter estimation for process modeling is a very important step in the control, diagnosis and optimization of a process system. Parameter estimation for chemical process modeling is especially difficult because of the non-linear and complicated characteristics of such processes. In Song et al. (2003), a total of 8 parameters must be estimated in a heavy oil thermal cracking 3-lumping model; traditional parameter estimation methods, such as the least squares method, cannot be used for such chemical processes because of their non-linearity. Similarly, the parameter estimation of an FCCU main fractionator (Zhong & Wang, 1998) with variable coupling is difficult for traditional parameter estimation methods. In this paper, both cases are handled successfully by RNA-GA. Thus, this work focuses on two aspects: (1) the development of the RNA-GA operators and the convergence analysis of RNA-GA and (2) its use for test functions and parameter estimation of chemical processes.
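
The introduction names the translocation, transformation and permutation operations without defining them here. The sketch below shows one plausible reading of them as string operations on an RNA sequence. The definitions, the function signatures and the index conventions are assumptions for illustration and may differ from the operators used in the paper.

```python
def translocation(seq, i, j, k):
    """Cut seq[i:j] out and reinsert it at position k of the remainder (assumed definition)."""
    piece = seq[i:j]
    rest = seq[:i] + seq[j:]
    return rest[:k] + piece + rest[k:]

def transformation(seq, i, j, k, l):
    """Swap the non-overlapping subsequences seq[i:j] and seq[k:l], with i < j <= k < l (assumed)."""
    return seq[:i] + seq[k:l] + seq[j:k] + seq[i:j] + seq[l:]

def permutation(seq, donor, i, j):
    """Overwrite seq[i:j] with the same positions taken from a donor sequence (assumed)."""
    return seq[:i] + donor[i:j] + seq[j:]

parent = "AACCGGUU"
print(translocation(parent, 0, 2, 3))       # moves "AA" into the middle
print(transformation(parent, 0, 2, 4, 6))   # swaps "AA" and "GG"
print(permutation(parent, "UUUUUUUU", 2, 4))
```

All three operators preserve the sequence length, so the offspring remain valid fixed-length chromosomes.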

Digital encoding of RNA sequence

The type space for an RNA sequence is E = {A, U, G, C}^L, i.e., sequences of length L, where the four nucleotide bases adenine (A), uracil (U), guanine (G) and cytosine (C) are used to encode the solution of the given problem in RNA computing. However, such an RNA sequence cannot be processed directly by a digital computer. Since the binary digital coding (00, 01, 10, 11) can represent the characteristics of RNA nucleotide bases, such as structure, functional group, complementary relationship and the number of hydrogen
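
As a rough illustration of such a digital encoding, the sketch below maps each base to a two-bit code and decodes a length-L sequence to a real number in a given interval. The particular assignment of bases to codes, the helper names and the linear decoding rule are assumptions, since the snippet above does not specify them.

```python
# Assumed base-to-code assignment; the paper's actual mapping may differ.
BASE_TO_BITS = {"A": "00", "U": "01", "G": "10", "C": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}

def rna_to_bits(seq):
    """Translate an RNA string over {A, U, G, C} into a binary string."""
    return "".join(BASE_TO_BITS[base] for base in seq)

def bits_to_rna(bits):
    """Translate a binary string of even length back into an RNA string."""
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(seq, lower, upper):
    """Map a length-L sequence linearly onto the interval [lower, upper]."""
    value = int(rna_to_bits(seq), 2)
    max_value = 4 ** len(seq) - 1  # largest integer representable with L bases
    return lower + (upper - lower) * value / max_value

print(rna_to_bits("AUGC"))
print(decode("CCCC", -1.0, 1.0))
```

With L bases per variable, the interval is discretized into 4^L levels, so the sequence length sets the resolution of the search.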

Global convergence analysis of RNA-GA

As for the global optimization problem (1) and (2), Li et al. (2002) made a summary of conditions guaranteeing the convergence of GA with mutation operator, which is listed as follows.

Assumption 1

At every generation t, for every individual x in the population and any individual y with x ≠ y, there exists p(t) > 0, where p(t) is the probability of changing x into y by one application of the mutation operator.

Theorem 1

If GA with elitist strategy satisfies Assumption 1, it will converge in probability to the optimal solution
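
The role of the two ingredients, a mutation that can reach any individual with positive probability and an elitist strategy, can be seen in a small sketch. This is not the RNA-GA itself: the binary chromosome, the random parent choice, the parameter values and the names `mutate` and `elitist_ga` are assumptions chosen to keep the example short.

```python
import random

def mutate(bits, pm, rng):
    # Each bit flips independently with probability pm in (0, 1), so any
    # string y is reachable from any x in one step with positive probability.
    return [b ^ 1 if rng.random() < pm else b for b in bits]

def elitist_ga(fitness, n_bits=12, pop_size=20, generations=50, pm=0.1, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        # Mutation-only variation with randomly chosen parents (a simplification).
        pop = [mutate(rng.choice(pop), pm, rng) for _ in range(pop_size - 1)]
        pop.append(best)  # elitism: the best individual found so far always survives
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history

# Maximize the number of ones in the chromosome.
best, history = elitist_ga(sum)
print(history[0], history[-1])
```

Because the elite is copied unchanged into every generation, the best fitness in `history` never decreases, while the mutation keeps every point of the search space reachable.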

Test functions

In order to test and compare performances of the proposed optimization algorithms, a test environment must be provided in the form of several objective functions. Selecting a group of representative functions is not an easy task, since any particular combination of properties represented by a test function does not allow for generalized performance statements. Table 1 compiles a list of commonly used test functions, which represent a group of landscape classes with various characteristics:
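
Table 1 is not reproduced in this excerpt. As a hedged illustration of the landscape classes such benchmark suites usually cover, the sketch below defines two widely used functions, the unimodal sphere function and the highly multimodal Rastrigin function. Their inclusion here is an assumption and does not imply that they appear in Table 1.

```python
import math

def sphere(x):
    """Unimodal, separable; global minimum 0 at the origin."""
    return sum(xi ** 2 for xi in x)

def rastrigin(x):
    """Highly multimodal; global minimum 0 at the origin."""
    return 10 * len(x) + sum(xi ** 2 - 10 * math.cos(2 * math.pi * xi) for xi in x)

print(sphere([1.0, 2.0]))
print(rastrigin([0.0, 0.0]))
```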

Simulations on parameter estimation

Due to the superior performance of RNA-GA, such a hybrid strategy is applied to model parameter estimation in this section. The following model y(t) = g(u(t), θ) is considered, where y(t) is the system output, u(t) is the system input vector, θ = [θ1, θ2, …, θk]^T are the parameters to be estimated, and the form of the model g is assumed to be known. The task is to estimate the parameters θ according to a certain index that is a function of the true system outputs and the model sample outputs
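
One common choice for such an index is the sum of squared errors between the measured outputs and the model outputs. The sketch below assumes that choice and uses a hypothetical two-parameter model g with noise-free synthetic data. Neither the model form nor the index is taken from the paper.

```python
import math

def g(u, theta):
    # Hypothetical model form, chosen only for illustration.
    a, b = theta
    return a * (1.0 - math.exp(-b * u))

def sse_index(theta, inputs, outputs):
    """Sum of squared errors between measured outputs and model outputs."""
    return sum((y - g(u, theta)) ** 2 for u, y in zip(inputs, outputs))

inputs = [0.5, 1.0, 2.0, 4.0]
true_theta = (2.0, 0.7)
outputs = [g(u, true_theta) for u in inputs]  # noise-free synthetic measurements

J_true = sse_index(true_theta, inputs, outputs)
J_off = sse_index((1.5, 1.0), inputs, outputs)
print(J_true, J_off)
```

An optimizer such as a genetic algorithm would then search for the θ that minimizes this index. The index only needs to be evaluated, not differentiated, which is why it suits non-linear models.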

Conclusions

By combining RNA operations and DNA sequence model with genetic algorithm, a framework of RNA-GA is proposed for complex function optimization as well as model parameter estimation. Numerical simulation results demonstrate the effectiveness of the hybridization, especially the advantages of RNA-GA in terms of optimization quality, efficiency as well as initial conditions. The superiority of the proposed RNA-GA is the combination of DNA sequence model with variable mutation probability as well

Acknowledgements

This paper has been supported by the National Natural Science Foundation of China under grants 60421002 and 70471052. The authors would also like to thank associate professor J.M. Zhang, L. Xie for discussions about this work, and the anonymous reviewers for their helpful comments.
