Prediction of the Yield of Enzymatic Synthesis of Betulinic Acid Ester Using Artificial Neural Networks and Support Vector Machine

3β-O-phthalic ester of betulinic acid is of great importance in anticancer studies. However, the optimization of its reaction conditions requires a large amount of experimental work. To reduce the number of optimization experiments, here we use artificial neural network (ANN) and support vector machine (SVM) models to predict the yield of the 3β-O-phthalic ester of betulinic acid synthesized from betulinic acid and phthalic anhydride using lipase as the biocatalyst. General regression neural network (GRNN), multilayer feed-forward neural network (MLFN) and SVM models were trained on experimental data. Four indicators were set as independent variables: time (h), temperature (°C), amount of enzyme (mg) and molar ratio, while the yield of the 3β-O-phthalic ester of betulinic acid was set as the dependent variable. Results show that the GRNN and SVM models give the best predictions during the testing process, with comparatively low RMS errors (4.01 and 4.23, respectively) and short training times (both 1 s). The prediction accuracies of the GRNN and SVM are both 100% in the testing process under a tolerance of 30%.


Introduction
3β-O-phthalic ester of betulinic acid has clinical potential as an anticancer medicine and can be synthesized from betulinic acid and phthalic anhydride using lipase as the biocatalyst (Fig. 1). Betulinic acid itself exhibits a variety of activities, including antibacterial, anti-inflammatory, anti-malarial, anthelmintic, antioxidant and anti-human immunodeficiency virus (HIV) effects (Yogeeswari, 2005). According to previous studies, the introduction of polar groups at the C-3 and C-28 positions also greatly increases anticancer activity and water solubility (Thibeault et al., 2007; Gauthier and Legault, 2008). However, the practical application of betulinic acid in the pharmaceutical and medical industries is severely constrained because it is insoluble in water (approximately 0.02 mg/mL) under ordinary conditions, which makes the preparation of injectable formulations for biological experiments difficult and decreases its bioavailability.
Detailed approaches to the synthesis of 3β-O-phthalic ester of betulinic acid based on chemical catalytic esterification have been reported previously (Mukherjee et al., 2004; Kvasnica et al., 2005; Mukherjee et al., 2006; Rajendran et al., 2008), but they have several disadvantages (e.g. high energy consumption and by-products) (Yasin et al., 2008). Compared with traditional chemical approaches, the application of enzymes in organic synthesis offers a series of advantages, including high catalytic efficiency, high selectivity, mild reaction conditions, and high product purity and quality (Loughlin, 2000; Zarevúcka and Wimmer, 2008).
However, the best detailed conditions for the synthesis are difficult to obtain because of the large-scale and complex laboratory experiments required. Moghaddam and colleagues (2010) used artificial neural network (ANN) models to predict the yield of enzymatic synthesis of betulinic acid ester and found that the quick propagation algorithm performed best in their computational experiments.
Nevertheless, previous ANN models involved comparatively complex operations, and the selection of the best ANN model was based on a limited number of results, making them less robust and user-friendly than the latest machine learning models. Here, we aim to use novel and user-friendly ANN and support vector machine (SVM) approaches to train on the yield data of the enzymatic synthesis of betulinic acid ester and to obtain a set of best machine learning models for predicting the yield. Comparisons are made to determine the most suitable machine learning model for the prediction.

Data Set
According to previous research, the synthetic conditions of the enzymatic synthesis of betulinic acid ester include time (h), temperature (°C), amount of enzyme (mg) and molar ratio (mmol betulinic acid/mmol phthalic anhydride) (Moghaddam et al., 2010).
Here, we aim to use novel ANN and SVM models to fit the four conditions and predict the isolated yield (%) of the enzymatic synthesis.
A complete machine learning model consists of two parts: the independent variable(s) and the dependent variable(s). Here, we set the time (h), temperature (°C), amount of enzyme (mg) and molar ratio as the independent variables, while the isolated yield (%) was set as the dependent variable. 65% of the data were assigned to the training set, while the remaining 35% formed the testing set.
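The 65%/35% split above can be sketched as follows. This is a minimal illustration using scikit-learn rather than the authors' actual tooling, and the condition and yield values below are placeholders, not the paper's data set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical design matrix: time (h), temperature (deg C),
# amount of enzyme (mg), molar ratio. Values are illustrative only.
X = np.array([
    [12, 45, 100, 1.0],
    [24, 50, 150, 1.5],
    [36, 55, 200, 2.0],
    [48, 60, 250, 2.5],
    [12, 50, 200, 1.5],
    [24, 55, 250, 2.0],
    [36, 60, 100, 2.5],
    [48, 45, 150, 1.0],
])
# Hypothetical isolated yields (%), one per experiment.
y = np.array([35.0, 52.0, 61.0, 70.0, 48.0, 58.0, 55.0, 40.0])

# 65% of the data for training, 35% for testing, as in the paper.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.65, random_state=0
)
print(len(X_train), len(X_test))  # 5 training samples, 3 testing samples
```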

ANN models
Artificial neural networks (ANNs) are a family of statistical learning algorithms that can make judgments when fed large amounts of information (Hopfield, 1988; Judith and DeLeo, 2001; Yegnanarayana, 2009). Like the brain, a biological neural network, ANNs are composed of interconnected neurons that perform instant calculations under different conditions.
Unlike ordinary one- or two-layer networks, artificial neural networks contain three or more layers, which allows them to learn inputs efficiently and recognize patterns directly. Furthermore, ANNs can use complicated algorithms to make predictions and find optimal solutions. Therefore, for problems too complex to solve analytically, ANNs can take the place of human reasoning, and their application in scientific research is increasingly popular. In this work, we use two kinds of ANNs, multilayer feed-forward neural networks (MLFN) and general regression neural networks (GRNN), to build models that forecast the yield of enzymatic synthesis of betulinic acid ester.

Multilayer feed-forward neural networks (MLFN)
Trained with a back-propagation learning algorithm, multilayer feed-forward neural networks, one of the most popular classes of neural networks, can be used to predict a wide range of chemical reactions (Johansson et al., 1991; Smits et al., 1994; Svozil and Kvasnicka, 1997). Neurons in an MLFN are arranged in layers (Fig. 2). The input layer is the first layer and the output layer is the last; between them, the hidden layers perform the calculation and modeling. Formally, the neurons can be described by a mapping function Γ that assigns to each neuron i a subset Γ(i) ⊆ N consisting of all ancestors of that neuron; likewise, the subset Γ⁻¹(i) ⊆ N contains all descendants of neuron i. Every neuron in a given layer is connected to every neuron in the preceding layer. The connection between the ith and jth neurons is represented by a weight coefficient ω_ij, and each neuron i carries a threshold coefficient ϑ_i (Fig. 3). The weight coefficient indicates the significance of a particular connection in the neural network.
Additionally, Eqs. (1) and (2) determine the output value (activity) x_i of the ith neuron. It holds that:

x_i = f(ξ_i)                          (1)

ξ_i = ϑ_i + Σ_{j∈Γ(i)} ω_ij x_j        (2)

where ξ_i is the potential of the ith neuron and f(ξ_i) is the transfer function (the summation in Eq. (2) runs over all neurons j transferring their signal to the ith neuron). The threshold coefficient ϑ_i can be understood as the weight coefficient of a connection with a formally added neuron whose activity is always x_j = 1 (the so-called bias).
For the transfer function it holds that:

f(ξ) = 1 / (1 + exp(−ξ))               (3)

The weight coefficients ω_ij and threshold coefficients ϑ_i are adjusted by a supervised adaptation process that minimizes the sum of squared differences between the required and computed output values. This is achieved by minimizing the objective function E:

E = ½ Σ_o (x_o − x̂_o)²                 (4)

where x_o and x̂_o are the computed and required activities of the output neurons, and the summation runs over all output neurons o.
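The forward pass described by Eqs. (1)–(3) can be sketched in a few lines. This is a minimal illustration with randomly initialized (not fitted) weights; the network shape, scaling, and values are assumptions for demonstration, not the trained MLFN of this study.

```python
import numpy as np

def sigmoid(xi):
    # Transfer function, Eq. (3): f(xi) = 1 / (1 + exp(-xi))
    return 1.0 / (1.0 + np.exp(-xi))

def mlfn_forward(x, weights, thresholds):
    """One forward pass through a feed-forward network.

    weights    -- list of (n_out, n_in) arrays (omega_ij per layer)
    thresholds -- list of (n_out,) bias vectors (vartheta_i per layer)
    """
    a = x
    for W, th in zip(weights, thresholds):
        xi = th + W @ a   # Eq. (2): potential = bias + weighted sum of inputs
        a = sigmoid(xi)   # Eq. (1): activity = transfer function of potential
    return a

# Tiny hypothetical 4-3-1 network: four reaction conditions in, one yield out.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 4)), rng.standard_normal((1, 3))]
thresholds = [np.zeros(3), np.zeros(1)]

# Inputs scaled to [0, 1] by assumed condition maxima (illustrative only).
x = np.array([12.0, 45.0, 100.0, 1.0]) / np.array([48.0, 60.0, 250.0, 2.5])
out = mlfn_forward(x, weights, thresholds)
print(out)  # a single activity in (0, 1); untrained, so not a real yield
```

In training, back-propagation would adjust `weights` and `thresholds` to minimize the objective function E of Eq. (4).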

General regression neural networks (GRNN)
Compared with other statistical neural networks such as feed-forward networks, the GRNN is more accurate with regard to function approximation (Kandirmaz et al., 2014).
Although it was first proposed for function approximation, the GRNN has also been applied, with small modifications, to classification problems in some studies (Kandirmaz et al., 2014). As can be seen from Figure 4, a GRNN has four layers: the input layer, pattern layer, summation layer, and output layer. It is characterized by rapid learning, consistency, and the ability to find an optimum even with a large number of specimens.
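The four-layer computation can be sketched as follows. This is the standard GRNN estimate (a Gaussian-kernel-weighted average of the training targets), written as a minimal illustration; the data, smoothing factor, and scaling below are placeholders, not the model fitted in this study.

```python
import numpy as np

def grnn_predict(X_train, y_train, x, sigma=0.5):
    """Standard GRNN estimate for a single query vector x.

    Pattern layer:   one Gaussian unit per training sample.
    Summation layer: numerator sum(w_i * y_i) and denominator sum(w_i).
    Output layer:    their ratio, the predicted target value.
    """
    d2 = np.sum((X_train - x) ** 2, axis=1)   # squared distances to each sample
    w = np.exp(-d2 / (2.0 * sigma ** 2))      # pattern-layer activations
    return np.sum(w * y_train) / np.sum(w)    # summation + output layers

# Toy scaled inputs and yields (%) -- placeholders for illustration.
X_train = np.array([[0.2, 0.5], [0.8, 0.4], [0.5, 0.9]])
y_train = np.array([40.0, 60.0, 55.0])
pred = grnn_predict(X_train, y_train, np.array([0.25, 0.5]))
print(pred)  # a weighted average, necessarily between min and max of y_train
```

Because the prediction is a convex combination of training targets, a GRNN can never extrapolate outside the observed yield range, which is one reason its recall of the training data (Figure 9) is strong.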
In the input layer, the inputs are stored, and each input vector is passed to the pattern layer, which contains one neuron for each training sample; the summation and output layers then combine the pattern-layer activations as described by Eqs. (6) and (7) (Goulermas et al., 2007; Yang and Li, 2014).

SVM model
The support vector machine (SVM) is a powerful machine learning technique established on statistical learning theory (Deng et al., 2012). To improve generalization, this theory provides an efficient global optimization that balances learning capacity against model complexity given limited sample information. The core idea of the SVM is to find the optimal hyperplane for linearly separable binary classification, i.e. the plane that separates all specimens with the maximum margin (Zhong et al., 2013; Chen et al., 2015). This plane also increases the forecasting ability of the model and reduces accidental misclassification. As can be seen from Figure 5, specimens of class 1 are represented by "+" and specimens of class −1 by "−" to illustrate the optimal hyperplane.
To explain the structure of a representative support vector machine, Figure 6 shows a small subset derived from the training data by the related algorithm that constitutes the SVM. Kernels are denoted by the letter "K" (Kim et al., 2005). To achieve good forecasting accuracy, appropriate kernels and suitable parameters must be selected. However, there is no universal standard for selecting these parameters. In most cases, a reasonable choice can be made by drawing on experience from extensive calculations, comparing experimental results, and applying the cross-validation facilities available in the program package (Fan et al., 2008; Guo and Liu, 2010; Chen et al., 2015).

Model Development
The ANN prediction models were constructed with the NeuralTools® software (trial version, Palisade Corporation, NY, USA) (Pollar and Jaroensutasinee, 2007; Friesen et al., 2011; Vouk et al., 2011). The GRNN and MLFN were chosen as the learning machines of the ANNs.
We used the RMS error and training time as indicators to measure the performance of the ANN and SVM models (Table 1). The number of nodes in the MLFN models was varied from 2 to 25, from which we could observe how the MLFN models behave during the development process. To show the recall capacity of the GRNN model for design optimization, Figure 9 illustrates its training results. Figure 9 shows that the GRNN model has a strong capacity for recall. The predicted values are very close to the actual values (Figure 9(a)), indicating that the non-linear fitting of the model is highly satisfactory. The comparisons between the residual values and the actual/predicted values (Figure 9(b) and (c)) also show that the residuals are relatively low, which supports the robustness of the GRNN model.
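The RMS error indicator used throughout is simply the root of the mean squared deviation between actual and predicted yields. A minimal sketch, with hypothetical yield values for illustration only:

```python
import numpy as np

def rms_error(y_true, y_pred):
    # Root-mean-square error: the model-comparison indicator of Table 1.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical actual vs. predicted yields (%), not the paper's data.
actual    = [35.0, 52.0, 61.0, 70.0]
predicted = [38.0, 49.0, 65.0, 66.0]
print(round(rms_error(actual, predicted), 2))  # -> 3.54
```

On this scale, the reported testing RMS errors of 4.01 (GRNN) and 4.23 (SVM) correspond to typical deviations of about four percentage points of yield.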
To demonstrate the usability of the GRNN model after training, we apply it to the data set that was not used in the training process. Results are shown in Figure 10. For the testing results of the SVM, Figure 11 illustrates the correctness and robustness of the SVM in the prediction stage.
The testing results of the SVM are highly similar to those of the GRNN, both in RMS error and in training time: the SVM generates a comparably precise result.
To compare the GRNN and SVM, we should first note that the initial values for the training process of the GRNN are random, leading to different results in repeated experiments, whereas the SVM gives fully repeatable results by its very principle. The GRNN might therefore seem less robust than the SVM. However, repeated experiments (Figure 7) show that, despite these fluctuations in RMS error, the GRNN remains highly robust because the fluctuations stay within a controllable range.
In terms of training time, the GRNN and SVM are both too fast for any difference to be observed. However, the GRNN model can be developed mainly with packaged software, while the SVM requires Matlab and a series of additional steps, demanding a higher computer configuration and comparatively more time. For more convenient operation, the GRNN therefore seems more practical than the SVM. Nevertheless, owing to its high robustness and repeatability, the SVM should not be neglected in practical applications.
Here, the enzymatic synthesis of betulinic acid ester serves as a typical example demonstrating that the support vector machine is a novel and powerful machine learning tool for related research and applications.

Conflict of Interests
The authors declare that they have no conflict of interests.