User-friendly optimization approach of fed-batch fermentation conditions for the production of iturin A using artificial neural networks and support vector machine

Background: In the field of microbial fermentation technology, optimizing the fermentation conditions is of crucial importance for practical applications. Here, we use artificial neural networks (ANNs) and a support vector machine (SVM) to offer a series of effective optimization methods for the production of iturin A. The concentration levels of asparagine (Asn), glutamic acid (Glu) and proline (Pro) (mg/L) were set as independent variables, while the iturin A titer (U/mL) was set as the dependent variable. A general regression neural network (GRNN), multilayer feed-forward neural networks (MLFNs) and an SVM were developed, and comparisons were made among the different ANNs and the SVM. Results: The GRNN has the lowest RMS error (457.88) and the shortest training time (1 s), with a steady fluctuation during repeated experiments, whereas the MLFNs have comparatively higher RMS errors and longer training times, which fluctuate significantly with the number of nodes. The SVM also has a relatively low RMS error (466.13) and a short training time (1 s). Conclusion: According to the modeling results, the GRNN is considered the most suitable ANN model for the design of the fed-batch fermentation conditions for the production of iturin A because of its high robustness and precision, and the SVM is considered a very suitable alternative model. Under a tolerance of 30%, the prediction accuracies of the GRNN and SVM are both 100% in repeated experiments.


Introduction
Produced by Bacillus subtilis, the nonribosomal lipopeptide antifungal antibiotic iturin A is structurally composed of two parts. The first part consists of seven amino acid residues (L-Asn-D-Tyr-D-Asn-L-Gln-L-Pro-D-Asn-L-Ser) that form a peptide ring. The second part is a hydrophobic tail with 11-12 carbons [1,2,3]. Iturin A has been shown to be a potential bio-resource for treating both human and animal mycoses because of its broad-spectrum antifungal activity [4,5]. According to recent research, iturin A can also be applied as a control agent against plant pathogens that reduce crop production, such as the pathogen causing southern corn leaf blight [6].
During the past decades, researchers have paid much attention to the practical production of iturin A because of its foreseeable potential in biological fields. To increase the yield of iturin A, optimization methods are commonly adopted to create better fermentation conditions. For decades, the optimization of fermentation has been studied in many ways [7,8]. In a laboratory environment, most methods for optimizing the fermentation process rely on data obtained from a large amount of experimental work, which cannot be used in practical applications. Additionally, statistics-based methods such as the orthogonal experiment method and response surface methodology (RSM) [9] cost more manpower and resources than expected. To obtain statistics that are suitable for practical production, researchers proposed the uniform design (UD) method. So far, the UD method has been successfully applied in many optimization processes [6,10,11]. Compared with the traditional statistical methods, the UD can save considerable manpower and resources in the lab by reducing the number of essential experiments in different dimensions while allowing as many levels per factor as possible [6].
With the development of artificial intelligence (AI), artificial neural networks (ANNs) have been widely applied in predictive modeling. With comparatively higher modeling accuracy and better generalization ability, ANNs are able to simulate a bio-process and predict its results [12,13,14,15]. ANNs can also model non-linear multivariate functions, whereas the traditional statistical methods can only model quadratic functions [16,17,18]. It has also been reported that ANNs are more accurate than RSM in many cases [19,20]. Normally, UDs have relatively representative and regularly distributed patterns. Based on these high-quality patterns, ANNs are able to establish equally accurate models with a comparatively smaller amount of data than would otherwise be required.
Despite the advantages of ANN modeling, few studies have reported using ANNs to reduce the number of experiments. An ANN model based on UD data was established by Peng and colleagues [6]. In their research, the ANN model based on UD data was adopted to optimize the iturin A yield, and the ANN-GA method was compared with the UD method for the first time. Widely adopted in various chemical processes [21,22], this method can be used effectively in applications. However, a technician may find it difficult and confusing to apply this method in practice because of its complexity. Here, an alternative series of user-friendly ANNs and a support vector machine (SVM) are proposed to seek a better optimization method for increasing the yield of iturin A, based on the data from Peng's research [6]. We aim to create more alternative methods that simplify the design of the fed-batch fermentation conditions for the production of iturin A, so that the operability of practical applications can be improved using novel modeling methods.

Fed-batch fermentation of iturin A
According to Peng and colleagues' research [6], the isolated B. subtilis ZK8 strain was used for the production of iturin A. The seed culture medium contained 2.86 g/L KH2PO4, 3 g/L MgSO4, 25 g/L glucose and 30 g/L peptone. The slant culture medium contained 1.5 g/L K2HPO4, 1.8 g/L agar, 1.8 g/L MgSO4·7H2O, 20 g/L peptone and 10 mL/L glycerol. The fermentation culture medium was prepared with 0.79 g/L KH2PO4, 0.8 g/L yeast extract, 2.4 g/L soybean protein powder hydrolysate, 3.8 g/L MgSO4 and 31 g/L glucose. Strain ZK8 was activated in the slant culture medium. The activated strain was then inoculated into the seed culture medium and incubated in a shaker at 30°C and 150 rpm for 20 h. The seed culture was then inoculated into the fermentation culture at an inoculum size of 10% and cultivated for 48 h at 30°C and 150 rpm. After 24 h of fermentation, asparagine (Asn), glutamic acid (Glu) and proline (Pro) were added to the broth in different concentrations [6]. The yield of iturin A was determined by titer measurement, and the cylinder-plate method was used to measure the titer of iturin A [6,23,24,25]. Statistical results were obtained from the experimental results [6] (Table 1).

ANNs
ANNs [26,27,28] are powerful machine learning techniques that estimate and approximate functions based on their inputs. An artificial neural network usually consists of interconnected neurons that can calculate values from inputs and adapt to different situations. Therefore, ANNs are capable of numeric prediction and pattern recognition. In recent years, ANNs have gained wide popularity for inferring a function from observations, especially when the data or the task is too complicated to be handled by humans. In our study, multilayer feed-forward neural networks (MLFNs) and a general regression neural network (GRNN) were used to develop alternative models for optimizing the fed-batch fermentation conditions of iturin A.

MLFNs
MLFNs trained with a back-propagation learning algorithm are the most popular neural networks [29,30,31]. They are applied to a wide variety of chemistry-related problems [29].
An MLFN model consists of neurons that are ordered into layers (Fig. 1). The first layer is called the input layer, the last layer is called the output layer, and the layers between are hidden layers. For the formal description of the neurons we can use the so-called mapping function $\Gamma$, which assigns to each neuron $i$ a subset $\Gamma(i) \subseteq V$ consisting of all ancestors of the given neuron. A subset $\Gamma^{-1}(i) \subseteq V$ consists of all predecessors of the given neuron $i$. Each neuron in a particular layer is connected with all neurons in the next layer. The connection between the $i$th and $j$th neuron is characterized by the weight coefficient $\omega_{ij}$, and the $i$th neuron by the threshold coefficient $\vartheta_i$ (Fig. 2). The weight coefficient reflects the degree of importance of the given connection in the neural network. The output value of the $i$th neuron, $x_i$, is determined by [Equation 1 and Equation 2]. It holds that:

$$x_i = f(\zeta_i) \qquad (1)$$

$$\zeta_i = \vartheta_i + \sum_{j \in \Gamma^{-1}(i)} \omega_{ij} x_j \qquad (2)$$

where $\zeta_i$ is the potential of the $i$th neuron, and the function $f(\zeta_i)$ is the so-called transfer function (the summation in [Equation 2] is carried out over all neurons $j$ transferring the signal to the $i$th neuron). The threshold coefficient can be understood as the weight coefficient of the connection with a formally added neuron $j$, where $x_j = 1$ (the so-called bias).
For the transfer function, it holds that

$$f(\zeta) = \frac{1}{1 + \exp(-\zeta)} \qquad (3)$$

The supervised adaptation process varies the threshold coefficients $\vartheta_i$ and weight coefficients $\omega_{ij}$ to minimize the sum of the squared differences between the computed and required output values. This is accomplished by minimization of the objective function $E$:

$$E = \sum_{o} \frac{1}{2} \left( x_o - \hat{x}_o \right)^2 \qquad (4)$$

where $x_o$ and $\hat{x}_o$ are the vectors composed of the computed and required activities of the output neurons, and the summation runs over all output neurons.

Table 1. Statistical experimental results of the amino acid concentration (mg/L) and iturin A titer (U/mL) (data extracted from Peng's research [6]). Columns: statistical item; factor (mg/L); iturin A titer (U/mL).
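The neuron model described above (a potential formed from the threshold plus the weighted sum of incoming signals, passed through a transfer function, with a squared-error objective) can be illustrated with a minimal sketch. It assumes a logistic sigmoid as the transfer function and a factor of 1/2 in the objective; the 3-2-1 layout, the weights and the input values are hypothetical and are not taken from the study.

```python
import math


def sigmoid(z):
    """Logistic transfer function (assumed form): f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, thresholds):
    """Compute the outputs of one fully connected layer.

    weights[i][j] is the weight of the connection from input neuron j
    to neuron i; thresholds[i] is the threshold (bias) of neuron i.
    """
    outputs = []
    for w_row, theta in zip(weights, thresholds):
        # Potential of the neuron: threshold plus weighted sum of inputs.
        potential = theta + sum(w * x for w, x in zip(w_row, inputs))
        outputs.append(sigmoid(potential))
    return outputs


def objective(computed, required):
    """Sum over output neurons of 1/2 * (computed - required)^2."""
    return sum(0.5 * (x - t) ** 2 for x, t in zip(computed, required))


# Hypothetical 3-2-1 network: three scaled inputs, two hidden neurons, one output.
hidden_w = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
hidden_t = [0.0, 0.1]
out_w = [[1.0, -1.0]]
out_t = [0.0]

x = [0.2, 0.4, 0.6]
hidden = layer_forward(x, hidden_w, hidden_t)
y = layer_forward(hidden, out_w, out_t)
error = objective(y, [0.5])
```

Back-propagation training would adjust `hidden_w`, `hidden_t`, `out_w` and `out_t` to reduce `error`; that step is omitted here for brevity.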

GRNN
The input layer receives the input and passes the input vector x to the pattern layer. The pattern layer contains one neuron for each training datum. In this layer, the weighted squared Euclidean distance is calculated by [Equation 5]. A test input applied to the network is first subtracted from the values stored in the pattern layer neurons. Either the squares or the absolute values of these differences are then summed and passed to an exponential activation function. The results are transferred to the summation layer, whose neurons add up the dot product of the pattern layer outputs and the weights (Fig. 3).
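The pattern, summation and output layers described for the GRNN can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the squared Euclidean distance with a Gaussian kernel and a single smoothing parameter `sigma`, which is the commonly used GRNN form and may differ in detail from the software used in this study. The function name and the toy data are hypothetical.

```python
import math


def grnn_predict(x, train_x, train_y, sigma):
    """Estimate y for input x as a kernel-weighted average of training outputs."""
    numerator = 0.0    # summation layer: sum of Y_i * kernel_i
    denominator = 0.0  # summation layer: sum of kernel_i
    for xi, yi in zip(train_x, train_y):
        # Pattern layer: squared Euclidean distance to a stored training input.
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        # Exponential activation (Gaussian kernel, an assumed choice).
        k = math.exp(-d2 / (2.0 * sigma ** 2))
        numerator += yi * k
        denominator += k
    if denominator == 0.0:
        # All kernel values underflowed; sigma is too small for this input.
        raise ValueError("sigma is too small for the given input")
    # Output layer: divide the weighted sum by the sum of the weights.
    return numerator / denominator


# Hypothetical toy data: two training points with two input variables each.
train_x = [[0.0, 0.0], [1.0, 1.0]]
train_y = [0.0, 2.0]
midpoint_estimate = grnn_predict([0.5, 0.5], train_x, train_y, sigma=1.0)
```

Because the model only stores the training data and has a single smoothing parameter, no iterative weight fitting is needed, which is consistent with the short training time reported for the GRNN.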

SVM model
SVM is a learning algorithm mainly based on statistical learning theory [40]. By balancing the complexity and the learning ability of a model on the basis of limited sample information, this theory offers an excellent capability of global optimization to improve generalization. For linearly separable binary classification, the essential principle of SVM is to find the optimal hyperplane, that is, the plane that separates all samples with the maximum margin [41,42]. This plane not only helps improve the predictive ability of the model, but also helps reduce the errors that occasionally occur in classification. Fig. 4 illustrates the optimal hyperplane, with "+" indicating the samples of type 1 and "-" representing the samples of type -1.
Fig. 5 shows the main structure of SVM. The letter "K" stands for kernels [43]. As the figure shows, the SVM is built from a small subset of the training data that is extracted by the relevant algorithm. For classification, choosing suitable kernels and appropriate parameters is of great importance for prediction accuracy. However, no mature international standard currently exists for choosing these parameters. In most circumstances, comparing experimental results, drawing on experience from extensive calculations, and using the cross-validation available in software packages help to solve this problem to some extent [44,45].
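How kernels and a small subset of training samples combine into a classifier can be shown with a short sketch of a kernel decision function. It assumes a radial basis function (RBF) kernel, one common choice; the support vectors, labels, coefficients `alphas` and `bias` are hypothetical values chosen by hand, not the result of training.

```python
import math


def rbf_kernel(u, v, gamma=1.0):
    """RBF kernel K(u, v) = exp(-gamma * ||u - v||^2) (assumed kernel choice)."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * d2)


def decision_value(x, support_vectors, labels, alphas, bias):
    """Weighted sum of kernel values between x and each support vector, plus bias."""
    total = 0.0
    for sv, label, alpha in zip(support_vectors, labels, alphas):
        total += alpha * label * rbf_kernel(sv, x)
    return total + bias


def classify(x, support_vectors, labels, alphas, bias):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    value = decision_value(x, support_vectors, labels, alphas, bias)
    return 1 if value >= 0.0 else -1


# Hypothetical support vectors: one sample of type 1 and one of type -1.
svs = [[1.0, 1.0], [-1.0, -1.0]]
labels = [1, -1]
alphas = [1.0, 1.0]
bias = 0.0

label_a = classify([0.8, 0.9], svs, labels, alphas, bias)
label_b = classify([-1.0, -0.5], svs, labels, alphas, bias)
```

In a trained SVM, the coefficients and the bias come from solving the margin-maximization problem, and the kernel parameter `gamma` is one of the parameters typically selected by cross-validation.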

Model development
According to previous research, the yield of iturin A is affected by adding various concentrations of Asn, Glu and Pro during the fed-batch fermentation process [6]. Here, we aim to use ANNs and an SVM to model the relationship between the added concentrations of Asn, Glu and Pro and the iturin A titer, so that the titer can be predicted from these concentrations.
The concentration levels of Asn, Glu and Pro (mg/L) were set as independent variables, while the iturin A titer (U/mL) was set as the dependent variable. Since the numeric predictions of machine learning techniques are based entirely on existing data, the data should be divided into two sets before model development: a training set and a testing set. The training set helps the program "learn" the regularities in the data, while the testing set is used to validate the trained model after the training process. Here, 65% of the data were used as the training set and 35% as the testing set. The ANN prediction models were constructed with the NeuralTools® software (trial version, Palisade Corporation, NY, USA) [46,47,48]. We chose the GRNN and MLFN as the training algorithms. The SVM was developed with Matlab software.
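The 65%/35% division into training and testing sets can be sketched as follows. The sketch assumes a random shuffle with a fixed seed; the source does not state how the samples were assigned, so the shuffling, the function name and the placeholder sample list are all assumptions.

```python
import random


def split_train_test(samples, train_fraction=0.65, seed=0):
    """Shuffle the samples and split them into training and testing sets."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = list(samples)    # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder samples: in practice each item would hold the Asn, Glu and Pro
# concentrations (mg/L) and the measured iturin A titer (U/mL).
samples = list(range(20))
train_set, test_set = split_train_test(samples)
```

Using the same split for every model, as done for the GRNN and SVM in this study, keeps the comparison of their testing results fair.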
We used the root mean square (RMS) error and the training time as indicators to measure the performance of the ANNs and the SVM (Table 2). The number of nodes of the MLFNs was varied from 2 to 25 in order to find out how the performance of the MLFNs changes with the number of nodes.
Table 2 indicates that the GRNN, the SVM and the MLFNs with 3 and 6 nodes have comparatively low mean RMS errors (477.88, 460.13, 526.38 and 583.13, respectively). It is clear that the GRNN and SVM have the lowest RMS errors and the shortest training times, while the MLFNs have comparatively higher RMS errors and longer training times. To determine the accuracy of predictions, the forecast accuracy was used as an indicator. In current applications, the empirical tolerance of ANNs is 30%, which means that a single prediction can be considered a "good prediction" when its relative error is lower than 30% of the actual value. Here, the forecast accuracy is the percentage of "good predictions" among all samples in the testing set. Table 2 shows that the forecast accuracies (under the tolerance of 30%) of the GRNN and SVM are both 100%. Below, we discuss the GRNN, SVM and MLFNs in turn in order to determine the most suitable model for the design of the fed-batch fermentation conditions for the production of iturin A.
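The two indicators used here, the RMS error and the forecast accuracy under a 30% tolerance, can be computed as in the following sketch. The function names and the example values are hypothetical and serve only to illustrate the definitions given above.

```python
import math


def rms_error(actual, predicted):
    """Root mean square of the differences between actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def forecast_accuracy(actual, predicted, tolerance=0.30):
    """Percentage of predictions whose relative error is below the tolerance."""
    good = 0
    for a, p in zip(actual, predicted):
        # A "good prediction" deviates by less than tolerance * actual value.
        if abs(p - a) < tolerance * abs(a):
            good += 1
    return 100.0 * good / len(actual)


# Hypothetical titers (U/mL), not data from the study.
actual = [1000.0, 2000.0, 4000.0]
predicted = [1100.0, 1900.0, 2000.0]
rms = rms_error(actual, predicted)
accuracy = forecast_accuracy(actual, predicted)
```

In this toy example the first two predictions are within 30% of the actual value and the third is not, so two of the three samples count as good predictions.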

Comparison between the GRNN and MLFNs
The GRNN has the lowest RMS error and the shortest training time in our research, compared with the 24 MLFNs. In addition, because of the robustness inherent in the principles of the GRNN [32,38], it has a high reproducibility, which is a clear advantage over the other ANNs in our research. To test the robustness of the GRNN, the computational experiments for the GRNN were repeated; the results are shown in Fig. 6.
Fig. 6 shows the RMS errors of the GRNN in repeated experiments. Notably, the RMS error fluctuates only slightly across the experiments, which shows that the GRNN is robust for the optimization process. More importantly, the mean RMS error is relatively low, which supports the applicability of the GRNN. Under the tolerance of 30%, the prediction accuracy of the GRNN is 100% in all repeated experiments. For practical applications with MLFNs, one should use the related software to search for the most suitable model within the range of low numbers of nodes. Compared with the GRNN, the MLFNs take longer to train and their fluctuations are less stable. Therefore, we consider the GRNN to be a more suitable model for the optimization of the fed-batch fermentation conditions.

Training and testing results of the GRNN and SVM
Here, we use typical examples of the training and testing results to demonstrate the applicability of the GRNN and the SVM. Fig. 8 and Fig. 9 illustrate the training and testing results of the GRNN, while Fig. 10 illustrates the testing results of the SVM. The training and testing sets of the GRNN and SVM are the same.
The training results of the GRNN are illustrated in Fig. 8, which shows that the GRNN has a strong capacity for recall. The predicted values are very close to the actual values (Fig. 8a), which indicates that the non-linear fit of the model is good. The comparisons between the residual values and the actual and predicted values (Fig. 8b and Fig. 8c) also show that the residual values are relatively low, which suggests that the developed GRNN is robust.
To demonstrate the applicability of the GRNN after training, we used the data set that had not been used in the training process. The results are shown in Fig. 9.
Fig. 9 shows the predicted results during the testing process. The predicted values are close to the actual values (Fig. 9a), and the residual values are relatively low (Fig. 9b and Fig. 9c). These results demonstrate the robustness and applicability of the GRNN model in testing.
For the SVM, Fig. 10 illustrates the correctness and robustness of its predictions in the testing process.
As with the RMS error and the training time, the testing results of the SVM are highly similar to those of the GRNN. The SVM generates results that are comparably precise to the testing results of the GRNN.
In sum, the GRNN and SVM are both applicable to the optimization of fed-batch fermentation conditions for the production of iturin A. Both the GRNN and SVM have the lowest RMS errors and the shortest training times. Compared with the approach in Peng's research [6], the GRNN and SVM are more convenient because of the user-friendly packaged software [46,47,48,49]. Technicians can use the models and approaches provided in this article in practical applications without complex operations.

Comparisons with other optimization methodologies
Various optimization methodologies for biotechnology have been presented in previous reports [6,9,19,20], including regression analysis, the orthogonal experiment method and RSM. Although these methods have their own advantages (e.g. they do not place high demands on computers), they also have many disadvantages compared with machine learning techniques such as ANNs and SVM. Generally, the main advantages of ANNs and SVM in the optimization of biotechnological production are precision, robustness and time saving. ANNs and SVM make predictions based on a model that has been well trained on the training data set, and their programs can run automatically without much human intervention. The non-linear functions of ANNs can yield a powerful non-linear prediction system, which supports the precision of predictions [48,49]. The principle of the SVM supports robust results [40]. With the development of computers and programming tools, ANNs and SVM can now be established easily, which makes them more time-saving and user-friendly.

Conclusion
According to the modeling results, the GRNN is considered the most suitable ANN model because of its high robustness and precision. The SVM is also considered a suitable alternative model because of its robust and precise testing results. Under the tolerance of 30%, the prediction accuracies of the GRNN and SVM are both 100% in repeated experiments. The results indicate that the GRNN and SVM are strong and operable alternative models for the optimization of the fermentation conditions of iturin A. Compared with the MLFNs and other models provided by previous studies, the GRNN and SVM have clear advantages, including low RMS error, time saving and user-friendliness. It is characteristic of machine learning models that over-fitting can be reduced with a larger training data set, because more data helps suppress local over-fitting [49,50]. Therefore, with a larger number of samples, the prediction results may be improved. We can reasonably assume that, in further practical applications, a larger amount of data obtained from industrial mass production can ensure higher applicability and robustness of a model for optimizing the fermentation conditions of iturin A.

In Fig. 3, the weights are denoted by A and B, and their values are determined by the y values of the training data stored in the pattern layer. f(x)K denotes the weighted outputs of the pattern layer, where K is a constant associated with the Parzen window. Yf(x)K denotes the product of the pattern layer outputs and the training data output Y. At the output layer, Yf(x)K is divided by f(x)K to estimate the desired Y, as given in [Equation 6 and Equation 7] [32,38].

Financial support
This work was funded by the National Marine Public Welfare Research Project (No. 201305002 and No. 201305043), National Natural Science Foundation of China (No. 30901107), and the Project of Marine Ecological Restoration Technology Research to the Penglai 19-3 Oil Spill Accident (No. 19-3YJ09).

Fig. 4. Support vectors determine the position of the optimal hyperplane.

Fig. 7. RMS errors and training times of MLFNs with the change of nodes.

Fig. 9. Testing results of the GRNN. a) Predicted values versus actual values; b) residual values versus actual values; c) residual values versus predicted values.

Fig. 10. Testing results of the SVM. a) Predicted values versus actual values; b) residual values versus actual values; c) residual values versus predicted values.

Table 2. Best model search in different machine learning models.