PREDICTING PROTEIN SECONDARY STRUCTURE BASED ON ENSEMBLE NEURAL NETWORK

ABSTRACT


I. INTRODUCTION
Proteins are sophisticated molecular structures that carry out the cellular routines essential to life. Although natural proteins play remarkable roles, only a small percentage of all possible amino acid sequences occur in nature. Bioinformatics, a collaboration between biology and computer science, provides an avenue for exploring the protein sequence space more comprehensively in order to develop artificial proteins that are more robust and more useful than their natural equivalents. It is clear from [1,2] that several protein functions are facilitated by protein-protein interactions (PPIs). Therefore, redesigning their interfaces to improve or fine-tune the binding affinity and binding mode of PPIs is a useful technique for improving the functions of proteins [3]. This method has been used productively to remodel different protein systems [4][5][6][7][8], and it has enormous potential for the design of innovative therapeutics, globular proteins and other beneficial proteins.
There are two basic computational techniques for predicting protein secondary structure: the template-based approach and the machine learning approach. The drawbacks of the template-based approach are that it has lower accuracy than the machine learning approach and performs poorly on non-homologous proteins [9]. Machine learning (ML) methods usually predict protein structure by first extracting essential features from protein sequences. The shortcoming of this approach is that the extracted features might not contain all the information in a protein sequence, so some important information might be lost [10][11][12].
On the other hand, from the viewpoint of biology, a protein sequence carries the information that enables it to take on a specific structure [9,12,13]. Although predicting such structures is of great interest, it poses serious problems. To start with, the relationship between a sequence and its resulting structure is highly complex [14,15]. Moreover, the chosen features can considerably influence the performance of the learner [12]. Another issue is noise in the training data, that is, in the protein sequences and their associated structures. Finally, there is the problem of class imbalance in amino-acid samples, as the classes are not evenly distributed [16].
Artificial Neural Networks (ANN) are a major advance in machine learning. They can automatically learn appropriate representations of raw data, detect high-level features, and improve on the performance of conventional models. ANN also improve the understanding of the significance of data and offer further insight into the structure of biological data [17].
Motivated by these characteristics of ANN, this paper proposes an ensemble neural network based approach for the prediction of protein secondary structure. Our main contributions in this paper are as stated below:

II. RELATED WORKS
Several works have been done in the field of protein structure prediction, and ML-based approaches are among the prevalent methods in use. Three popular families of ML models have been applied to protein sequence-structure mapping: artificial neural networks (ANN), deep learning techniques, and ensemble learners. ANN are the earliest type of ML algorithm applied to predicting protein secondary structure. A well-designed ANN can accurately predict class boundaries. Experimental results from various studies have shown that recurrent neural networks (RNN) [18][19][20] are very effective for processing protein sequence data. A variant of RNN, the bidirectional recurrent neural network, is known to exploit the information in the complete structure, while time plays a very important role in information retention [21][22][23]. It has been demonstrated that long short-term memory RNN can retain information over a long time span [24]. The convolutional neural network (CNN) is a type of supervised deep learning algorithm. The CNN is beneficial for predicting protein structure because of its reliability, robustness, parallel processing and self-learning ability. The CNN uses additional sequence information in its learning process and takes into account the interdependence of the neighbouring context [25]. Recently, research on both shallow [26,27] and deep neural networks (DNN) [28] has aggregated the predictions of different networks in an ensemble manner.
An ensemble NN comprising three deep learning algorithms was developed by [28]. Although neural networks have several advantages, they also have several weaknesses, such as the difficulty of choosing optimal values for parameters like the number of neurons, the number of layers, and the activation functions. Poor choices have a negative impact on the prediction result. ANN training can also get stuck in local minima.
To overcome these challenges associated with the prediction of protein secondary structure, we employed an ensemble neural network model that comprises a feed forward neural network (FFNN), a cascade forward network (CFN) and an RNN. We compared the performance of our proposed model with that of a pattern recognition neural network, NARX and a multilayer perceptron neural network.

III.1.1 Feed Forward Neural Network (FFNN)
The FFNN is a very popular neural network. It was developed out of the need for a more efficient artificial neural network that overcomes the weaknesses associated with the back propagation learning algorithm. The FFNN feeds data from the input layer through the hidden layers to the output layer. It is referred to as a feed forward network because it uses forward propagation. The major strength of FFNN is their ease of implementation and management, which makes them suitable for approximating any type of input-output mapping [29]. The effectiveness of an FFNN depends heavily on the tuning of the weights of the nodes. The discrepancy between the output produced by the FFNN and the target output is computed after every iteration. Because the neural network is partitioned by its nodes, the training process becomes more controllable. The structure and operation of the FFNN are shown in Figure 1. Source: [29].
The activation function equation of the $j$th hidden neuron is given in Equation 1:

\[ h_j = f\left( \sum_{k=1}^{K} w_{kj}\, x_k \right) \tag{1} \]

where $h_j$ is the $j$th hidden neuron, $f(\cdot)$ is the link function that introduces nonlinearity between the input and hidden layers, $w_{kj}$ is the $(k,j)$th entry of the $(K \times N)$ weight matrix, and $x_k$ is the $k$th input value.
In the output equation (Equation 2), $y_i$ is the $i$th output value.
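The hidden-layer computation described above can be illustrated with a minimal Python sketch. The logistic activation, the output-layer weight matrix `w_out` and the example values are assumptions made for illustration only; they are not taken from the original model.

```python
import math


def sigmoid(z):
    """Logistic activation, an assumed choice for the function f."""
    return 1.0 / (1.0 + math.exp(-z))


def ffnn_forward(x, w_hidden, w_out):
    """Forward pass of a one-hidden-layer feed forward network.

    x        : list of K input values x_k
    w_hidden : K x N weights, w_hidden[k][j] links input k to hidden neuron j
    w_out    : N x M weights, w_out[j][i] links hidden neuron j to output i
               (hypothetical name, the output weights are an assumption)
    """
    n_hidden = len(w_hidden[0])
    n_out = len(w_out[0])
    # Hidden activations: h_j = f(sum_k w_kj * x_k)
    h = [sigmoid(sum(w_hidden[k][j] * x[k] for k in range(len(x))))
         for j in range(n_hidden)]
    # Output values, computed in the same way from the hidden activations
    y = [sigmoid(sum(w_out[j][i] * h[j] for j in range(n_hidden)))
         for i in range(n_out)]
    return h, y


# Illustrative values only: K = 2 inputs, N = 3 hidden neurons, 1 output
x = [1.0, 0.0]
w_hidden = [[0.0, 1.0, -1.0],
            [0.5, 0.5, 0.5]]
w_out = [[0.0], [0.0], [0.0]]
h, y = ffnn_forward(x, w_hidden, w_out)
```

Data flows strictly from inputs to outputs here, with no feedback connection, which is the property that gives the network its name.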

III.1.2 Cascade Forward Network
Cascade forward network (CFN) neural networks are similar to the FFNN in that they also use the back propagation algorithm for updating weights. The main difference is that they include a weighted connection from the input to every layer and from each layer to all successive layers [30]. It has been argued that a cascade forward back propagation network can outperform the FFNN in many cases [31]. One of the striking characteristics of this network is that each layer of neurons is connected to all preceding layers of neurons [32]. The CFN is depicted in Figure 2. Source: [32].
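The mathematical equation for the CFN is stated as:

\[ y = \sum_{i=1}^{n} f^{i}\left( w_i^{i} x_i \right) + f^{o}\left( \sum_{j=1}^{k} w_j^{o}\, f^{h}\left( \sum_{i=1}^{n} w_{ji}^{h} x_i \right) \right) \tag{3} \]

The activation function from the input layer to the output layer is represented as $f^{i}$, while the weight from the input layer to the output layer is $w_i^{i}$. Here $f^{o}$ is the output activation function, and $w_j^{o}$ and $w_{ji}^{h}$ are the hidden to output and input to hidden weights. When a bias is added to the input layer and the activation function of each neuron in the hidden layer is represented as $f^{h}$, Equation 3 can be expressed as:

\[ y = \sum_{i=1}^{n} f^{i}\left( w_i^{i} x_i \right) + f^{o}\left( w^{b} + \sum_{j=1}^{k} w_j^{o}\, f^{h}\left( w_j^{b} + \sum_{i=1}^{n} w_{ji}^{h} x_i \right) \right) \tag{4} \]

where $w^{b}$ and $w_j^{b}$ are the bias weights of the output and hidden neurons.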
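To make the role of the direct input to output connection concrete, the following is a minimal single-output sketch in Python. The identity input and output activations, the `tanh` hidden activation and all parameter names are assumptions chosen for illustration, not the configuration used in the experiments.

```python
import math


def cascade_forward(x, w_direct, w_hidden, b_hidden, w_out, b_out):
    """Single-output cascade forward pass.

    Identity input/output activations and a tanh hidden activation are
    assumed here; all parameter names are hypothetical.
    """
    # Direct input-to-output contribution: sum_i w_direct[i] * x[i]
    direct = sum(wd * xi for wd, xi in zip(w_direct, x))
    # Hidden activations, one row of w_hidden and one bias per hidden neuron
    hidden = [math.tanh(b + sum(w * xi for w, xi in zip(row, x)))
              for row, b in zip(w_hidden, b_hidden)]
    # Contribution that passes through the hidden layer
    via_hidden = b_out + sum(wo * h for wo, h in zip(w_out, hidden))
    return direct + via_hidden


x = [1.0, 2.0]
with_shortcut = cascade_forward(x, [0.5, 0.25], [[0.0, 0.0], [0.0, 0.0]],
                                [0.0, 0.0], [1.0, 1.0], 0.5)
without_shortcut = cascade_forward(x, [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]],
                                   [0.0, 0.0], [1.0, 1.0], 0.5)
```

Setting the direct weights to zero, as in `without_shortcut`, reduces the computation to an ordinary feed forward pass, which is one way to see the difference between the two architectures.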

III.1.3 Recurrent Neural Network (RNN)
The RNN is distinguished by an extra set of feedback connections from the output of the hidden layer that lies between the input and output layers. These connections form the context layer, which preserves information between observations [33]. The result of processing at a preceding time step can be carried over and used at the current time step. This attribute gives the RNN a significant advantage, especially in real-time applications. An RNN can have an unrestricted memory depth and can therefore learn relationships through time in addition to learning from the current inputs [34,35]. The RNN is depicted in Figure 3. Source: [33].
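The input to hidden layer mapping is expressed in Equation 5:

\[ h_t = f_h\left( W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h \right) \tag{5} \]

Here $h_t$ is the hidden layer at instance $t$, $f_h$ is the activation function, $W_{xh}$ is the input to hidden layer weight matrix, and $x_t$ is the input at instance $t$. Also, $h_{t-1}$ is the hidden layer at instance $t-1$, weighted by the hidden to hidden matrix $W_{hh}$, and the bias or threshold value is represented by $b_h$. Equation 6, which represents the hidden to output layer mapping, is stated as:

\[ y_t = f_y\left( W_{hy}\, h_t + b_y \right) \tag{6} \]

where the output vector is represented as $y_t$, the hidden to output layer weight matrix is $W_{hy}$, $f_y$ is the output activation function, and $b_y$ is the bias or threshold.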
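The recurrence described above can be sketched in a few lines of Python. This is a minimal Elman-style illustration under stated assumptions: `tanh` as the hidden activation, an identity output activation, and hypothetical parameter names (`w_xh`, `w_hh`, `w_hy`, `b_h`, `b_y`) with toy values.

```python
import math


def rnn_step(x_t, h_prev, w_xh, w_hh, b_h, w_hy, b_y):
    """One recurrent step: returns the new hidden state and the output."""
    n_hidden = len(b_h)
    h_t = []
    for j in range(n_hidden):
        z = b_h[j]
        z += sum(w_xh[j][i] * x_t[i] for i in range(len(x_t)))      # current input
        z += sum(w_hh[j][m] * h_prev[m] for m in range(n_hidden))   # previous hidden state
        h_t.append(math.tanh(z))
    # Identity output activation (an assumption made for brevity)
    y_t = [b_y[o] + sum(w_hy[o][j] * h_t[j] for j in range(n_hidden))
           for o in range(len(b_y))]
    return h_t, y_t


def run_sequence(xs, **params):
    """Feed a sequence through the network, carrying the hidden state forward."""
    h = [0.0] * len(params["b_h"])
    outputs = []
    for x_t in xs:
        h, y = rnn_step(x_t, h, **params)
        outputs.append(y)
    return outputs


params = dict(w_xh=[[1.0]], w_hh=[[0.5]], b_h=[0.0], w_hy=[[1.0]], b_y=[0.0])
outputs = run_sequence([[1.0], [0.0]], **params)
```

The second input is zero, yet the second output is non-zero because the hidden state from the first step is carried over; this is the memory effect that the context layer provides.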

III.1.4 Multi-Layer Perceptron Neural Networks
The multi-layer perceptron (MLP) is made up of many perceptrons. An MLP comprises an input layer that accepts data, an output layer that generates the result or prediction for the input, and, between the two, an arbitrary number of hidden layers that serve as the real computational engine of the MLP. An MLP with one hidden layer is capable of approximating any continuous function on an interval. Equation 7 expresses the rule used in updating the parameters $(w_n, b_n)$. The algorithm halts once it finds a decision function that takes a data point as input and correctly assigns every sample in the training dataset to its class [37]. The MLP is shown in Figure 4. Source: [37].

III.1.5 Non-linear Autoregressive Network with Exogenous Inputs (NARX)
The non-linear autoregressive network with exogenous inputs (NARX) is regarded as a classic time series predictor [38][39][40]. NARX is a nonlinear generalization of the autoregressive exogenous (ARX) model, a standard tool in linear system identification, where the most important thing is fitting the data irrespective of the mathematical structure of the model [41]. One of the major strengths of NARX models is their ability to model a wide range of nonlinear dynamic systems. They have been used to solve many time-series modelling problems [42]. They are regarded as recurrent dynamic neural networks with feedback connections that enclose several layers of the network [43]. For optimal performance, the memory capacity of the NARX neural network must be fully exploited by using the previous values of the forecasted or actual time series. Figure 5 depicts the architecture of the NARX. Source: [39].
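A NARX model is commonly written as $y(t) = f\big(y(t-1), \ldots, y(t-n_y),\, u(t-1), \ldots, u(t-n_u)\big)$, where $u$ is the exogenous input. The sketch below only shows how the lagged regressor vectors could be assembled before they are passed to a network; the function name, the lag orders `ny` and `nu`, and the toy series are hypothetical.

```python
def narx_regressors(y, u, ny, nu):
    """Build (input_vector, target) pairs for a NARX-style predictor.

    Each input vector holds the ny most recent outputs followed by the
    nu most recent exogenous inputs; the target is the current output y[t].
    """
    start = max(ny, nu)
    pairs = []
    for t in range(start, len(y)):
        past_y = [y[t - k] for k in range(1, ny + 1)]
        past_u = [u[t - k] for k in range(1, nu + 1)]
        pairs.append((past_y + past_u, y[t]))
    return pairs


# Toy series used purely for illustration
y = [1, 2, 3, 4, 5]
u = [10, 20, 30, 40, 50]
pairs = narx_regressors(y, u, ny=2, nu=1)
```

Using actual past outputs in `past_y` corresponds to the open-loop use of the network; replacing them with previously forecasted values would correspond to the closed-loop use mentioned above.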

III.2 NEURAL NETWORK TRAINING ALGORITHMS
In this paper, three training algorithms were used: Levenberg-Marquardt (LM), Scaled Conjugate Gradient (SCG), and Resilient Back Propagation (RBP). These algorithms are described in this section. The training algorithms are used to adjust the weights of the RNN, CFN and FFNN models used in this research.

III.2.1 Levenberg Marquardt (LM)
LM is a highly effective technique for weight adjustment. It combines the gradient descent rule and the Gauss-Newton technique. LM determines the step size using a parameter that takes large values in the initial iterations (as in the gradient descent algorithm) and small values in the later phases (as in the Gauss-Newton technique). This combination lets it start converging from an early stage, like the gradient descent method, and converge quickly in the neighbourhood of the minimum error, like the Gauss-Newton method. LM thereby overcomes the weaknesses exhibited by both the gradient descent algorithm and the Gauss-Newton technique [44,45]. The Levenberg-Marquardt update of the vector of unknown parameters $c$ determined at step $k+1$ is represented by Equation 8:

\[ c_{k+1} = c_k - \left[ J^{T}(c_k, x)\, J(c_k, x) + \lambda_k I \right]^{-1} J^{T}(c_k, x)\, \varepsilon(c_k, x) \tag{8} \]

The LM error $\varepsilon(c_k, x)$ is denoted by Equation 9. When the parameters of the vector are not the best ones and the value of the error is not at its lowest level, the condition

\[ J^{T}(c_k, x)\, J(c_k, x) \ll \lambda_k I \tag{12} \]

can be assumed, and this results in the gradient descent technique of Equation 13:

\[ c_{k+1} = c_k - \frac{1}{\lambda_k}\, J^{T}(c_k, x)\, \varepsilon(c_k, x) \tag{13} \]

If the value of the coefficient $\lambda_k$ is small, the values of the parameters of vector $c$ are close to the best solution. In this case, the condition

\[ J^{T}(c_k, x)\, J(c_k, x) \gg \lambda_k I \tag{14} \]

indicates that the LM algorithm reduces to the Gauss-Newton method:

\[ c_{k+1} = c_k - \left[ J^{T}(c_k, x)\, J(c_k, x) \right]^{-1} J^{T}(c_k, x)\, \varepsilon(c_k, x) \tag{15} \]

The Jacobian $J$ with respect to the weight and bias variables is computed using back propagation, and the variables are adjusted according to Levenberg-Marquardt.
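The interplay between the damping parameter and the two limiting behaviours can be illustrated on a deliberately small problem. The sketch below fits a single parameter `a` in the hypothetical model $y = e^{a x}$, so the matrix expressions collapse to scalars. The accept/reject rule that divides or multiplies the damping value by 10 is one common heuristic and is an assumption here, not a description of the MATLAB routine used in the experiments.

```python
import math


def lm_fit(xs, ys, a0, lam=1.0, iters=50):
    """Fit y = exp(a * x) for a single parameter a with a damped (LM-style) loop."""

    def residuals(a):
        return [math.exp(a * x) - y for x, y in zip(xs, ys)]

    def sse(a):
        return sum(r * r for r in residuals(a))

    a = a0
    for _ in range(iters):
        r = residuals(a)
        # Jacobian of the residuals with respect to a (one entry per sample)
        jac = [x * math.exp(a * x) for x in xs]
        jtj = sum(j * j for j in jac)                    # J^T J (a scalar here)
        jtr = sum(j * ri for j, ri in zip(jac, r))       # J^T r
        step = jtr / (jtj + lam)                         # (J^T J + lam * I)^-1 J^T r
        candidate = a - step
        if sse(candidate) < sse(a):
            a = candidate
            lam *= 0.1    # error decreased: behave more like Gauss-Newton
        else:
            lam *= 10.0   # error did not decrease: take smaller, gradient-like steps
    return a


# Synthetic, noise-free data generated with a = 0.7 (illustrative only)
xs = [0.0, 0.5, 1.0, 1.5]
ys = [math.exp(0.7 * x) for x in xs]
a_hat = lm_fit(xs, ys, a0=0.0)
```

With a large damping value the step is approximately the gradient divided by that value, and with a small one it approaches the Gauss-Newton step, mirroring the two regimes described above.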

III.2.2 Resilient Back Propagation (RBP)
RBP was first proposed by [46]. It is a supervised batch learning method that learns from the entire training dataset at once in FFNN. The primary goal of RBP is to eliminate the harmful effect of the magnitude of the partial derivative on the weight step. As a result, only the sign of the derivative is taken into account to indicate the direction of the weight update [46].
In standard back propagation, the weight update is determined by the partial derivative, as expressed in Equation 16:

\[ \Delta w_{ij}(t) = \alpha\, x_i(t)\, \delta_j(t) \tag{16} \]

where $\alpha$ is the learning rate, $x_i(t)$ is the input propagated back to the $i$th neuron at time step $t$, and $\delta_j(t)$ is the corresponding error gradient. Unlike standard back propagation, resilient propagation computes an individual delta $\Delta_{ij}$ for every connection, which determines the magnitude of the weight update. The reader can consult Riedmiller [45] for a detailed explanation of the RBP algorithm.
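The sign-based update can be sketched for a single weight as follows. This is a simplified variant without weight backtracking; the increase and decrease factors (1.2 and 0.5), the step bounds and the quadratic test function are commonly used illustrative choices and are assumptions here.

```python
def rprop_step(weight, grad, prev_grad, delta,
               eta_plus=1.2, eta_minus=0.5, delta_min=1e-6, delta_max=50.0):
    """One sign-based update for a single weight.

    Returns the new weight, the gradient to remember for the next call,
    and the adapted step size.
    """
    sign_product = grad * prev_grad
    if sign_product > 0:
        # Same sign as last time: grow the step size
        delta = min(delta * eta_plus, delta_max)
    elif sign_product < 0:
        # Sign flipped (a minimum was overshot): shrink the step and skip this update
        delta = max(delta * eta_minus, delta_min)
        grad = 0.0
    # Only the sign of the gradient decides the direction of the move
    if grad > 0:
        weight -= delta
    elif grad < 0:
        weight += delta
    return weight, grad, delta


# Minimise the toy function (w - 3)^2, whose gradient is 2 * (w - 3)
w, prev_grad, delta = 0.0, 0.0, 0.1
for _ in range(200):
    grad = 2.0 * (w - 3.0)
    w, prev_grad, delta = rprop_step(w, grad, prev_grad, delta)

# A very large and a very small gradient of the same sign move the weight equally
big = rprop_step(0.0, 1000.0, 0.0, 0.1)[0]
small = rprop_step(0.0, 0.001, 0.0, 0.1)[0]
```

The last two calls show the defining property: the size of the move depends on the per-weight step size, not on the magnitude of the derivative.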

III.2.3 Scaled Conjugate Gradient (SCG)
The SCG algorithm was proposed by [47,48]. SCG is a variant of the conjugate gradient technique developed to lower the running time by using a Levenberg-Marquardt approach to scale the step size instead of performing a line search at each learning iteration [44]. Through this step-size scaling mechanism, the technique avoids the time-consuming and inefficient line search that characterizes each learning iteration of conventional conjugate gradient methods. SCG can effectively handle large-scale problems. It has proven over time to be very effective for training feed forward neural networks and other large networks. According to Martin [50], the SCG algorithm was derived from the quadratic minimization of the objective function E within N iterations.

III.3 PROPOSED ENSEMBLE NEURAL NETWORK TECHNIQUE
We employed four (4) variants of ANN classifiers (FFNN, RNN, CFN and NARX) as base learners, and Random Forest (RF) was used as the top layer of our proposed model. Because of the complexity of protein structure, we integrate the best outputs produced by FFNN, RNN, CFN and NARX after they were trained using LM, RBP and SCG. Figure 6 shows the architecture of the proposed ensemble ANN. Source: Authors, (2020).
In the proposed ensemble technique, the protein sequence is fed into each of the neural network algorithms in the learning stage. The aggregated classification result of all the base neural networks (FFNN, RNN, CFN and NARX) is used for the final prediction.
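The aggregation step can be outlined with a small Python sketch. It shows only the data flow: each base learner produces a prediction, the predictions are collected into a feature vector, and a top-layer combiner turns that vector into the final label. A simple majority vote stands in for the Random Forest top layer, and the four rule-based base learners are hypothetical placeholders for the trained networks.

```python
from collections import Counter


def stack_features(base_learners, sample):
    """Collect each base learner's prediction for one sample into a feature vector."""
    return [learner(sample) for learner in base_learners]


def majority_vote(features):
    """Stand-in top layer: return the most common base prediction."""
    return Counter(features).most_common(1)[0][0]


def ensemble_predict(base_learners, combiner, samples):
    """Run every sample through all base learners, then through the combiner."""
    return [combiner(stack_features(base_learners, s)) for s in samples]


# Hypothetical rule-based placeholders for trained FFNN, RNN, CFN and NARX models
base_learners = [
    lambda s: "protein" if len(s) > 5 else "DNA",
    lambda s: "DNA" if set(s) <= set("ACGT") else "protein",
    lambda s: "protein" if "M" in s else "DNA",
    lambda s: "DNA" if s.count("A") > 2 else "protein",
]
predictions = ensemble_predict(base_learners, majority_vote, ["ACGTAAGT", "MKVLAT"])
```

In a stacked design, `majority_vote` would be replaced by a trained meta-classifier that takes the same feature vectors as input.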

IV.1 EXPERIMENTAL SETUP
The dataset used for the experiments in this work was obtained from the repository of the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB). The dataset is made up of several kinds of biologically important macromolecules, most of which are proteins. Since DNA is the precursor of RNA, which is in turn translated into protein, proteins are the biomolecules that interact directly in biological pathways and processes. The repository contains over 400,000 annotated protein structure sequences, which are publicly available at https://github.com/iamdebanjangoswami/Predictive-Proteinclassification--Naive-Bayes-Classifier. All the simulations for this work were done using MATLAB 2018.

IV.2 PERFORMANCE METRICS
We evaluate the prediction performance of the ensemble ANN using three metrics: Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and Mean Absolute Scaled Error (MASE).

IV.2.1 Mean Absolute Error (MAE)
MAE is defined as the average of the absolute differences between the predicted and actual values in the test set.

IV.2.2 Root Mean Square Error (RMSE)
RMSE is defined as the square root of the mean squared prediction error in the test set; it can be read as the standard deviation of the prediction errors.

IV.2.3 Mean Absolute Scaled Error (MASE)
MASE is a scale-free measure of the accuracy of predictions. It expresses the mean absolute error as a ratio to the mean absolute error of a baseline predictor.
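The three metrics can be computed with a few lines of Python. This is a minimal sketch on made-up values; in particular, the choice of baseline predictor used to scale MASE is an assumption, since the scaling reference is not specified above.

```python
import math


def mae(actual, predicted):
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of the mean squared difference."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mase(actual, predicted, baseline):
    """MAE of the predictions divided by the MAE of a baseline predictor (assumed scaling)."""
    return mae(actual, predicted) / mae(actual, baseline)


# Illustrative values only
actual = [1.0, 0.0, 1.0, 1.0]
predicted = [0.5, 0.0, 1.0, 0.0]
baseline = [0.0, 0.0, 0.0, 0.0]

mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)
mase_value = mase(actual, predicted, baseline)
```

A MASE value below 1 indicates that the model's average error is smaller than that of the baseline.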

IV.3 RESULT AND DISCUSSION
This section presents the statistical results of our simulations. The ensemble ANN was employed to classify and predict the protein sequence structure in the dataset. The dataset was divided into a training set (60%) and a test set (40%), and the individual classifiers were trained on the training set. The proposed ensemble ANN produced a classification accuracy of 99.48%. Table 1 shows the simulation results of our proposed model in comparison with other models. Source: Authors, (2020).
The accuracy of the different models considered in this paper is compared in Figure 7, while the accuracy is presented as a percentage in Figure 8. Table 2 shows the confusion matrices of the algorithms under consideration. Source: Authors, (2020).
Table 2 makes it easy to compare the actual classes with the predicted results. The ensemble correctly predicts 140,663 of 141,400 instances (2,872 DNA instances that are truly DNA and 137,791 protein instances that are truly protein) and wrongly predicts 737 instances (422 instances of the DNA class predicted as protein and 315 instances of the protein class predicted as DNA). This explains why the ensemble produced superior prediction accuracy compared with the other neural network models under consideration. Our experiments show that the ensemble has superior performance in terms of effectiveness and efficiency, considering its classification accuracy and MASE.
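The reported accuracy can be checked directly from the confusion-matrix counts quoted above. In the short Python sketch below, treating DNA as the positive class is only a labelling choice made for the illustration.

```python
def accuracy_from_confusion(tp, fn, fp, tn):
    """Return (correct, total, accuracy in percent) from binary confusion counts."""
    correct = tp + tn
    total = tp + fn + fp + tn
    return correct, total, 100.0 * correct / total


# Ensemble counts from the text, with DNA as the positive class:
# tp = DNA predicted as DNA, fn = DNA predicted as protein,
# fp = protein predicted as DNA, tn = protein predicted as protein
correct, total, acc = accuracy_from_confusion(tp=2872, fn=422, fp=315, tn=137791)
```

The counts sum to 141,400 instances, of which 140,663 are correct, and the resulting percentage rounds to the 99.48% stated for the ensemble.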

V. CONCLUSIONS
An ensemble ANN model for predicting protein secondary structure is proposed in this paper. The proposed model integrates different neural network algorithms for enhanced predictive accuracy. The three ANN used are FFNN, RNN and CFN. Our statistical results show clearly that our model produced superior results compared with the six other models considered. It can therefore be deduced that it is better to predict protein secondary structure by fusing different ANN than by using the models individually. In the future, we hope to experiment with deep learning architectures and compare their performance with the ensemble algorithm proposed in this work. We also intend to extend this research by using moth flame optimisation, particle swarm optimisation, grey wolf optimisation, genetic algorithms and others.
We developed an ensemble neural network learning model that can process hidden contexts of input protein sequences and accurately predict their secondary structures. We used four (4) different neural network classifiers and combined their classification results to produce the final results. Better classification accuracy of protein secondary structure was achieved by aggregating the results of the feed forward neural network (FFNN), cascade forward network (CFN), recurrent neural network (RNN), and non-linear autoregressive network with exogenous inputs (NARX).
Figure 4 depicts a diagrammatic representation of the MLP neural network.
In Equation 8, $k = 1, 2, \ldots$ represents the number of iteration loops; $J$ is the Jacobian matrix; $I$ is the unit matrix; $\lambda_k$ is a scalar whose value changes during iteration; and $c = [c_1, c_2, \ldots, c_m]$ is the vector of model parameters searched for.

Table 1: Performance evaluation of models based on MAE, RMSE, MASE and percentage accuracy.