Building a novel classifier based on teaching learning based optimization and radial basis function neural networks for non-imputed database with irrelevant features

Ch. Sanjeev Kumar Dash (Silicon Institute of Technology, Bhubaneswar, India)
Ajit Kumar Behera (Silicon Institute of Technology, Bhubaneswar, India)
Satchidananda Dehuri (Department of Information and Communication Technology, Fakir Mohan University, Balasore, India)
Sung-Bae Cho (Soft Computing Laboratory, Department of Computer Science, Yonsei University, Seoul, Republic of Korea)

Applied Computing and Informatics

ISSN: 2634-1964

Article publication date: 4 August 2020

Issue publication date: 1 March 2022


Abstract

This work presents a novel approach that combines teaching learning based optimization (TLBO) and radial basis function neural networks (RBFNs) to build a classifier for databases with missing values and irrelevant features. The least square estimator and the Relief algorithm have been used for imputing the database and evaluating the relevance of features, respectively. The preprocessed dataset is used for developing a classifier based on TLBO-trained RBFNs that generates a concise and meaningful description for each class, which can be used to classify subsequent instances with no known class label. The method is evaluated extensively on several benchmark datasets obtained from the UCI repository. The experimental results confirm that our approach can be a promising tool for constructing a classifier from databases with missing values and irrelevant attributes.

Citation

Dash, C.S.K., Kumar Behera, A., Dehuri, S. and Cho, S.-B. (2022), "Building a novel classifier based on teaching learning based optimization and radial basis function neural networks for non-imputed database with irrelevant features", Applied Computing and Informatics, Vol. 18 No. 1/2, pp. 151-162. https://doi.org/10.1016/j.aci.2019.03.001

Publisher: Emerald Publishing Limited

Copyright © 2019, Ch. Sanjeev Kumar Dash, Ajit Kumar Behera, Satchidananda Dehuri and Sung-Bae Cho

License

Published in Applied Computing and Informatics. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) license. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this license may be seen at http://creativecommons.org/licences/by/4.0/legalcode


1. Introduction

The occurrence of missing values and irrelevant features in real data is not uncommon, whereas data mining algorithms are designed for quality data [1]. Hence, building a classifier from a dataset containing missing values and many irrelevant attributes leads to results of little use [2]. Therefore, to derive novel and useful results for the decision maker, imputing missing values and identifying relevant features are highly recommended. For decades these two problems have been treated as problems of importance in object detection and recognition (pattern recognition) [11] and data mining [3] in general, and in ECG signal diagnosis [13], power flow calculation [14], simulation and control of dynamic systems [15], magnetic modeling [16], identification and classification of plant leaf diseases [17], and discrimination of low-fat and full-fat yogurts [19,20,22] in particular.

There are several approaches to impute missing values, of which we concentrate on the least square estimation method [2,3]. A large variety of feature selection techniques has been developed under the umbrella of filter, wrapper, and embedded methods with the goal of selecting a relevant subset of features [4]. In this work, a filter-style approach known as the "Relief" method is used for selecting a subset of attributes that preserves the relevant information found in the entire set of attributes [5]. After imputing missing values and selecting the relevant set of features, we develop a classifier based on TLBO and RBFNs by inheriting their best features [6,7,8]. The RBFN, a member of the family of artificial neural networks (ANNs) [21,22], has good generalization, a simple structure, and strong tolerance to noise, which motivated us to consider it here as a suitable method of classification. Many methods have been developed for training RBFNs [12,17,18]; however, to the best of our knowledge, training RBFNs using TLBO is new. TLBO is a population-based optimization algorithm motivated by the influence of a teacher on the output of learners within a classroom environment, where learners first obtain knowledge from the teacher and subsequently from classmates. Moreover, a new improved TLBO (iTLBO) has been proposed to train the RBFNs.

In a nutshell, this work proceeds through three phases in a pipeline: imputation of missing values by the least square estimation approach, feature selection through Relief, and classification by iTLBO-trained RBFNs.

2. Background

The background of this research work, namely missing value imputation, feature selection, RBFNs, and TLBO, is discussed here.

2.1 Imputation of missing values and feature selection

The problem of classification is basically that of dividing the feature space into sections, one section for each category of inputs. Classifiers are usually designed with labeled data, which is sometimes referred to as supervised classification. In general, classification with missing data and irrelevant features involves three distinct tasks: handling missing values [1] (i.e., imputing values), feature selection, and pattern classification. Let $D = [x_{ij}]_{N \times d}$, where $i = 1, 2, \ldots, N$ and $j = 1, 2, \ldots, d$, be the dataset containing N samples and d features. In D, each sample is assigned a class label from the set $C = \{c_1, c_2, \ldots, c_M\}$, where $|C| = M$. Let each $x_{ij}$ be represented as a tuple $(x_{ij}, y_{ij})$, in which $y_{ij}$ can take only two values, 0 or 1. If $y_{ij} = 0$, then the associated $x_{ij}$ value is missing; otherwise it is present. Input data has quantitative and qualitative variables. Quantitative or continuous data is measured on a numerical scale. Non-numerical data (e.g., colors, names, opinions) is called qualitative data, which can be discrete or categorical. The overall goal of handling missing values is to map the value of $y_{ij}$ from 0 to 1 by substituting an appropriate value for $x_{ij}$ with as little bias as possible.

Meanwhile, the feature selection problem is defined as selecting a subset of features from the given set, whereby the dataset is mapped from $[x_{ij}]_{N \times d}$ to $[x_{ij}]_{N \times k}$, where $k \ll d$. With this intention, the filter method selects the most relevant features; however, a predefined quality measure is necessary to establish the level of relevance of the features, and the filter method is not able to identify correlations among the features, since it evaluates them individually. Unlike the filter, the wrapper is able to address correlations among features because it uses the performance of the classifier to optimize the subset; this, however, leads to a problem of intractability. Moreover, this method has the additional cost of reconstructing the classifier with the modified feature subset. Hence, to avoid these issues, a filter-like algorithm known as the Relief method is employed here.

2.2 Radial basis function networks

The RBF network [8] is a topology having three layers: an input layer, a hidden layer, and a linear output layer (see Figure 1). The input can be modeled as an n-dimensional input vector. The hidden layer implements a radial activation function that carries out a non-linear transformation from the input space to the hidden space. A center and a width are the two parameters associated with each hidden node. Usually, the nonlinear transformation from the input to the hidden space is based on the Gaussian kernel described in Eq. (1).

(1) $\phi_i(x) = \exp\left(-\dfrac{\|x - \mu_i\|^2}{2\sigma_i^2}\right)$,

where $\|\cdot\|$ represents the Euclidean norm, and $\mu_i$, $\sigma_i$, and $\phi_i$ are the center, spread, and output of the ith hidden unit, respectively. The interconnections between the hidden and output layers are made through weighted connections $w_i$. The output layer, a summation unit, supplies the response of the network to the outside world.

The radial basis function is so named because the value of the function is the same for all points at the same distance from the center.

In the literature, radial basis function networks [6] have many uses, including classification, time series prediction, and function approximation. Training RBF networks is normally faster than training multi-layer perceptron (MLP) networks. Training an RBF network [9,11] involves two steps: (1) the kernel parameters of the hidden neurons are determined by an unsupervised or heuristic method; (2) the weights of the output layer are determined by the pseudo-inverse method. A minimal sketch of this two-step scheme is given below.
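As an illustration of the two-step training, the following sketch assumes randomly chosen training points as centers and a shared heuristic width derived from the mean inter-center distance; all names here (`fit_rbfn`, `gaussian_phi`) are illustrative, not the authors' code:

```python
import numpy as np

def gaussian_phi(X, centers, sigma):
    """Gaussian kernel matrix: phi_ij = exp(-||x_i - c_j||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_rbfn(X, Y, n_centers=10, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1 (unsupervised/heuristic): centers are random training points,
    # with a shared width set to the mean distance between distinct centers.
    centers = X[rng.choice(len(X), n_centers, replace=False)]
    dists = np.linalg.norm(centers[:, None] - centers[None, :], axis=2)
    sigma = dists[dists > 0].mean()
    # Step 2 (supervised): output weights by the pseudo-inverse method.
    Phi = gaussian_phi(X, centers, sigma)
    W = np.linalg.pinv(Phi) @ Y          # W = (Phi^T Phi)^{-1} Phi^T Y
    return centers, sigma, W

def predict(X, centers, sigma, W):
    return gaussian_phi(X, centers, sigma) @ W
```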

2.3 Teaching learning based optimization

Teaching learning based optimization is a population-based nature-inspired algorithm introduced by Rao et al. [6,9]. It is inspired purely by the natural teaching-learning process, in which a teacher influences the output of learners within a classroom environment; learners first obtain knowledge from the teacher and subsequently from classmates. In the first phase, a teacher imparts knowledge directly to his/her students. In practice, the probability of a teacher's teaching being successful is distributed under a Gaussian law. Overall, how much knowledge is transferred to a student depends not only on the teacher but also on interactions among the students through peer learning. A basic algorithm of TLBO is presented below.
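As a minimal sketch of that basic algorithm (not the authors' exact listing), the teacher and learner phases for minimizing a fitness function f over a box-constrained search space can be written as:

```python
import numpy as np

def tlbo(f, bounds, pop_size=30, iterations=100, seed=0):
    """Minimize f over a box; bounds is a (low, high) pair of length-d arrays."""
    rng = np.random.default_rng(seed)
    low, high = map(np.asarray, bounds)
    X = rng.uniform(low, high, (pop_size, len(low)))   # the class of learners
    F = np.array([f(x) for x in X])
    for _ in range(iterations):
        teacher = X[F.argmin()]                        # best learner so far
        TF = rng.integers(1, 3)                        # teaching factor: 1 or 2
        # Teacher phase: move learners toward the teacher, away from the mean.
        X_new = np.clip(X + rng.random(X.shape) * (teacher - TF * X.mean(axis=0)),
                        low, high)
        F_new = np.array([f(x) for x in X_new])
        better = F_new < F                             # greedy acceptance
        X[better], F[better] = X_new[better], F_new[better]
        # Learner phase: each learner interacts with a random classmate.
        for i in range(pop_size):
            j = rng.integers(pop_size)
            if j == i:
                continue
            step = (X[i] - X[j]) if F[i] < F[j] else (X[j] - X[i])
            x_new = np.clip(X[i] + rng.random(len(low)) * step, low, high)
            f_new = f(x_new)
            if f_new < F[i]:
                X[i], F[i] = x_new, f_new
    return X[F.argmin()], F.min()
```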

3. Proposed method

Our integrated approach proceeds through three phases in a pipeline. In the first phase, the missing values are imputed by the least square estimator; in the second phase, the Relief algorithm is used for feature selection; and finally, our improved TLBO based RBFN is used for building the classifier for the preprocessed database. Figure 2 illustrates our approach.

3.1 Missing value imputation using least-square estimator

In this phase, we estimate the missing values from D by formulating a matrix A in which all the attribute values are known. In the least-square problem, the output of a model is given by the linearly parameterized expression

(2) $y = \theta_1 f_1(u) + \theta_2 f_2(u) + \cdots + \theta_n f_n(u)$.

If the target system has q outputs, expressed as $y = [y_1, \ldots, y_q]^T$ with q > 1, then we have a set of linear equations in matrix form, $A\Theta + E = Y$, where A is an m×n matrix given by

$A = \begin{bmatrix} f_1(u_1) & \cdots & f_n(u_1) \\ \vdots & \ddots & \vdots \\ f_1(u_m) & \cdots & f_n(u_m) \end{bmatrix}$,

$\Theta$ is an n×q unknown parameter matrix,

$\Theta = \begin{bmatrix} \theta_{11} & \cdots & \theta_{1q} \\ \vdots & \ddots & \vdots \\ \theta_{n1} & \cdots & \theta_{nq} \end{bmatrix}$,

and

$Y = \begin{bmatrix} y_{11} & \cdots & y_{1q} \\ \vdots & \ddots & \vdots \\ y_{m1} & \cdots & y_{mq} \end{bmatrix}$

is an m×q output matrix, with $y_{ij}$ denoting the jth output value in the ith data pair. The least-square estimate of $\Theta$ is

(3) $\Theta = (A^T A)^{-1} A^T Y$.

After obtaining the value of $\Theta$, we impute all the missing values in the dataset D.
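To make this concrete, here is a minimal sketch of column-wise least-square imputation, under our simplifying assumption that each incomplete column is regressed on the fully observed columns (plus an intercept); the function name and layout are illustrative, not the authors' code:

```python
import numpy as np

def lsq_impute(D):
    """Impute NaNs column-by-column via least squares on the complete columns."""
    D = D.astype(float).copy()
    complete = ~np.isnan(D).any(axis=0)          # columns with no missing values
    A_full = D[:, complete]
    for j in np.where(~complete)[0]:
        observed = ~np.isnan(D[:, j])
        # Fit Theta on rows where column j is observed (Eq. (3)).
        A = np.column_stack([A_full[observed], np.ones(observed.sum())])
        y = D[observed, j]
        theta, *_ = np.linalg.lstsq(A, y, rcond=None)
        # Predict the missing entries of column j from the same regressors.
        A_miss = np.column_stack([A_full[~observed], np.ones((~observed).sum())])
        D[~observed, j] = A_miss @ theta
    return D
```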

3.2 Relief algorithm for feature selection

In this second phase of our work, we discuss the Relief algorithm, which is inspired by instance-based learning. It is a filter-method algorithm for individual feature selection. It calculates a proxy statistic for each feature that can be used to estimate the feature's quality or relevance to the target concept. A sketch of this method is given below.
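A minimal sketch of the two-class Relief scoring in the spirit of Kira and Rendell [4] follows; it assumes features scaled to [0, 1], uses Euclidean distance to find the nearest hit and nearest miss, and samples m instances (the function name and defaults are ours):

```python
import numpy as np

def relief(X, y, m=100, seed=0):
    """Return a relevance weight per feature (two-class Relief)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(m):
        i = rng.integers(n)
        dists = np.linalg.norm(X - X[i], axis=1)
        dists[i] = np.inf                      # exclude the instance itself
        same, other = y == y[i], y != y[i]
        hit = np.where(same)[0][np.argmin(dists[same])]    # nearest hit
        miss = np.where(other)[0][np.argmin(dists[other])] # nearest miss
        # Reward features that separate classes, penalize those that do not.
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / m

# Features with the largest weights are kept; the rest are discarded.
```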

3.3 Improved TLBO based RBFN

In the third phase, we build an RBFN classifier trained by TLBO and improved TLBO. First we provide a detailed introduction to the improved TLBO, and then the improved TLBO + RBFN network is developed with the aim of achieving better classification accuracy.

3.3.1 Improved TLBO (iTLBO)

In the canonical TLBO, during the learner phase each learner is exposed to the entire population of the class. However, it has been realized that if a learner is restricted to a peer group instead of all individuals of the population, then he/she can raise his/her level of acquired knowledge. With this idea, we introduce a neighborhood structure of learners as a peer-learner group. Hence, in the learner phase, we have adopted a square topology as the peer-learner group for each learner. That means a student not only acquires knowledge from the best of all individuals (i.e., the teacher) but also improves his/her standard through his/her neighborhood of fellow learners. In this context, the learner phase of TLBO has been modified as sketched below.
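A minimal sketch of this neighborhood-restricted learner phase is given here; it reuses the conventions of the TLBO sketch in Section 2.3, and our `nearest_neighbor` stand-in picks the k closest learners in the search space, which is an assumption about how the square-topology peer group is formed:

```python
import numpy as np

def nearest_neighbor(X, i, k):
    """Indices of the k learners closest to learner i in the search space
    (our stand-in for the paper's square-topology peer group)."""
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf                              # exclude the learner itself
    return np.argsort(d)[:k]

def learner_phase(f, X, F, k, low, high, rng):
    """Learner phase restricted to a peer group of size k."""
    for i in range(len(X)):
        peers = nearest_neighbor(X, i, k)
        j = rng.choice(peers)                  # interact within the peer group
        step = (X[i] - X[j]) if F[i] < F[j] else (X[j] - X[i])
        x_new = np.clip(X[i] + rng.random(X.shape[1]) * step, low, high)
        f_new = f(x_new)
        if f_new < F[i]:                       # greedy acceptance
            X[i], F[i] = x_new, f_new
    return X, F
```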

Here, nearest_neighbor( ) finds a group of peer learners for a learner. The size of the neighborhood can be treated as a parameter of the learner phase. Alongside, we have also made the teaching factor (TF) adaptive by considering the individual fitness values and the population diversity. Recall that the teaching factor decides the amount by which the mean is changed. In the canonical TLBO, the value of TF is either 1 or 2, so learners either learn nothing from the teacher or learn everything from the teacher. But in real practice, the value of TF may lie between 1 and 2, inclusive. Hence, to make this idea fruitful, the fitness values are selected as inputs for choosing TF. Let BS contain the global best solution found so far, i.e., up to the kth iteration, denoted $X_g^k$; it is the position of a single individual, corresponding to the best fitness $F_g^k$. The differential of the global best fitness between the kth and (k−1)th iterations can then be defined as:

(4) $\Delta F_k = |F_g^k - F_g^{k-1}|$.

Now, we can define the function measuring convergence speed as follows:

(5) $C_S = \Delta F_k / \Delta_1$,

where $\Delta_1 = \max\{\Delta F_1, \Delta F_2, \ldots, \Delta F_k\}$. Eq. (5) calculates the convergence speed, which is less than or equal to 1.

In the evolution process of TLBO, population diversity is a major factor. For computing the diversity of the population, the standard deviation of the individual fitness values can be used. In this paper, we present a new strategy for calculating population position diversity from the fitness values. The population position diversity is obtained using the deviation approach defined in Eqs. (6)–(8).

(6) $F_{avg}^k = \dfrac{1}{|P|} \sum_{i=1}^{|P|} F^k(i)$,

(7) $\Delta_2 = \max\{|F^k(1) - F_{avg}^k|, \ldots, |F^k(|P|) - F_{avg}^k|\}$,

(8) $\sigma^2 = \dfrac{1}{|P|} \sum_{i=1}^{|P|} \left(\dfrac{F^k(i) - F_{avg}^k}{\Delta_2}\right)^2$,
where $F_{avg}^k$ is the average population fitness for the current (kth) iteration, $F^k(i)$ stands for the ith individual's fitness, $\Delta_2$ is a normalization factor, $|P|$ is the population size, and $\sigma^2$ represents the population diversity. It is evident that the larger $\sigma^2$ is, the larger the population diversity.

To build the adaptive teaching factor TF, we use the index $C_S$ to represent the convergence speed with respect to the best solution fitness found so far in the current iteration, and the index $\sigma^2$ to represent diversity with regard to the population fitness deviation. Hence we can compute TF from $C_S$ and $\sigma^2$ adaptively, as follows:

(9) $T_F = \alpha \cdot C_S + \beta \cdot \sigma^2 + 1$,

where $\alpha$ and $\beta$ are weighting factors. Since $\sigma^2$ and $C_S$ are greater than 0 and less than or equal to 1, we have $1 \le T_F \le \alpha + \beta + 1$. Rao et al. [6] suggested that the value of TF can be either 1 or 2; hence, we set $\alpha + \beta + 1 \le 2$. The proposed adaptive teaching factor (TF) provides better local search ability, which improves the accuracy and convergence speed.

3.3.2 iTLBO + RBFN

This section describes iTLBO + RBFN, which adjusts the network parameters during the training process. In the initialization stage, the position of the ith individual is represented as shown in Figure 3. RBFNs mainly depend on the center and width of each kernel, in addition to the weights and bias. However, here we encode only the centers, widths, and bias into an individual for stochastic search using iTLBO.

Suppose the maximum number of kernel nodes is set to Kmax; then the structure of the individual is represented as follows (cf. Figure 3):

In other words, each individual has three constituent parts: centers, widths, and bias. The length of the individual is 2Kmax + 1.
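A minimal sketch of this encoding and its decoding, under our reading that each center and each width occupies one gene (so the layout below is illustrative, not the authors' exact data structure):

```python
import numpy as np

K_MAX = 3  # maximum number of kernel nodes (example value)

def decode(individual, k_max=K_MAX):
    """Split a (2*k_max + 1)-gene vector into centers, widths, and bias."""
    return (individual[:k_max],           # Kmax center genes
            individual[k_max:2 * k_max],  # Kmax width genes
            individual[-1])               # one bias gene

# Example individual of length 2*K_MAX + 1 = 7:
ind = np.array([0.2, 0.5, 0.9,   # three centers
                0.1, 0.3, 0.2,   # three widths
                0.05])           # one bias
centers, widths, bias = decode(ind)
```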

The fitness function used to guide the search process is defined in Eq. (10).

(10) $f(x) = \dfrac{1}{N} \sum_{i=1}^{N} \left(t_i - \hat{\Phi}(x_i)\right)^2$,

where N is the total number of training instances, $t_i$ is the actual output, and $\hat{\Phi}(x_i)$ is the estimated output of the RBFN. Initially, the centers, widths, and bias are computed using training vectors; the weights are computed using the pseudo-inverse method:
(11) $Y = \Phi W \;\Rightarrow\; W = (\Phi^T \Phi)^{-1} \Phi^T Y$.
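Combining Eqs. (10) and (11), each candidate individual can be scored as sketched below; we assume scalar inputs so that each center is a single gene, and the function name is ours:

```python
import numpy as np

def fitness(individual, X, T, k_max):
    """Eq. (10): mean squared error of the RBFN encoded by an individual."""
    centers = individual[:k_max]               # Kmax center genes
    widths = individual[k_max:2 * k_max]       # Kmax width genes
    bias = individual[-1]                      # single bias gene
    # Gaussian kernel matrix (one column per kernel) plus a bias column.
    Phi = np.exp(-(X[:, None] - centers[None, :]) ** 2
                 / (2.0 * widths[None, :] ** 2))
    Phi = np.column_stack([Phi, np.full(len(X), bias)])
    W = np.linalg.pinv(Phi) @ T                # Eq. (11): pseudo-inverse weights
    return np.mean((T - Phi @ W) ** 2)         # lower fitness is better
```

iTLBO then minimizes this fitness over the population of encoded individuals.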

4. Experimental study

In the experimental study, we start with a brief description of the datasets, the characteristics of their missing information, and the parameters used for simulation. We then present the results obtained by the two methods, TLBO + RBFN and iTLBO + RBFN, along with a detailed analysis.

4.1 Description of datasets and parameters

The datasets used in this work were obtained from the UCI machine learning repository [10]. Seven datasets have been chosen to validate the proposed method, i.e., iTLBO + RBFN. The details of the seven datasets are given in Table 1. The algorithmic parameters, such as population size and number of iterations, are fixed based on empirical analysis as follows.

The size of the population is 100, the number of iterations is fixed at 300, the size of the neighborhood is restricted to 10% of the population size, and the value of TF is adapted as per the suggestions given in sub-section 3.3.1, with the α and β values each drawn from (0, 1). The parameters of the multi-layer perceptron (MLP), along with the training algorithms and Simple Logistic, are defined as prescribed in [3].

4.2 Results and analysis

The average results of the experiment, obtained from 10-fold cross-validation over 30 independent runs, are given in Tables 2–7.

From Table 2 it is found that, for all seven datasets, iTLBO + RBFN gives better accuracy than TLBO + RBFN, MLP, and Simple Logistic. To support the above results, statistical analyses based on measures derived from the confusion matrix are presented in Tables 3 and 4.

From the statistical analysis it can be observed that the calculated Kappa values for TLBO + RBFN with feature selection are much better than for TLBO + RBFN without feature selection.

5. Conclusions

An integrated approach of iTLBO and RBFN has been proposed for building a classifier for unseen data while carefully considering issues such as missing values and dimensionality reduction. The approach passes through three phases before drawing any conclusions. In the first phase, the preprocessing task of missing value imputation is carried out by the least square estimator. In the second phase, the relevant attributes are selected by Relief. Finally, in the third phase, a classifier is built by integrating iTLBO and RBFN, where iTLBO is adopted to determine the optimum key parameter values of the RBFN. After careful training, the model was tested, and it was noticed that on all datasets iTLBO + RBFN performs better than TLBO + RBFN on the complete dataset. Our future research includes applications to big data and further parametric analysis of iTLBO in correspondence with the natural teaching-learning process.

Figures

Figure 1. Architecture of RBFN.

Figure 2. Pictorial representation of our approach (LSEI: Least Square Estimation for Imputation).

Figure 3. Structure of the individual.

Table 1. Description of datasets.

Dataset         #Instances  #Attributes  #Classes
Hepatitis       155         19           2
Housevotes      435         16           2
Mammographic    961         6            2
Horse Colic     368         27           2
Wisconsin       699         9            2
Diabetes        768         8            2
Post-operative  90          8            2

Table 2. Classification accuracy of iTLBO + RBFN using least square imputation without feature selection.

Dataset         TLBO + RBFN  MLP      Simple Logistic  iTLBO + RBFN
Hepatitis       88.4615      88.4615  85.8974          92.3077
House-votes     96.789       94.4954  94.0367          98.1651
Mammographic    81.4969      76.5073  82.5364          83.9917
Horse Colic     81.345       80.4348  82.0652          82.0652
Wisconsin       95.1429      94.5714  92.8571          96.2857
Diabetes        71.0938      70.5729  70.8333          74.2188
Post-operative  66.6667      64.4444  64.4444          75.5556

Table 3. Further detailed analysis of TLBO + RBFN (Table 2).

Dataset         TP Rate  FP Rate  Precision  Recall  F-Measure  Kappa Statistic
Hepatitis       88.5     19       88.3       88.5    88.2       0.7194
House-votes     96.8     2        96.9       96.8    96.68      0.9318
Mammographic    81.5     19       81.6       81.5    81.4       0.6272
Horse Colic     82.6     23       82.5       82.6    82.3       0.6151
Wisconsin       95.1     4        95.2       95.1    95.2       0.9015
Diabetes        71.1     37.6     70.4       71.1    0.7        0.3525
Post-operative  66.7     66.7     44.4       66.7    53.3       0

Table 4. Further detailed analysis of iTLBO + RBFN (Table 2).

Dataset         TP Rate  FP Rate  Precision  Recall  F-Measure  Kappa Statistic
Hepatitis       91.0     65.3     89.4       91.0    89.9       0.684
House-votes     98.6     1.7      98.6       98.6    98.6       0.9714
Mammographic    83.8     16.8     83.9       83.8    83.7       0.6728
Horse Colic     82.1     20.2     82.1       82.1    82.1       0.6163
Wisconsin       97.7     3.7      97.7       97.7    97.7       0.939
Diabetes        74.5     33.7     74.1       74.5    73.6       0.4292
Post-operative  75.6     75.6     57.1       75.6    65.0       0

Table 5. Classification accuracy of iTLBO + RBFN using least square imputation with feature selection.

Dataset         No. of Features Removed  TLBO + RBFN  MLP      Simple Logistic  iTLBO + RBFN
Hepatitis       6                        89.4615      89.7436  82.0513          93.5897
House-votes     5                        98.6239      96.789   93.0876          99.0826
Mammographic    2                        78.1705      81.4969  81.3546          83.7838
Horse Colic     7                        85.8696      80.4348  81.0562          87.5
Wisconsin       3                        95.1429      93.7104  92.8571          99.1429
Diabetes        2                        73.4375      69.7129  70.7291          81.7708
Post-operative  2                        66.6667      63.4144  62.5434          75.5556

Table 6. Further detailed analysis of TLBO + RBFN (Table 5).

Dataset         TP Rate  FP Rate  Precision  Recall  F-Measure  Kappa Statistic
Hepatitis       89.7     78.3     86.7       89.7    87.8       0.1545
House-votes     98.6     13       98.6       98.6    98.6       0.9705
Mammographic    78.3     27.3     77.9       78.3    77.9       0.5218
Horse Colic     85.9     18       85.8       85.9    85.7       0.6917
Wisconsin       95.1     4        95.2       95.1    95.2       0.9015
Diabetes        73.4     34.5     72.8       73.4    72.7       0.405
Post-operative  66.7     56.7     62.8       66.7    61.7       0.1176

Table 7. Further detailed analysis of iTLBO + RBFN (Table 5).

Dataset         TP Rate  FP Rate  Precision  Recall  F-Measure  Kappa Statistic
Hepatitis       92.3     15.0     92.5       92.5    92.1       0.81
Housevotes      99.1     1.0      99.1       99.1    99.1       0.98
Mammographic    83.8     16.0     83.9       83.8    83.8       0.67
Horse Colic     87.5     17.1     87.5       87.5    87.3       0.72
Wisconsin       99.1     0.003    99.2       99.1    99.1       0.98
Diabetes        81.8     25.8     81.5       81.8    81.6       0.57
Post-operative  75.6     75.6     57.1       75.6    65.0       0.00

References

[1] B. Twala, M.C. Jones, D.J. Hand, Good methods for coping with missing data in decision trees, Pattern Recogn. Lett. 29 (7) (2008) 950–956.

[2] T.H. Bø, B. Dysvik, I. Jonassen, LSimpute: accurate estimation of missing values in microarray data with least squares methods, Nucleic Acids Res. 32 (3) (2004).

[3] C.S.K. Dash, A. Saran, P. Sahoo, S. Dehuri, S.B. Cho, Design of self-adaptive and equilibrium differential evolution optimized radial basis function neural network classifier for imputed database, Pattern Recogn. Lett. 80 (2016) 76–83.

[4] K. Kira, L.A. Rendell, The feature selection problem: traditional methods and a new algorithm, AAAI 2 (1992) 129–134.

[5] R.J. Urbanowicz, M. Meeker, W. La Cava, R.S. Olson, J.H. Moore, Relief-based feature selection: introduction and review, J. Biomed. Inf., 2018.

[6] R.V. Rao, V.J. Savsani, D.P. Vakharia, Teaching–learning-based optimization: a novel method for constrained mechanical design optimization problems, Comput. Aided Des. 43 (3) (2011) 303–315.

[7] R.V. Rao, V.J. Savsani, D.P. Vakharia, Teaching–learning-based optimization: an optimization method for continuous non-linear large scale problems, Inf. Sci. 183 (1) (2012) 1–15.

[8] C.S.K. Dash, A.K. Behera, S. Dehuri, S.B. Cho, Radial basis function neural networks: a topical state-of-the-art survey, Open Comput. Sci. 6 (2016) 33–63.

[9] R.V. Rao, V.J. Savsani, J. Balic, Teaching–learning-based optimization algorithm for unconstrained and constrained real-parameter optimization problems, Eng. Optim. 44 (12) (2012) 1447–1462.

[10] A. Frank, A. Asuncion, UCI Machine Learning Repository, University of California, School of Information and Computer Science, Irvine, CA, 2010. http://archive.ics.uci.edu/ml.

[11] M. Woźniak, D. Połap, Object detection and recognition via clustered features, Neurocomputing 320 (2018) 76–84.

[12] N. Jankowski, Prototype-based kernels for extreme learning machines and radial basis function networks, International Conference on Artificial Intelligence and Soft Computing, 2018, pp. 70–75.

[13] F. Beritelli, G. Capizzi, G.L. Sciuto, C. Napoli, M. Woźniak, A novel training method to preserve generalization of RBPNN classifiers applied to ECG signals diagnosis, Neural Networks 108 (2018) 331–338.

[14] H.R. Baghaee, M. Mirsalim, G.B. Gharehpetian, H.A. Talebi, Nonlinear load sharing and voltage compensation of microgrids based on harmonic power-flow calculations using radial basis function neural networks, IEEE Syst. J. 12 (3) (2018) 2749–2759.

[15] M. Woźniak, D. Połap, Hybrid neuro-heuristic methodology for simulation and control of dynamic systems over time interval, Neural Networks 93 (2017) 45–56.

[16] L. Ortombina, F. Tinazzi, M. Zigliotto, Magnetic modeling of synchronous reluctance and internal permanent magnet motors using radial basis function networks, IEEE Trans. Ind. Electron. 65 (2) (2018) 1140–1148.

[17] S.S. Chouhan, A. Kaul, U.P. Singh, S. Jain, Bacterial foraging optimization based radial basis function neural network (BRBFNN) for identification and classification of plant leaf diseases: an automatic approach towards plant pathology, IEEE Access 6 (2018) 8852–8863.

[18] H. de Leon-Delgado, R.J. Praga-Alejo, D.S. Gonzalez-Gonzalez, M. Cantú-Sifuentes, Multivariate statistical inference in a radial basis function neural network, Expert Syst. Appl. 93 (2018) 313–321.

[19] D. Granato, P. Putnik, D.B. Kovačević, J.S. Santos, V. Calado, R.S. Rocha, A. Pomerantsev, Trends in chemometrics: food authentication, microbiology, and effects of processing, Compr. Rev. Food Sci. Food Saf. 17 (3) (2018) 663–677.

[20] A.G.D. Cruz, R.S. Cadena, M.B.V.B. Alvaro, A.D.S. Sant'Ana, C.A.F.D. Oliveira, J.D.A.F. Faria, M.M.C. Ferreira, Assessing the use of different chemometric techniques to discriminate low-fat and full-fat yogurts, LWT-Food Sci. Technol. 50 (1) (2013) 210–214.

[21] J.A. Matera, A.G. Cruz, R.S.L. Raices, M.C. Silva, L.C. Nogueira, S.L. Quitério, C.C. Júnior, Discrimination of Brazilian artisanal and inspected pork sausages: application of unsupervised, linear and non-linear supervised chemometric methods, Food Res. Int. 64 (2014) 380–386.

[22] A.G. Da Cruz, E.H.M. Walter, R.S. Cadena, J.A.F. Faria, H.M.A. Bolini, A.F. Fileti, Monitoring the authenticity of low-fat yogurts by an artificial neural network, J. Dairy Sci. 92 (10) (2009) 4797–4804.

Acknowledgements

Publisher's note: The publisher wishes to inform readers that the article "Building a novel classifier based on teaching learning based optimization and radial basis function neural networks for non-imputed database with irrelevant features" was originally published by the previous publisher of Applied Computing and Informatics and the pagination of this article has been subsequently changed. There has been no change to the content of the article. This change was necessary for the journal to transition from the previous publisher to the new one. The publisher sincerely apologises for any inconvenience caused. To access and cite this article, please use Kumar Dash, Ch. S., Kumar Behera, A., Dehuri, S., Cho, S. B. (2022), "Building a novel classifier based on teaching learning based optimization and radial basis function neural networks for non-imputed database with irrelevant features", Applied Computing and Informatics, Vol. 18 No. 1/2, pp. 151-162. The original publication date for this paper was 18/03/2019.

Corresponding author

Satchidananda Dehuri can be contacted at: satchi.lapa@gmail.com
