A Comparison between Multi-Layer Perceptron and Radial Basis Function Networks in Detecting Humans Based on Object Shape

Human detection is a central problem in video-based monitoring. In this paper, two artificial neural networks, the multilayer perceptron (MLP) and the radial basis function (RBF) network, are used to detect humans among different objects in a sequence of frames (images) using a classification approach. The classification is based on the shape of the object rather than on the contents of the frame. First, background subtraction is used to extract objects of interest from the frame; then statistical and geometric information is obtained from the vertical and horizontal projections of the detected objects to represent their shape. Next, the two types of neural networks are used to classify the extracted objects. Tests were performed on a sequence of frames, and the MATLAB simulation results showed that the RBF neural network outperformed the MLP neural network: the RBF model achieved a mean squared error (MSE) of 2.36811e-18, against an MSE of 2.6937e-11 for the MLP model. More importantly, the RBF approach required less time than the MLP to classify a detected object as human, taking approximately 86.2% less time to reach a decision.


Introduction
The problem of detecting humans in images has been studied extensively in the literature because it arises in many applications across many fields. A major application is visual monitoring (surveillance). Monitoring systems can be found in many places, such as houses, streets, workplaces, and shops. They have mainly been used to obtain records for security or similar purposes, and did not involve detecting the presence of humans in images, which is a complex problem in itself and more difficult than detecting other types of objects. This is because humans are articulated and can assume a variety of postures, so no single model can cover all possible cases. The first task a visual monitoring system faces in identifying an object is to extract from a given image the objects that are candidates for comparison with a target object [1].
Detecting an object requires isolating the objects of interest in the video frames by clustering the pixels of each frame into background and object. This can be achieved with different methods, such as "background subtraction", "frame difference", and "optical flow" [1]. The moving region extracted from the frame (the object) could correspond to dissimilar moving objects such as humans, cars, clouds, birds, and swaying trees. One way to detect humans among these objects is to classify them. Object classification methods are based on shape, movement, color, or structure [2]. Classifying objects is an easy task for humans but a challenging one for machines. Object classification comprises two stages: candidate object detection and pattern recognition, the latter of which in turn comprises feature extraction and object classification [3].
Depending on the features extracted, the candidate objects are then classified into prespecified categories using appropriate methods that compare the patterns of the candidate objects with the object patterns in a reference database.
Artificial neural networks (ANNs) have been used successfully to solve problems of this kind, which were traditionally handled by statistical methods. Examples include speech recognition, recognition of underwater sonar signals, prediction of the secondary structure of globular proteins, and classification problems [9]. ANNs are also efficient at handling noisy data [10].
There are different types of ANNs. One type is the feed-forward ANN, which includes single-layer feed-forward networks, multi-layer feed-forward networks (MLP), and radial basis function (RBF) networks. Another type is the recurrent (feedback) ANN, such as Hopfield, Elman, and Jordan networks [11].
In this work, the MLP and the RBF networks are chosen as classifiers for detecting humans in a sequence of images. The two methods are also evaluated and compared in terms of their classification efficiency. The main reason for choosing the MLP is that, given sufficient data, sufficient hidden units, and sufficient training time, an MLP with one hidden layer can learn to approximate essentially any function to any degree of accuracy. For this reason, MLPs are said to be universal approximators. This means they can be used when little prior knowledge is available about the relationship between the inputs and the target. Although one hidden layer is enough given sufficient data, there are cases in which a network with two or more hidden layers needs fewer hidden units and weights than a network with one hidden layer. Additional hidden layers can therefore occasionally improve generalization.
The RBF network is chosen for its better approximation capability, simpler network structure, and faster learning algorithms. RBF networks have been widely used in many science and engineering applications. The RBF network and the MLP are the most widely used types of feed-forward neural networks. They differ in how the hidden units process the data coming from the inputs: the RBF network relies on the Euclidean distance, whereas the MLP uses inner products. In addition, the MLP separates the classes using hidden units that form hyperplanes in the input space, as indicated in Fig. 1a, while the RBF network separates the classes using local kernel functions, as indicated in Fig. 1b [12]. Regarding training, most of the methods used for training MLPs can also be applied to RBF networks [13].
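The difference between the two kinds of hidden unit can be illustrated with a minimal Python sketch. The function names, the logistic sigmoid activation, and the Gaussian kernel with width `sigma` are assumptions made for illustration; the text does not state which activation or kernel was used in the MATLAB simulations.

```python
import math


def mlp_hidden_unit(x, w, b):
    """MLP-style unit: logistic sigmoid of the inner product w.x plus bias b."""
    net = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-net))


def rbf_hidden_unit(x, c, sigma):
    """RBF-style unit: Gaussian kernel of the Euclidean distance from x to center c."""
    dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return math.exp(-dist_sq / (2.0 * sigma ** 2))


# Illustrative values only (not taken from the experiments).
far_point = [10.0, 10.0]
mlp_far = mlp_hidden_unit(far_point, w=[1.0, 1.0], b=0.0)
rbf_far = rbf_hidden_unit(far_point, c=[0.0, 0.0], sigma=1.0)
rbf_at_center = rbf_hidden_unit([0.0, 0.0], c=[0.0, 0.0], sigma=1.0)
```

In this sketch the sigmoid unit stays active everywhere on one side of the hyperplane defined by its weights, while the Gaussian unit responds only near its center, which corresponds to the hyperplane and local-kernel separation described above.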

Candidate Objects Detection
As stated in section 1, the first stage in object classification is candidate object detection. Detecting candidate objects means isolating the objects of interest in the video frames. This can be achieved with different approaches, such as "background subtraction", "frame difference", and "optical flow" [1]. The background subtraction method uses the difference between the current image and the background image to detect objects added to the scene [14]. The method is given by Eq. (1) and (2):
$$D_k(x, y) = \left| I_k(x, y) - B(x, y) \right| \qquad (1)$$
$$M_k(x, y) = \begin{cases} 1, & D_k(x, y) > T \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$
where $I_k$ is the current frame, $B$ is the background image, $D_k$ is their absolute difference, $M_k$ is the resulting binary object mask, and $T$ is the threshold value.
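The thresholded difference between the current frame and the background image can be sketched in a few lines of Python. This is a minimal illustration on grayscale frames stored as nested lists; the function name, the strict `>` comparison against the threshold, and the sample pixel values are assumptions and are not taken from the MATLAB implementation.

```python
def background_subtraction(frame, background, threshold):
    """Return a binary mask: 1 where |frame - background| > threshold, else 0."""
    mask = []
    for frame_row, bg_row in zip(frame, background):
        mask.append([1 if abs(f - b) > threshold else 0
                     for f, b in zip(frame_row, bg_row)])
    return mask


# Hypothetical 2x3 grayscale images: a bright object appears in the middle column.
background = [[10, 10, 10],
              [10, 10, 10]]
frame = [[10, 200, 12],
         [11, 180, 10]]

mask = background_subtraction(frame, background, threshold=30)
```

Small intensity changes (such as 10 to 12) fall below the threshold and are treated as background, so only the middle column is marked as object.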

Pattern Recognition
The second stage in candidate object classification is pattern recognition. Pattern recognition can be subdivided into two problems: feature extraction and classification. After the objects of interest are detected, features must be extracted so that their shapes can be modeled and recognized automatically. This is achieved by finding a set of coefficients that provides a meaningful description of the information conveyed [1]. In classification, the objects are assigned to different categories based on the extracted features, using suitable methods that compare the objects of interest with objects in a reference database [13]. The approaches used to date for classifying objects are "shape" based, "movement" based, "color" based, or "structure" based [2]. The extracted features are then submitted to the classifying network [1].
Extracting features from an object involves several steps. First, the system produces a different representation of the binary image of the target object. Then, based on this new representation, the most important statistical and geometric properties are extracted. The new representation is composed of two projections, vertical and horizontal: the vertical projection is given by the sum of the white points in the rows of the binary image, and the horizontal projection is given by the sum of the white points in the columns of the binary image, as given by the following formulas [1]:
$$V(i) = \sum_{j=1}^{N} b(i, j), \qquad H(j) = \sum_{i=1}^{M} b(i, j)$$
where $b(i, j)$ is the value (1 for white, 0 for black) of the pixel in row $i$ and column $j$ of the $M \times N$ binary image.
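The two projections described above can be computed directly from a binary mask, as in the following Python sketch. The projections follow the definitions in the text (row sums and column sums of white points); the particular features returned by `shape_features` (extent in rows and columns, mean projection values) are hypothetical examples of "statistical and geometric" properties, since the exact feature set is not listed here.

```python
import statistics


def projections(binary):
    """Vertical projection = white points per row; horizontal = white points per column."""
    vertical = [sum(row) for row in binary]
    horizontal = [sum(col) for col in zip(*binary)]
    return vertical, horizontal


def shape_features(binary):
    """Illustrative (assumed) features derived from the two projections."""
    vertical, horizontal = projections(binary)
    return {
        "height": sum(1 for v in vertical if v > 0),   # rows containing the object
        "width": sum(1 for h in horizontal if h > 0),  # columns containing the object
        "vertical_mean": statistics.mean(vertical),
        "horizontal_mean": statistics.mean(horizontal),
    }


# Hypothetical 4x3 binary object (1 = white point).
binary = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
    [0, 1, 0],
]

vertical, horizontal = projections(binary)
features = shape_features(binary)
```

A feature vector of this kind is what would be passed to the classifying network in place of the raw pixels.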

Artificial Neural Networks
In recent decades, neural computing has emerged as a practical technology for classification, function approximation, data processing, filtering, clustering, compression, decision making, and similar tasks, with successful applications in fields as diverse as medicine, finance, geology, engineering, biology, and physics [9, 11].
An artificial neural network can be defined as a set of simple, highly interconnected computational units. The units loosely represent biological neurons and are also called nodes. Figure (2) depicts the neuron [1]. A neuron is an information-processing unit that is fundamental to the operation of a neural network. The connections between nodes are unidirectional and resemble the "synaptic connections" of the brain. Each connection has a weight, $w_{kj}$, called the "synaptic weight", which represents the strength of the connection between units j and k. Unlike a synapse in the brain, this weight may take a positive or a negative value. For a given node (neuron), the outputs of the nodes connected to it are multiplied by their respective weights and summed to form its input. This input is then transformed by the node's "activation function", also known as the "squashing function", which limits the output to a bounded range. The model of such a node, shown in figure (2), also contains a "bias", denoted by $b_k$, that increases (if it is positive) or lowers (if it is negative) the net input of the activation function [1].
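The neuron model described above (weighted sum of inputs, plus bias, passed through an activation function) can be sketched as follows. The function name and the choice of `tanh` as the squashing function are assumptions for illustration; the activation used in the experiments is not specified here.

```python
import math


def neuron_output(inputs, weights, bias, activation=math.tanh):
    """Compute activation(sum_j w_j * x_j + bias) for a single neuron."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias  # bias shifts the net input
    return activation(net)


# Illustrative values: the weighted inputs cancel, so the bias alone sets the net input.
inputs = [1.0, 2.0]
weights = [0.5, -0.25]

y_no_bias = neuron_output(inputs, weights, bias=0.0)
y_pos_bias = neuron_output(inputs, weights, bias=1.0)
y_neg_bias = neuron_output(inputs, weights, bias=-1.0)
```

Because `tanh` is bounded between -1 and 1, the output stays within fixed limits regardless of how large the net input becomes.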
The types used in this work, as indicated and justified in the introduction, are the MLP and the RBF network. The formula of the RBF neural network is given in [15].
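For orientation, an RBF network with Gaussian kernels is commonly written in the form below. This is the standard textbook expression and may differ in notation or detail from the formula in [15]; the symbols (the number of hidden units $N$, the centers $\mathbf{c}_i$, the widths $\sigma_i$, the output weights $w_i$, and the output bias $w_0$) are assumptions introduced here.

```latex
y(\mathbf{x}) = w_0 + \sum_{i=1}^{N} w_i \, \varphi_i(\mathbf{x}),
\qquad
\varphi_i(\mathbf{x}) = \exp\!\left( -\frac{\lVert \mathbf{x} - \mathbf{c}_i \rVert^{2}}{2 \sigma_i^{2}} \right)
```

Each hidden unit depends on the input only through its Euclidean distance to the center $\mathbf{c}_i$, which is the distance-based processing contrasted with the inner products of the MLP in the introduction.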