1 Introduction

The main objective of pattern classification is to establish a mathematical function that associates input patterns \(x_{i}=(x_{1},x_{2},\ldots ,x_{n})^{T}\) with their corresponding classes \(C^{1},C^{2},\ldots ,C^{j}\). This mapping must be as robust as possible to potential variations in incoming data and must be capable of finding elemental relationships between patterns.

Over the past 60 years, many types of neural networks have been proposed for solving classification tasks. The most common approach is a group of classical perceptrons that create hyperplanes to divide and associate data by means of synaptic weights, biases and activation functions.

In a common neural network, each neuron can divide the input search space into two parts. Thus, by adding more neurons to a single layer, the network gains the capacity to learn any complex function [1].

A less well-known type of neural network is the Dendrite Morphological Neural Network (DMNN), which separates data using hyperboxes. These neurons group patterns using minimum or maximum operators to generate the piecewise boundaries for classification tasks. DMNNs have the advantage of being easy to implement in logic devices.

This research proposes an improvement to a specific type of morphological neural network called the Dendrite Ellipsoidal Neuron (DEN), trained with the k-means++ algorithm [2, 3]. DEN has shown good performance on low-dimensional datasets, requires few training parameters and is easy to implement in logic devices. However, despite its efficiency, DEN performs poorly on high-dimensional datasets. These advantages motivated us to explore new ideas to improve DEN accuracy on high-dimensional datasets.

In this paper, we trained a DEN using Stochastic Gradient Descent (SGD) [4], implemented as a neural network layer of the Keras library [5] in Python. Furthermore, in order to test the proposed training algorithm on a high-dimensional dataset, Electroencephalography (EEG) data were acquired from eight able-bodied subjects for classifying Motor Imagery (MI) of the hands into binary classes (Left vs. Right). The contributions of this research are:

  • This is the first time that a DEN is trained by SGD.

  • Through a series of experiments, we show that the new training algorithm outperforms both the accuracy of the original DEN training algorithm on our dataset and the accuracy achieved by some of the most common classifiers for MI.

The rest of the paper is structured as follows: Sect. 2 provides a chronological review of related publications. Section 3 describes the methods and materials used to obtain and characterize the EEG signals. Section 4 presents DEN and the proposed architecture. Section 5 describes the general details of the classifiers used and the experimental results. In Sect. 6, we give our conclusions and future work.

2 Related Works

Morphological Neural Networks (MNN) were originally proposed by Ritter and Davidson as a combination of neural networks and image algebra [6,7,8,9]. Later, Arbib published a book noting that biological neurons process information not only in the cell body but also in the dendrites [10]. Other related works continued this line of research [11], and several heuristic approaches have been proposed to manipulate hyperboxes without taking the dendrite fitness function into account.

All these techniques create hyperboxes to divide the input space into rectangular segments. In 2017, we presented DEN, which replaces the operations performed by MNN dendrites with the Mahalanobis distance [12]. The main advantage of the ellipsoidal model is that it creates smoother decision boundaries instead of rectangular regions.

Other approaches similar to the k-means++ clustering algorithm [13] combined with the Mahalanobis distance [12] are elliptical k-means clustering algorithms, Gaussian Mixture Models (GMM), and classifiers based on the Mahalanobis distance.

The authors in [14, 15] employed an elliptical k-means clustering algorithm to discriminate between human and nonhuman faces. For this, they altered k-means with a normalized Mahalanobis distance to obtain six face pattern clusters.

GMM is a probabilistic technique that can approximate almost any continuous density given a sufficient number of Gaussian components [16].

3 Methods and Materials

This section describes the experiments carried out to obtain EEG signals from subjects who performed MI of both hands, as well as the preprocessing and feature extraction procedures.

3.1 Experiment Setup

For this study, eight healthy people (three males and five females) aged 25 to 30 participated in an experiment designed to obtain EEG recordings for two mental tasks:

  1. Imagined movements of the left hand, and

  2. Imagined movements of the right hand.

These experimental conditions consisted of mentally flexing and extending the fingers of the left and right hand without performing the actual movements. A graphical user interface developed by our team provided the instructions for the experiment and indicated when the subject had to perform the mental imagery.

The experiment was divided into 16 blocks of 28 s each. A block started with a fixation cross shown on the screen for 5 s, followed by a visual cue of the action to be performed by the participant (3 s). White arrows and a sphere that moved from the center of the screen to the left or right side of the monitor represented the different types of tasks, Fig. 1 (Right). Then, the participant had to execute the imagined movement specified by the interface for 15 s. Finally, the word “Rest” was shown on the screen for 5 s, indicating that the subject could relax or move freely until the beginning of the next block. The software selected the task of each block randomly, and both conditions were balanced, i.e., the subject performed the “left” task eight times and the “right” task eight times. In total, an experiment lasted around seven and a half minutes. Fig. 1 (Left) illustrates the different stages of this paradigm.

Fig. 1.

Left: The experiment starts with the Fixation Cross, which indicates the beginning of the experiment. Next Task then indicates the action to be performed, Mental Task shows the mental task to be carried out, and Rest Time indicates the relaxing time. Right: Visual cues. \(+\) Fixation Cross. \(\leftarrow \) Imagined movements of the left upper limb. \(\rightarrow \) Imagined movements of the right upper limb. Rest.

During the experiment execution, a g.USBamp amplifier recorded EEG signals from 12 active electrodes at a sampling rate of 256 Hz (g.tec medical engineering GmbH, Austria). Data were band-pass filtered from 0.1 to 100 Hz, and a built-in notch filter removed the power supply noise. According to the international 10/20 system, the electrode positions used in this experiment were FC3, FCz, FC4, C3, Cz, C4, CP3, CPz, CP4, P3, Pz, and P4. This arrangement was selected to cover scalp locations that are close to the motor cortex. Additionally, the ground electrode was located at AFz, and the reference electrode was placed over the right earlobe.

3.2 Preprocessing and Feature Extraction

The Common Spatial-Pattern (CSP) algorithm was used to characterize the brain activity of both experimental conditions. This algorithm finds linear combinations of the original EEG signals (or band-limited components of the EEG) so that the variances of the new signals of one condition are maximized, whereas the variances of the signals of the other condition are minimized. In this way, if the log-variances of the signals in the projected space are used as features, the separability between conditions is optimal. In this study, the CSP algorithm was applied over band-limited components extracted by a filter bank. This strategy is commonly known as Filter Bank Common Spatial-Pattern (FBCSP).
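As a rough illustration of this procedure, the following sketch (our own, not the paper's implementation) computes CSP spatial filters from the average covariance matrices of two conditions and extracts log-variance features; function names and array layouts are our assumptions:

```python
import numpy as np

def csp_filters(trials_a, trials_b, n_filters=3):
    """Compute CSP spatial filters from two sets of trials.

    trials_a, trials_b: arrays of shape (n_trials, n_channels, n_samples).
    Returns the n_filters spatial filters (as rows) that maximize the
    variance of condition A relative to condition B.
    """
    def mean_cov(trials):
        return np.mean([np.cov(t) for t in trials], axis=0)

    Ca, Cb = mean_cov(trials_a), mean_cov(trials_b)
    # Generalized eigenproblem Ca w = lambda (Ca + Cb) w, solved here via
    # a plain eigendecomposition of (Ca + Cb)^-1 Ca.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Ca + Cb, Ca))
    order = np.argsort(eigvals.real)[::-1]      # largest variance ratio first
    return eigvecs[:, order[:n_filters]].real.T

def log_variance_features(trials, W):
    """Project trials with spatial filters W and return log-variances."""
    projected = np.einsum('fc,ncs->nfs', W, trials)
    return np.log(projected.var(axis=2))
```

Swapping the roles of `trials_a` and `trials_b` yields the filters that maximize the variance of the other condition, giving the two sets of three filters per band described below.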

In the preprocessing stage, a filter bank of Gaussian bandpass filters with a bandwidth of 4 Hz extracted 22 components from the EEG signals (bands starting at 4, 5, 6, ..., 25 Hz). Then, the data were separated into epochs or trials of 1 s (256 time samples). Trials contaminated by visual or muscular artifacts were identified and rejected from this study. Finally, the CSP algorithm was used to compute a new set of signals for each frequency component to increase the separability between conditions. For each band, the three best spatial filters that maximize the variances of the “Left” condition were calculated. Likewise, the three best spatial filters that maximize the variances of the “Right” condition were also computed, associated with a class label \(y\in \{\text {Left},\text {Right}\}\). The features used in the classification stage were the log-variances of these time series. In total, each trial consisted of 132 new projected signals, \(x\in \mathfrak {R}^{132\times 256}\).
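The Gaussian filter bank can be sketched as a multiplication in the frequency domain. The exact filter shape is not specified in the paper, so this illustration interprets the 4 Hz bandwidth as the full width at half maximum of the Gaussian response (an assumption on our part):

```python
import numpy as np

def gaussian_bandpass(x, fs, center, bandwidth=4.0):
    """Gaussian band-pass filter applied in the frequency domain.

    x: signal array of shape (..., n_samples); fs: sampling rate in Hz.
    The Gaussian response is centered at `center` Hz; `bandwidth` is
    treated as its full width at half maximum (an assumption).
    """
    n = x.shape[-1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    sigma = bandwidth / 2.355                  # FWHM -> standard deviation
    response = np.exp(-0.5 * ((freqs - center) / sigma) ** 2)
    return np.fft.irfft(np.fft.rfft(x, axis=-1) * response, n=n, axis=-1)
```

Applying this filter with 22 center frequencies to each 256-sample trial would produce the band-limited components fed to CSP.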

4 DEN Architecture

DEN has the same structure as any other neural network architecture: an input, a hidden and an output layer, Fig. 2.

The input layer receives the incoming patterns \(x_{i}=(x_{1},x_{2},\ldots ,x_{n})^{T}\). The hidden layer calculates the Mahalanobis distance between the input patterns and all the hyperellipsoids placed by the dendrites with Eq. 1. Lastly, the output layer assigns each pattern to its nearest dendrite, which is related to one of the corresponding classes \(C^{1},C^{2},\ldots ,C^{j}\), with Eq. 2:

$$\begin{aligned} \tau _{k}=[x_{i}-\mu _{k}]^{T}\Sigma _{k}^{-1}[x_{i}-\mu _{k}], \end{aligned}$$
(1)
$$\begin{aligned} y_{i}=\mathop {\mathrm {argmin}}_{k}(\tau _{k}), \end{aligned}$$
(2)

where \(x_{i}\) is an n-dimensional vector, \(\tau \) is the vector of the k Mahalanobis distances and \(y_{i}\) is the output for each pattern \(x_{i}\). \(\Sigma _{k}^{-1}\) is the inverse of the covariance matrix and \(\mu _{k}\) is the centroid vector, both associated with the k-th hyperellipsoid.
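A minimal NumPy sketch of this forward pass (ours, for illustration; names and array layouts are assumptions) is:

```python
import numpy as np

def den_forward(x, mus, inv_covs, labels):
    """DEN forward pass (Eqs. 1-2): assign pattern x to the class of its
    nearest dendrite under the Mahalanobis distance.

    x: (n,) input pattern; mus: (k, n) dendrite centroids;
    inv_covs: (k, n, n) inverse covariance matrices;
    labels: (k,) class label of each dendrite.
    """
    diffs = mus - x                                   # (k, n)
    # tau_k = (x - mu_k)^T Sigma_k^-1 (x - mu_k) for every dendrite k
    tau = np.einsum('ki,kij,kj->k', diffs, inv_covs, diffs)
    return labels[np.argmin(tau)], tau
```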

Fig. 2.

DEN architecture with an input, a hidden and an output layer.

Once we experimentally observed that DEN performs well on low-dimensional datasets but not on high-dimensional ones, we undertook the task of setting the hyperellipsoids by using SGD as the optimization method [4].

To do this, we first implemented the hidden layer (Eq. 1) as a Keras custom layer [5]. Keras computes gradients by using automatic differentiation, which automatically calculates the derivatives of the functions in a computer program [17]. We then removed the output layer (Eq. 2), replacing it with one or more neurons.
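A Keras custom layer for Eq. 1 could look as follows. This is only a sketch under our own assumptions: the paper does not specify how \(\Sigma _{k}^{-1}\) is parameterized, so here it is kept positive semi-definite by learning a factor \(L_{k}\) with \(\Sigma _{k}^{-1}=L_{k}L_{k}^{T}\), and a sigmoid is applied to the negated distance so that closer patterns produce higher activations:

```python
import numpy as np
import tensorflow as tf

class DendriteEllipsoidal(tf.keras.layers.Layer):
    """Sketch of a DEN hidden layer (Eq. 1) as a Keras custom layer.

    Each of the k dendrites holds a trainable centroid mu_k and a
    trainable factor L_k; the inverse covariance is Sigma_k^-1 = L_k L_k^T
    (our assumption, so it stays positive semi-definite under SGD).
    """
    def __init__(self, n_dendrites, **kwargs):
        super().__init__(**kwargs)
        self.k = n_dendrites

    def build(self, input_shape):
        n = int(input_shape[-1])
        self.mu = self.add_weight(name='mu', shape=(self.k, n),
                                  initializer='random_normal')
        self.L = self.add_weight(name='L', shape=(self.k, n, n),
                                 initializer='random_normal')

    def call(self, x):
        diff = x[:, None, :] - self.mu[None, :, :]     # (batch, k, n)
        z = tf.einsum('bkn,knm->bkm', diff, self.L)    # (x - mu_k) L_k
        tau = tf.reduce_sum(tf.square(z), axis=-1)     # Mahalanobis-like
        return tf.math.sigmoid(-tau)  # closer pattern -> higher activation

# Three dendrites followed by a single sigmoid output neuron, trained
# with SGD on binary cross-entropy (hyperparameters are illustrative).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(132,)),
    DendriteEllipsoidal(3),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss='binary_crossentropy', metrics=['accuracy'])
```

Because the whole distance computation is expressed with differentiable TensorFlow operations, automatic differentiation handles the gradients with respect to \(\mu _{k}\) and \(L_{k}\) without any manual derivation.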

5 Classifiers and Results

This section presents the general details of the classifiers used for the MI classification task and the results they achieved.

The first technique was a Support Vector Machine (SVM) [18], one of the most widespread methods for Brain-Computer Interfaces (BCI) based on EEG. It can be viewed as two layers: depending on the kernel, the first layer performs feature extraction and the second layer creates a hyperplane to separate patterns into two different classes. The goal of the second layer is to create a hyperplane with optimal margins with respect to the support vectors.

In the experiment, we implemented the SVM with a Radial Basis Function (RBF) kernel, which is commonly used in BCI based on EEG [19, 20]. We selected the \(\gamma \) parameter and the C compensation factor by performing a grid sweep to choose the best values.
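A grid sweep of this kind can be sketched with scikit-learn's GridSearchCV; the grid values and the random data below are illustrative placeholders, not the ones used in our experiments:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative grid over C and gamma for an RBF-kernel SVM; the actual
# grid used in the paper is not specified.
param_grid = {'C': [0.1, 1, 10, 100],
              'gamma': [1e-3, 1e-2, 1e-1, 1]}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 132))          # stand-in for FBCSP features
y = rng.integers(0, 2, size=60)         # stand-in for Left/Right labels
search.fit(X, y)
best = search.best_params_              # best (C, gamma) by cross-validation
```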

The second classifier was a Multilayer Perceptron (MLP) [21]. It was implemented with two hidden layers, each with 100 ReLU neurons, and an output layer with a sigmoid neuron (\(\sigma \)). To reduce overfitting, we applied dropout with a rate of 0.2 between each layer.

The last classifiers were DEN and DEN trained by SGD (DEN_SGD). The DEN_SGD architecture was composed of three hyperellipsoids with sigmoid activation functions and a sigmoid neuron in the output layer, Fig. 3.

Fig. 3.

DEN_SGD architecture for EEG classification task.

Table 1 presents the accuracy achieved by the four classifiers. The SVM always obtained 100% accuracy in the training stage but only 65.81% in testing; as can be seen, the SVM suffers from severe overfitting.

Table 1. Experimental results acquired by the SVM, MLP, DEN and SGD_DEN classifiers using our EEG dataset.

DEN achieved the lowest accuracy in training and in testing, 80.82% and 62.77%, respectively.

The best classifiers for this task were the MLP and SGD_DEN. The MLP achieved an accuracy of 72.38% in testing, and SGD_DEN slightly improved on it with 76.02%. Both presented overfitting, but it was less severe with SGD_DEN.

Finally, to compare the proposed method with the other classifiers in statistical terms, a paired t-test with a significance level of \(\alpha =0.05\) was performed. Table 2 gives the p-values obtained in the test. The comparisons of SGD_DEN with the SVM, MLP and DEN all yielded p-values lower than \(\alpha \), which indicates that, for this dataset, SGD_DEN has a significantly better performance.

Table 2. p-values of a paired t-test with \(\alpha =0.05\).
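This statistical comparison can be reproduced with SciPy's paired t-test as sketched below; the per-subject accuracies are hypothetical placeholders for illustration, not the values behind Table 1:

```python
import numpy as np
from scipy import stats

# Hypothetical per-subject accuracies for two classifiers (illustrative
# numbers only). A paired t-test checks whether the mean paired
# difference is significant at alpha = 0.05.
acc_sgd_den = np.array([0.78, 0.74, 0.79, 0.73, 0.77, 0.75, 0.78, 0.74])
acc_den     = np.array([0.64, 0.61, 0.66, 0.60, 0.63, 0.62, 0.65, 0.61])

t_stat, p_value = stats.ttest_rel(acc_sgd_den, acc_den)
significant = p_value < 0.05
```

The test is paired because both classifiers are evaluated on the same subjects, so each pair of accuracies shares the same underlying data.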

6 Conclusions

In this research, we implemented SGD to train a DEN and acquired an EEG dataset from eight healthy participants to test the performance of the proposed training algorithm. The proposed model achieved an enhancement of 13.25% over the original DEN training algorithm and improvements of 3.64% and 10.21% compared with the MLP and the SVM, respectively. We emphasize that this improvement was obtained with a shallow architecture that can be easily implemented in embedded electronic devices. Future work includes the evaluation of DEN_SGD on standard datasets and the implementation of this network to control external electronic devices.