1 Introduction

The main objective of pattern classification is to establish a mathematical function that associates input patterns \(x_{i}=(x_{1},x_{2},\ldots ,x_{n})^{T}\) with their corresponding classes \(C^{1},C^{2},\ldots ,C^{j}\). This mapping must be as robust as possible to potential variations in incoming data and must be capable of finding elemental relationships between patterns.

Over the past 60 years, many types of neural networks have been proposed for solving classification tasks. The most common approach is a group of classical perceptrons that create hyperplanes to divide and associate data by means of synaptic weights, biases and activation functions.

In a common neural network, each neuron can divide the input search space into two parts. Thus, by adding more neurons to a single layer, the network gains the capacity to learn any complex function [1].

A less well-known type of neural network is the Dendrite Morphological Neural Network (DMNN), which separates data using hyperboxes. These neurons group patterns using minimum or maximum operators to generate the piecewise boundaries for classification tasks. DMNNs have the advantage of being easy to implement in logic devices.

This research proposes an improvement to a specific type of morphological neural network called the Dendrite Ellipsoidal Neuron (DEN), trained with the k-means++ algorithm [2, 3]. DEN has shown good performance on low-dimensional datasets, requires few training parameters and is easy to implement in logic devices. However, despite its efficiency, DEN performs poorly on high-dimensional datasets. These advantages motivated us to explore new ideas to improve DEN accuracy on high-dimensional datasets.

In this paper, we trained a DEN using Stochastic Gradient Descent (SGD) [4], implemented as a neural network layer of the Keras library [5] in Python. Furthermore, in order to test the proposed training algorithm on a high-dimensional dataset, Electroencephalography (EEG) data were acquired from eight able-bodied subjects for classifying Motor Imagery (MI) of the hands into binary classes (Left vs. Right). The contributions of this research are:

  • This is the first time that a DEN is trained by SGD.

  • Through a series of experiments, we show that the new training algorithm outperforms both the accuracy of the original DEN training algorithm on our dataset and the accuracy achieved by some of the most common classifiers for MI.

The rest of the paper is structured as follows: Sect. 2 provides a chronological review of related publications. Section 3 describes the methods and materials used to obtain and characterize the EEG signals. Section 4 presents DEN and the proposed architecture. Section 5 describes the general details of the classifiers used and the experimental results. In Sect. 6, we give our conclusions and future work.

2 Related Works

Morphological Neural Networks (MNN) were originally proposed by Ritter and Davidson as a combination of neural networks and image algebra [6,7,8,9]. Later, Arbib published a book noting that biological neurons process information not only in the cell body but also in the dendrites [10]. Other related works continued this line of research [11], and several heuristic approaches have been proposed to manipulate hyperboxes without taking the dendrite fitness function into account.

All these techniques create hyperboxes to divide the input space into rectangular segments. In 2017, we presented DEN, which replaces the operations performed by MNN dendrites with the Mahalanobis distance [12]. The main advantage of the ellipsoidal model is that it creates smoother decision boundaries instead of rectangular regions.

Other approaches similar to the k-means++ clustering algorithm [13] combined with the Mahalanobis distance [12] are elliptical k-means clustering algorithms, Gaussian Mixture Models (GMM), and classifiers based on the Mahalanobis distance.

The authors in [14, 15] employed an elliptical k-means clustering algorithm to discriminate between human and nonhuman faces. For this, they altered k-means with a normalized Mahalanobis distance to obtain six face pattern clusters.

GMM is a probabilistic technique that can approximate almost any continuous density given a sufficient number of Gaussian components [16].

3 Methods and Materials

This section describes the experiments carried out to obtain EEG signals from subjects who performed MI of both hands, as well as the preprocessing and feature extraction procedures.

3.1 Experiment Setup

For this study, eight healthy people (three males and five females) aged 25 to 30 participated in an experiment designed to obtain EEG recordings for two mental tasks:

  1. Imagined movements of the left hand, and

  2. Imagined movements of the right hand.

These experimental conditions consisted of mentally flexing and extending the fingers of the left and right hand without performing the actual movements. A graphical user interface developed by our team provided the instructions for the experiment and indicated when the subject had to perform the mental imagery.

The experiment was divided into 16 blocks of 28 s each. A block started with a fixation cross shown on the screen for 5 s, followed by a visual cue of the action to be performed by the participant (3 s). White arrows and a sphere that moved from the center of the screen to the left or right side of the monitor represented the different types of tasks, Fig. 1 (Right). Then, the participant had to execute the imagined movement specified by the interface for 15 s. Finally, the word “Rest” was shown on the screen for 5 s, indicating that the subject could relax or move freely until the beginning of the next block. The software selected the task of each block randomly, and both conditions were balanced, i.e., the subject performed the “left” task eight times and the “right” task eight times. In total, an experiment lasted around seven and a half minutes. Fig. 1 (Left) illustrates the different stages of this paradigm.

Fig. 1.

Left: The experiment starts with the Fixation Cross, which indicates the beginning of the experiment. Next Task then indicates the action to be performed, Mental Task shows the mental task to be carried out, and Rest Time indicates the relaxing time. Right: Visual cues. \(+\) Fixation Cross. \(\leftarrow \) Imagined movements of the left upper limb. \(\rightarrow \) Imagined movements of the right upper limb. Rest.

During the experiment execution, a g.USBamp amplifier recorded EEG signals from 12 active electrodes at a sampling rate of 256 Hz (g.tec medical engineering GmbH, Austria). Data were band-pass filtered from 0.1 to 100 Hz, and a built-in notch filter removed the power supply noise. According to the international 10/20 system, the electrode positions used in this experiment were FC3, FCz, FC4, C3, Cz, C4, CP3, CPz, CP4, P3, Pz, and P4. This arrangement was selected to cover scalp locations that are close to the motor cortex. Additionally, the ground electrode was located at AFz, and the reference electrode was placed over the right earlobe.

3.2 Preprocessing and Feature Extraction

The Common Spatial-Pattern (CSP) algorithm was used to characterize the brain activity of both experimental conditions. This algorithm finds linear combinations of the original EEG signals (or band-limited components of the EEG) so that the variances of the new signals of one condition are maximized, whereas the variances of the signals of the other condition are minimized. In this way, if the log-variances of the signals in the projected space are used as features, the separability between conditions is optimal. In this study, the CSP algorithm was applied over band-limited components extracted by a filter bank. This strategy is commonly known as Filter Bank Common Spatial-Pattern (FBCSP).
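As a rough illustration of this procedure, the following sketch (our own, not the paper's implementation) computes CSP spatial filters from the average covariance matrices of two conditions and extracts log-variance features; function names and array layouts are our assumptions:

```python
import numpy as np

def csp_filters(trials_a, trials_b, n_filters=3):
    """Compute CSP spatial filters from two sets of trials.

    trials_a, trials_b: arrays of shape (n_trials, n_channels, n_samples).
    Returns the n_filters spatial filters (as rows) that maximize the
    variance of condition A relative to condition B.
    """
    def mean_cov(trials):
        return np.mean([np.cov(t) for t in trials], axis=0)

    Ca, Cb = mean_cov(trials_a), mean_cov(trials_b)
    # Generalized eigenproblem Ca w = lambda (Ca + Cb) w, solved here via
    # a plain eigendecomposition of (Ca + Cb)^-1 Ca.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Ca + Cb, Ca))
    order = np.argsort(eigvals.real)[::-1]      # largest variance ratio first
    return eigvecs[:, order[:n_filters]].real.T

def log_variance_features(trials, W):
    """Project trials with spatial filters W and return log-variances."""
    projected = np.einsum('fc,ncs->nfs', W, trials)
    return np.log(projected.var(axis=2))
```

Swapping the roles of `trials_a` and `trials_b` yields the filters that maximize the variance of the other condition, giving the two sets of three filters per band described below.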

In the preprocessing stage, a filter bank of Gaussian bandpass filters with a bandwidth of 4 Hz extracted 22 components from the EEG signals (bands starting at 4, 5, 6, ..., 25 Hz). Then, the data were separated into epochs or trials of 1 s (256 time samples). Trials contaminated by visual or muscular artifacts were identified and rejected from this study. Finally, the CSP algorithm was used to compute a new set of signals for each frequency component to increase the separability between conditions. For each band, the three best spatial filters that maximize the variances of the “Left” condition were calculated. Likewise, the three best spatial filters that maximize the variances of the “Right” condition were also computed, associated with a class label \(y\in \{\text {Left},\text {Right}\}\). The features used in the classification stage were the log-variances of these time series. In total, each trial consisted of 132 new projected signals, \(x\in \mathfrak {R}^{132\times 256}\).
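The Gaussian filter bank can be sketched as a multiplication in the frequency domain. The exact filter shape is not specified in the paper, so this illustration interprets the 4 Hz bandwidth as the full width at half maximum of the Gaussian response (an assumption on our part):

```python
import numpy as np

def gaussian_bandpass(x, fs, center, bandwidth=4.0):
    """Gaussian band-pass filter applied in the frequency domain.

    x: signal array of shape (..., n_samples); fs: sampling rate in Hz.
    The Gaussian response is centered at `center` Hz; `bandwidth` is
    treated as its full width at half maximum (an assumption).
    """
    n = x.shape[-1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    sigma = bandwidth / 2.355                  # FWHM -> standard deviation
    response = np.exp(-0.5 * ((freqs - center) / sigma) ** 2)
    return np.fft.irfft(np.fft.rfft(x, axis=-1) * response, n=n, axis=-1)
```

Applying this filter with 22 center frequencies to each 256-sample trial would produce the band-limited components fed to CSP.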

4 DEN Architecture

DEN has the same structure as any other neural network architecture: an input, a hidden and an output layer, Fig. 2.

The input layer receives the incoming patterns \(x_{i}=(x_{1},x_{2},\ldots ,x_{n})^{T}\). The hidden layer calculates the Mahalanobis distance between the input patterns and all the hyperellipsoids placed by the dendrites with Eq. 1. Lastly, the output layer assigns each pattern to its nearest dendrite, which is related to one of the corresponding classes \(C^{1},C^{2},\ldots ,C^{j}\), with Eq. 2:

$$\begin{aligned} \tau _{k}=[x_{i}-\mu _{k}]^{T}\Sigma _{k}^{-1}[x_{i}-\mu _{k}], \end{aligned}$$
(1)
$$\begin{aligned} y_{i}=\mathop {\mathrm {argmin}}_{k}(\tau _{k}), \end{aligned}$$
(2)

where \(x_{i}\) is an n-dimensional vector, \(\tau \) is the vector of the k Mahalanobis distances and \(y_{i}\) is the output for each pattern \(x_{i}\). \(\Sigma _{k}^{-1}\) is the inverse of the covariance matrix and \(\mu _{k}\) is the centroid vector, both associated with the k-th hyperellipsoid.
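A minimal NumPy sketch of this forward pass (ours, for illustration; names and array layouts are assumptions) is:

```python
import numpy as np

def den_forward(x, mus, inv_covs, labels):
    """DEN forward pass (Eqs. 1-2): assign pattern x to the class of its
    nearest dendrite under the Mahalanobis distance.

    x: (n,) input pattern; mus: (k, n) dendrite centroids;
    inv_covs: (k, n, n) inverse covariance matrices;
    labels: (k,) class label of each dendrite.
    """
    diffs = mus - x                                   # (k, n)
    # tau_k = (x - mu_k)^T Sigma_k^-1 (x - mu_k) for every dendrite k
    tau = np.einsum('ki,kij,kj->k', diffs, inv_covs, diffs)
    return labels[np.argmin(tau)], tau
```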

Fig. 2.

DEN architecture with an input, a hidden and an output layer.

Once we experimentally observed that DEN performs well on low-dimensional datasets but not on high-dimensional ones, we undertook the task of setting the hyperellipsoids by using SGD as the optimization method [4].

To do this, we first implemented the hidden layer (Eq. 1) as a Keras custom layer [5]. Keras computes gradients by using automatic differentiation, which automatically calculates the derivatives of the functions in a computer program [17]. We then removed the output layer (Eq. 2), replacing it with one or more neurons.
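A Keras custom layer for Eq. 1 could look as follows. This is only a sketch under our own assumptions: the paper does not specify how \(\Sigma _{k}^{-1}\) is parameterized, so here it is kept positive semi-definite by learning a factor \(L_{k}\) with \(\Sigma _{k}^{-1}=L_{k}L_{k}^{T}\), and a sigmoid is applied to the negated distance so that closer patterns produce higher activations:

```python
import numpy as np
import tensorflow as tf

class DendriteEllipsoidal(tf.keras.layers.Layer):
    """Sketch of a DEN hidden layer (Eq. 1) as a Keras custom layer.

    Each of the k dendrites holds a trainable centroid mu_k and a
    trainable factor L_k; the inverse covariance is Sigma_k^-1 = L_k L_k^T
    (our assumption, so it stays positive semi-definite under SGD).
    """
    def __init__(self, n_dendrites, **kwargs):
        super().__init__(**kwargs)
        self.k = n_dendrites

    def build(self, input_shape):
        n = int(input_shape[-1])
        self.mu = self.add_weight(name='mu', shape=(self.k, n),
                                  initializer='random_normal')
        self.L = self.add_weight(name='L', shape=(self.k, n, n),
                                 initializer='random_normal')

    def call(self, x):
        diff = x[:, None, :] - self.mu[None, :, :]     # (batch, k, n)
        z = tf.einsum('bkn,knm->bkm', diff, self.L)    # (x - mu_k) L_k
        tau = tf.reduce_sum(tf.square(z), axis=-1)     # Mahalanobis-like
        return tf.math.sigmoid(-tau)  # closer pattern -> higher activation

# Three dendrites followed by a single sigmoid output neuron, trained
# with SGD on binary cross-entropy (hyperparameters are illustrative).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(132,)),
    DendriteEllipsoidal(3),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss='binary_crossentropy', metrics=['accuracy'])
```

Because the whole distance computation is expressed with differentiable TensorFlow operations, automatic differentiation handles the gradients with respect to \(\mu _{k}\) and \(L_{k}\) without any manual derivation.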

5 Classifiers and Results

This section presents the general details of the classifiers used for the MI classification task and the results they achieved.

The first technique was a Support Vector Machine (SVM) [18], one of the most widespread methods for Brain-Computer Interfaces (BCI) based on EEG. It can be viewed as two layers: depending on the kernel, the first layer performs feature extraction and the second layer creates a hyperplane to separate patterns into two different classes. The goal of the second layer is to create a hyperplane with optimal margins with respect to the support vectors.

In the experiment, we implemented the SVM with a Radial Basis Function (RBF) kernel, which is commonly used in BCI based on EEG [19, 20]. We selected the \(\gamma \) parameter and the C compensation factor by performing a grid sweep to choose the best values.
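A grid sweep of this kind can be sketched with scikit-learn's GridSearchCV; the grid values and the random data below are illustrative placeholders, not the ones used in our experiments:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative grid over C and gamma for an RBF-kernel SVM; the actual
# grid used in the paper is not specified.
param_grid = {'C': [0.1, 1, 10, 100],
              'gamma': [1e-3, 1e-2, 1e-1, 1]}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 132))          # stand-in for FBCSP features
y = rng.integers(0, 2, size=60)         # stand-in for Left/Right labels
search.fit(X, y)
best = search.best_params_              # best (C, gamma) by cross-validation
```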

The second classifier was a Multilayer Perceptron (MLP) [21]. It was implemented with two hidden layers, each with 100 ReLU neurons, and an output layer with a sigmoid neuron (\(\sigma \)). To reduce overfitting, we applied dropout with a rate of 0.2 between each layer.

The last classifiers were DEN and DEN trained by SGD (DEN_SGD). The DEN_SGD architecture was composed of three hyperellipsoids with sigmoid activation functions and a sigmoid neuron in the output layer, Fig. 3.

Fig. 3.

DEN_SGD architecture for EEG classification task.

Table 1 presents the accuracy achieved by the four classifiers. The SVM always obtained 100% accuracy in the training stage but only 65.81% in testing; as can be seen, the SVM suffers from severe overfitting.

Table 1. Experimental results acquired by the SVM, MLP, DEN and SGD_DEN classifiers using our EEG dataset.

DEN achieved the lowest accuracy in training and in testing, 80.82% and 62.77%, respectively.

The best classifiers for this task were the MLP and SGD_DEN. The MLP achieved an accuracy of 72.38% in testing, and SGD_DEN slightly improved on it with 76.02%. Both presented overfitting, but it was less severe with SGD_DEN.

Finally, to compare the proposed method with the other classifiers in statistical terms, a paired t-test with a significance level of \(\alpha =0.05\) was performed. Table 2 gives the p-values obtained in the test. The comparisons of SGD_DEN with the SVM, MLP and DEN all yielded p-values lower than \(\alpha \), which indicates that, for this dataset, SGD_DEN has a significantly better performance.

Table 2. p-values of a paired t-test with \(\alpha =0.05\).
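This statistical comparison can be reproduced with SciPy's paired t-test as sketched below; the per-subject accuracies are hypothetical placeholders for illustration, not the values behind Table 1:

```python
import numpy as np
from scipy import stats

# Hypothetical per-subject accuracies for two classifiers (illustrative
# numbers only). A paired t-test checks whether the mean paired
# difference is significant at alpha = 0.05.
acc_sgd_den = np.array([0.78, 0.74, 0.79, 0.73, 0.77, 0.75, 0.78, 0.74])
acc_den     = np.array([0.64, 0.61, 0.66, 0.60, 0.63, 0.62, 0.65, 0.61])

t_stat, p_value = stats.ttest_rel(acc_sgd_den, acc_den)
significant = p_value < 0.05
```

The test is paired because both classifiers are evaluated on the same subjects, so each pair of accuracies shares the same underlying data.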

6 Conclusions

In this research, we implemented SGD to train a DEN and acquired an EEG dataset from eight healthy participants to test the performance of the proposed training algorithm. The proposed model achieved an enhancement of 13.25% over the original DEN training algorithm and improvements of 3.64% and 10.21% compared with the MLP and the SVM, respectively. We emphasize that this improvement was obtained with a shallow architecture that can be easily implemented in embedded electronic devices. Future work includes the evaluation of DEN_SGD on standard datasets and the implementation of this network to control external electronic devices.