A Brute-Force CNN Model Selection for Accurate Classification of Sensorimotor Rhythms in BCIs

The ultimate goal of Brain-Computer Interface (BCI) research is to enable individuals to interact with their environment by translating their mental imagery. In this regard, a salient issue is the identification of brain activity patterns that can be used to classify intention. Using Electroencephalographic (EEG) signals as archetypical, this classification problem generally possesses two stages: (i) extracting features from collected EEG waveforms; and (ii) constructing a classifier using extracted features. With the advent of deep learning, however, the former stage is generally absorbed into the latter. Nevertheless, the burden has now shifted from trying a number of feature extraction methods to tuning a large number of hyperparameters and architectures. Among existing deep learning architectures used in BCI, Convolutional Neural Networks (CNN) have become an attractive choice. Most of the existing studies that use these networks are based on well-known architectures such as AlexNet or ResNet, use the domain knowledge to construct the final architecture or have an unclear strategy deployed for model selection. This raises the question as to whether constructing accurate CNN-based classifiers is possible using a principled model selection, with the most straightforward one being the brute-force search or, alternatively, experience and developing high intuition regarding hyperparameters combined with an ad hoc approach is the most prudent way to go about designing them. To this end, in this paper, we first define a space of hyperparameters restricted by our computing power. Then we show that an exhaustive search within this limited space of CNN hyperparameters leads to accurate classification of sensorimotor rhythms that arise during motor imagery tasks.


I. INTRODUCTION
Brain-Computer Interface (BCI) research aims to provide alternative channels for communication and control without involving the peripheral nervous system [1]-[3]. This technology is important for people who are affected by motor disabilities such as stroke, paraplegia, and amyotrophic lateral sclerosis [4], [5]. It has already been employed to control external devices such as robotic prostheses/orthoses [6] and exoskeletons for stroke rehabilitation studies [7], [8].
The design of a high-performance BCI system is still an open research problem in the field and requires accurate decoding of the EEG signals generated by neuro-electrical activities in the brain [9]-[11]. EEG-based BCI systems are classified as exogenous and endogenous, where the former requires an external stimulus to excite specific responses in the brain [12], [13]. Depending on the type of stimulation, exogenous BCIs use steady-state visual evoked potentials (SSVEPs) or event-related potentials (ERPs), which are brain signals elicited in response to cognitive or sensory events. The advantages of exogenous BCIs are a high information transfer rate and little user training [13]. However, an exogenous BCI system constrains the user to focus on the visual stimuli and, as a result, its usefulness may be limited, especially for severely motor-impaired people [14]. This paper focuses on the endogenous category of BCI systems, which utilize sensorimotor rhythms (SMRs) for control of external devices, independent of any stimuli. The SMRs represent the modulations of oscillatory activity in EEG induced by motor imagery of limb movement, which serve as input features [15], [16].
The associate editor coordinating the review of this manuscript and approving it for publication was Larbi Boubchir.
From a machine learning perspective, there is already a variety of techniques for feature extraction, feature selection, and classification for SMR-based BCI systems that have been developed over the last decade [11], [17]. However, recent BCI studies exhibit a trend of moving towards deep learning methods as they have already shown the state-of-the-art performance in learning from raw EEG data, mitigating the need for laborious preprocessing or extracting handcrafted features [18].
Convolutional neural networks (CNNs) have emerged as powerful deep learning architectures for decoding complex event-related desynchronization and synchronization (ERD/ERS) patterns elicited in SMRs in various BCI applications [18], [19]. Several studies have already demonstrated the efficiency of CNNs in terms of training time (as compared to some of the popular recurrent neural networks) and their effectiveness in capturing latent features from raw EEG data [20]. For instance, Schirrmeister et al. [21] evaluated several design choices to develop a CNN-based architecture for interpreting imagined and executed tasks using raw EEG data. Uktveris and Jusas [22] tested 11 different CNN architectures and selected the optimal CNN parameters for four-class motor imagery classification. Sakhavi et al. [23] investigated three different convolution scenarios: convolution across time (Channel-wise CNN), convolution across channels (Channel-mixing CNN), and convolution across both time and channels with a two-dimensional kernel, and demonstrated the effectiveness of CNNs in SMR classification. Dai et al. [24] proposed a hybrid-scale CNN architecture that incorporates a data augmentation technique and achieved accurate classification on two different motor imagery datasets. In addition to purely CNN-based architectures, hybrid models incorporating CNNs and other deep structures have been used in the context of SMR-based BCIs. For instance, the studies in [25]-[27] investigated hybrid CNN and stacked autoencoder (AE) architectures for EEG classification. Likewise, hybrids of CNNs with gated recurrent units [28] and with long short-term memory networks [29] have been studied.
The studies mentioned above exhibit the state-of-the-art performance of deep learning models in learning from raw data, alleviating the need for manual EEG feature engineering. Nevertheless, some studies still explore various EEG data transformation techniques that could enhance deep learning models. For instance, Chaudhary et al. [30] applied continuous wavelet transforms (CWT) and the short-time Fourier transform to EEG data, used the outcome of these transformations as inputs to an AlexNet-based CNN classifier, and showed that CWT features yield better accuracy [31]. A similar study [32] applied blind source separation and CWT to preprocess EEG as input to a CNN classifier [33] and studied the effects of adjustments in the convolutional stride and max-pooling size on classification accuracy. Further research attempted to identify the best feature extraction and deep learning-based classification pipeline for SMR-based BCIs [11], [18], [34]. However, a recent study by Lawhern et al. proposed a CNN-based architecture called EEGNet, which incorporates a feature extractor and a classifier in a single CNN model [35]. Mainly, EEGNet was inspired by the Filter Bank Common Spatial Patterns (FBCSP), a celebrated spatial feature extraction method [36], [37]. A comparative analysis between EEGNet and other approaches demonstrated that, with a much lower number of parameters, EEGNet could perform as well as or even better than several models in terms of generalization error across various BCI paradigms [35]. Other studies inspired by the FBCSP/CSP methods are [23], [38].
In this study, we are concerned with standard convolutional neural networks. Table 1 summarizes the studies above from the perspective of the hyperparameter selection strategy. In this regard, we distinguish structural hyperparameters (number of layers, filters, kernel size, and the choice of activation functions) from algorithmic hyperparameters (batch size, dropout rate, and signal segmentation and the choice of optimizer and its learning rate).
As seen in Table 1, most of the studies that use the neural networks are based on well-known architectures such as AlexNet or ResNet, use the domain knowledge to construct the final architecture or have an unclear strategy deployed for model selection. This state of affairs can be attributed to a relatively large number and wide range of hyperparameters involved, which together render a principled model selection difficult. This raises the question of whether a principled model selection, such as the brute-force search within a limited space of hyperparameters, could lead to architectures that can accurately classify motor imagery tasks based on collected EEG data. To examine this question, in this investigation, we conduct the first large-scale analysis in which we use EEG waveforms obtained from a number of subjects across different datasets to determine a single architecture via a brute-force model selection. We then examine the classification accuracy of the selected model on a set of subjects from an independent dataset that was unseen during the architecture selection phase.
This work proceeds as follows. Section II describes (i) the publicly available datasets used in our study and (ii) the preprocessing steps applied to them. Section III presents the CNN architectures that we use as part of our brute-force model selection procedure. In Section IV, we discuss the CNN training and model selection procedures, and in Section V we present the results of applying the constructed models to independent test data. The key findings and observations are summarized in Section VI, and Section VII provides a brief conclusion of the work.

A. DATASET DESCRIPTION
In training and choosing the structure of our constructed CNNs, we have used publicly available EEG data from different sources corresponding to motor imagery tasks recorded from a number of subjects. In this study, we focus on the decoding of limb imagery data corresponding to left-hand and right-hand imagery and select the data corresponding to those imagery tasks accordingly. Further, in all datasets we segmented the continuous EEG into left-hand and right-hand imagination trials of 4 seconds in length, starting at the mental imagery onset. The detailed description of each dataset is provided below.

1) Weibo2014
This dataset consists of EEG data recorded from ten healthy right-handed subjects (seven females and three males, 23-25 years old). The 64-channel data were acquired using a Neuroscan SynAmps2 amplifier at a sampling rate of 1000 Hz. The data were further band-pass filtered to the range of 0.5-50 Hz and down-sampled to 200 Hz for the subsequent analysis. The electrodes were placed according to the international 10-20 system, referenced to the nose and grounded at the prefrontal lobe. The original study investigated the differences in EEG patterns between simple limb imagery and compound limb motor imagery. A single trial lasted for eight seconds. After two seconds, a visual cue in the form of a red circle was shown to participants for one second; subsequently, a cue indicating left-hand or right-hand imagery was shown, during which the participants performed kinesthetic motor imagery for about four seconds. The entire experiment comprised nine sessions; in each session, 60 trials of motor imagery data were collected for each task (for more details, see [42]).

2) PHYSIONET
We used the Physionet Motor/Mental Imagery database. This dataset corresponds to large-scale motor imagery EEG data acquired from 109 participants while performing different motor imagery tasks. The EEG data were sampled at 160 Hz from 64 channels using the BCI2000 system according to the international 10-10 electrode placement system. Here, we chose the trials that correspond to right-hand and left-hand motor imagery tasks. Each recorded trial lasted for 4 seconds and was followed by 4 seconds of rest. The total number of trials acquired for right-hand and left-hand motor imagery tasks was equal to 46 trials for each subject. The dataset from this experiment is publicly available at [43]; readers are referred to [44], [45] for more details.

3) BCI COMPETITION IV - DATASET 2a (BCI-DataSet2A)
This dataset was acquired from nine subjects using a cue-based BCI paradigm. Participants performed four different types of imagery tasks that included imagination of left-hand, right-hand, both feet, and tongue movement. The data were obtained from two different sessions; in each session, 288 trials were collected for a given imagery task. A session started with a fixation cross on a black screen, and after two seconds, an additional visual cue in the form of an arrow pointing left, right, up, or down was shown for 1.25 seconds. The participants were instructed to perform motor imagery tasks until the fixation cross disappeared from the screen after six seconds. EEG data were recorded in a monopolar configuration using 22 Ag/AgCl electrodes according to the 10-20 electrode placement system. The reference electrode was set to the left mastoid while the ground was set to the right mastoid. The data were sampled at 250 Hz and bandpass-filtered between 0.5 Hz and 100 Hz (for more details, see [46]).

4) BCI COMPETITION IV - DATASET 2b (BCI-DataSet2B)
This dataset consists of EEG data recorded from nine healthy, right-handed participants. EEG data were sampled at 250 Hz using three channels (C3, Cz, and C4) and bandpass-filtered between 0.5 Hz and 100 Hz. Each subject participated in five data acquisition sessions, where the first two sessions were conducted without feedback and the last three sessions with feedback. The electrode Fz was used as the EEG ground. Participants performed two-class motor imagery of left-hand and right-hand movements using a cue-based paradigm. The total number of data samples recorded from each session for each motor imagery task was equal to 120 trials. Each trial in a session started with a fixation cross and an additional short acoustic warning tone. A few seconds later, a visual cue arrow was shown for 1.25 seconds, after which the participants were instructed to perform motor imagery tasks over four seconds. There was a short break (1.5 seconds) between consecutive trials. The sessions with feedback used a smiley on a screen at the beginning of each trial (t = 0). The visual cue was shown between t = 3 and t = 7.5 seconds, and subjects performed the motor imagery tasks (for more details, see [47]).

B. EEG PRE-PROCESSING
Deep learning research has shown enhanced performance in learning from raw EEG data, mitigating the need for preprocessing or handcrafted features [19], [21], [34]. Here, we apply minimal preprocessing to all four datasets described in the previous section, and refer readers to [21], [35] for more details regarding the preprocessing steps and the usefulness of the deep learning models employed. In this regard, the EEG waveforms were high-pass filtered with a 4 Hz cut-off frequency using a fourth-order Butterworth IIR filter. This filter was used to suppress electro-oculographic artifacts caused by eye movement, which are dominant in the 0.1-4 Hz band of EEG. Other than that, and as suggested by [21], we did not apply low-pass filtering, so as to leave the EEG data as close to raw as possible.
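The high-pass step can be sketched as follows. This is a minimal illustration using SciPy rather than the MNE-based implementation used in this work; the function name `highpass_4hz`, the zero-phase (forward-backward) filtering, the 160 Hz sampling rate, and the synthetic test signal are all assumptions made for the example.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def highpass_4hz(eeg, fs, cutoff_hz=4.0, order=4):
    """High-pass filter an array of shape (n_channels, n_samples) along time.

    Assumption: zero-phase filtering (sosfiltfilt). Running the filter
    forward and backward doubles the effective order of the design.
    """
    sos = butter(order, cutoff_hz, btype="highpass", fs=fs, output="sos")
    return sosfiltfilt(sos, eeg, axis=-1)


# Synthetic two-channel signal: a slow 1 Hz drift (stand-in for ocular
# activity) plus a 10 Hz oscillation (stand-in for a mu rhythm).
fs = 160.0
t = np.arange(0, 8.0, 1.0 / fs)
slow = np.sin(2 * np.pi * 1.0 * t)
mu = 0.5 * np.sin(2 * np.pi * 10.0 * t)
eeg = np.tile(slow + mu, (2, 1))

filtered = highpass_4hz(eeg, fs)
```

Away from the edges, `filtered` should retain the 10 Hz component almost unchanged while the 1 Hz drift is strongly attenuated.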
Further, the continuous EEG was segmented into left-hand and right-hand imagination trials of four seconds in length following the motor imagery onset. Subsequently, EEG trials were artifact-corrected by applying a statistical threshold to exclude: (i) bad EEG trials correlated with egregious movement noise; and (ii) channels that are noisy because of a possibly poor connection to the scalp of a participant. Bad trials were identified by calculating the mean absolute value per trial and eliminating trials with values more than three standard deviations above the mean across trials.
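The bad-trial criterion can be illustrated with a short NumPy sketch. It reflects one reading of the rule (threshold = mean + 3 standard deviations of the per-trial mean absolute amplitude); the function name `reject_bad_trials` and the synthetic data are hypothetical, and the channel-rejection step is not shown.

```python
import numpy as np


def reject_bad_trials(trials, n_std=3.0):
    """Drop trials whose mean absolute amplitude is an outlier.

    trials: array of shape (n_trials, n_channels, n_samples).
    Returns the kept trials and the boolean keep-mask.
    """
    score = np.abs(trials).mean(axis=(1, 2))      # one value per trial
    threshold = score.mean() + n_std * score.std()
    keep = score <= threshold
    return trials[keep], keep


# Synthetic example: 50 noise trials, one of which is heavily contaminated.
rng = np.random.default_rng(0)
trials = rng.normal(size=(50, 3, 320))
trials[7] *= 20.0

clean, keep = reject_bad_trials(trials)
```

Because the threshold is computed from the same trials it screens, a single extreme trial inflates the standard deviation; with very few trials, such a rule may fail to flag anything.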
All the methods described here have been implemented in the MNE Python environment [48].
The minimal preprocessing steps adopted allow the constructed CNNs to learn discriminative features automatically from nearly raw EEG data, without the imposition of prior knowledge through excessive preprocessing.

III. CLASSIFICATION RULES
The key question in this study is whether using standard convolutional neural networks within a systematic model selection would possibly lead to a comparable classification accuracy for motor imagery tasks as the current state-of-the-art deep learning architectures, which are partially inspired by domain knowledge. Therefore, we first briefly describe the main building block of standard CNNs, which is the convolutional operation. We then briefly describe the EEGNet architecture [35], which serves as the benchmark for comparison as it has shown the state-of-the-art performance in various EEG classification tasks. Hereafter, we refer to any specific CNN that is constructed as part of the systematic model selection process as ConvNet.

A. CONVOLUTIONAL LAYER
As the name suggests, a convolutional layer convolves an input tensor with another tensor, the kernel, whose parameters are tuned by the learning algorithm. Let $X$ denote the 3-D input tensor with elements $X_{i,j,k}$, where $i$, $j$, and $k$ denote the channel (also known as depth), row, and column, respectively. Let us assume $1 \le i \le c_X$ (input channels), $1 \le j \le h_X$ (input height), and $1 \le k \le w_X$ (input width). For our data, the first convolutional layer has $c_X = 1$ (similar to a grayscale image), $h_X$ equal to the number of EEG channels, and $w_X = 80\ \text{Hz} \times 4000\ \text{ms} = 320$.
The 4-D kernel tensor is denoted by $K$ with elements $K_{l,i,j,k}$ for $1 \le l \le c_Y$ (output channels), $1 \le i \le c_X$, $1 \le j \le h_K$ (kernel height), and $1 \le k \le w_K$ (kernel width), where $h_K < h_X$ and $w_K < w_X$. Therefore, $K$ contains $c_Y$ kernels (filters) of size $c_X$ [same as input] $\times$ $h_K$ [smaller than input] $\times$ $w_K$ [smaller than input]. Convolving $K$ across $X$ (with no padding/subsampling/bias terms) is obtained by computing the 3-D output tensor (feature map) $Y$ with elements
$$Y_{l,j,k} = \sum_{i,m,n} X_{i,\,j+m-1,\,k+n-1}\, K_{l,i,m,n},$$
where the summation is over all valid indices. When there are multiple convolutional layers in one network, the output feature map of one layer serves as the input to the next layer. Assuming one bias term for each kernel, in training a CNN, $c_Y (c_X h_K w_K + 1)$ parameters are learned for each layer using the back-propagation algorithm [49].
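To make the operation and the parameter count concrete, the following NumPy sketch applies a kernel tensor to an input without padding or subsampling and counts the learnable parameters. The function name `conv_layer` and the chosen sizes (22 EEG channels, 8 filters of size 3 × 8) are illustrative assumptions; the kernel is applied without flipping, as is customary in deep learning libraries.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv_layer(X, K, b=None):
    """Valid (no padding, stride 1) convolution of X with kernel tensor K.

    X: (c_X, h_X, w_X); K: (c_Y, c_X, h_K, w_K); optional bias b: (c_Y,).
    Returns Y of shape (c_Y, h_X - h_K + 1, w_X - w_K + 1).
    """
    h_K, w_K = K.shape[2:]
    # patches[i, j, k, m, n] = X[i, j + m, k + n] (zero-based indices)
    patches = sliding_window_view(X, (h_K, w_K), axis=(1, 2))
    # Sum over input channel i and kernel offsets m, n.
    Y = np.einsum("ijkmn,limn->ljk", patches, K)
    if b is not None:
        Y = Y + b[:, None, None]
    return Y


rng = np.random.default_rng(0)
c_X, h_X, w_X = 1, 22, 320          # one "image" of 22 channels x 320 samples
c_Y, h_K, w_K = 8, 3, 8             # 8 filters of size 3 x 8

X = rng.normal(size=(c_X, h_X, w_X))
K = rng.normal(size=(c_Y, c_X, h_K, w_K))
b = np.zeros(c_Y)

Y = conv_layer(X, K, b)
n_params = K.size + b.size          # equals c_Y * (c_X * h_K * w_K + 1)
```

With these sizes, the layer has 8 × (1 × 3 × 8 + 1) = 200 learnable parameters.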

B. EEGNet: A BENCHMARK FOR COMPARISON
As a benchmark for comparing the performance of the constructed ConvNets presented in the next section, we consider a well-known deep learning architecture known as EEGNet [35]. The EEGNet architecture is composed of three convolutional layers with the following characteristics: the first convolutional layer uses a temporal convolution to learn the parameters of frequency filters. The second convolutional layer uses a depthwise convolution designed to learn frequency-specific spatial filters. The third convolutional layer uses a depthwise convolution, which is used to learn a temporal summary for each feature map individually, followed by a pointwise convolution, which is used to learn an optimal combination of feature maps. The details of the network architecture can be found in Table 2, where the convolutional operations, batch normalization, the exponential linear unit activation function, the average pooling operation, and the fully connected linear layer are identified by Conv2d, BatchNorm2d, ELU, AvgPool2d, and linear, respectively. We re-implemented the EEGNet models in PyTorch using the source code shared by the authors in [35].

IV. TRAINING AND MODEL SELECTION
Due to the physical and mental burden on human subjects in motor imagery experiments, it is usually challenging to collect a large sample for each subject. Generally, a limited number of EEG training observations can be acquired in one experimental session, which itself usually lasts an hour. Such an experiment should be repeated over weeks and months to collect a large sample, which is generally required to train deep neural networks. Table 3 shows the total number of observations (i.e., signal segments of 4000 ms; as described in Sections II-A and II-B) that were used in training and validation sets (used for model selection described next in this section) across both right-hand and left-hand motor imagery classes for all datasets used in our study (see Section II-A for description of these datasets). For each dataset, we pooled the observations collected for all subjects within the dataset, 80% of this pooled sample was randomly set aside for training, and the rest was left for model selection. To increase the sample size, one may segment the collected waveforms to a series of shorter time intervals [21], [50]. In this regard, we generated multiple training observations by subsampling the original complete trial with a fixed time window of length t = 1500 ms with a 50% overlap along the time axis. Subsequently, we trained ConvNets on multiple time-window segments obtained from the completed trials.
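The window-based augmentation can be sketched as below. The function name `sliding_windows` is hypothetical, and the sketch assumes a sampling rate of 80 Hz and that any samples left over after the last full window are discarded.

```python
import numpy as np


def sliding_windows(trial, fs, win_ms=1500, overlap=0.5):
    """Cut one trial (n_channels, n_samples) into overlapping windows.

    Returns an array of shape (n_windows, n_channels, win_len).
    """
    win_len = int(round(win_ms * fs / 1000.0))
    step = max(1, int(round(win_len * (1.0 - overlap))))
    starts = range(0, trial.shape[-1] - win_len + 1, step)
    return np.stack([trial[:, s:s + win_len] for s in starts])


fs = 80                                           # Hz (assumed)
trial = np.arange(3 * 320, dtype=float).reshape(3, 320)   # 4000 ms trial
crops = sliding_windows(trial, fs)
```

Under these assumptions, a 4000 ms trial (320 samples) yields four windows of 120 samples each, starting at samples 0, 60, 120, and 180; each window would inherit the class label of its parent trial.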
The following assumptions were made to define the possible search space of hyperparameters: (i) the maximum number of examined epochs for all models was set to 150 (no early stopping); (ii) the mini-batch gradient descent (batch size of 64) with Adam optimizer with a learning rate of 0.001 and a decay of 0.0001 was examined [51]; (iii) the loss function was cross-entropy [52]; (iv) the maximum number of convolutional layers that were examined was 6 with a single fully connected layer; (v) a dropout layer with the retention probability of $p = 0.5$ was used prior to the fully connected layer [53]; (vi) assuming $h_{K,j}$ and $w_{K,j}$ denote the height and the width of the kernel used in layer $j$, respectively, where $j = 1, \ldots, L$ with $L$ being the total number of layers used in the architecture ($2 \le L \le 6$), in training ConvNets, we used a common rectangular shape kernel for all layers such that $h_{K,j} = h_K = 3$, $\forall j$, and $w_{K,j} = w_K = 8\tau$, $\forall j$, where $\tau \in \{1, 3, 5\}$ was introduced to define convolutional filters that cover different temporal features of EEG$^{1}$; and (vii) assuming $c_{Y,j}$ denotes the number of output channels in layer $j$, we considered both an increasing and a decreasing pattern for $c_{Y,j}$, with $c_{Y,j} = 2^{2+j}$ and $c_{Y,j} = 2^{L+3-j}$, respectively.
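Items (vi) and (vii) can be enumerated directly. The sketch below lists every combination of depth, channel pattern, and kernel width; the helper name `channel_pattern` and the dictionary layout are illustrative choices, not part of the original implementation.

```python
from itertools import product


def channel_pattern(n_layers, increasing):
    """Output channels per layer: 2^(2+j) if increasing, else 2^(L+3-j)."""
    if increasing:
        return [2 ** (2 + j) for j in range(1, n_layers + 1)]
    return [2 ** (n_layers + 3 - j) for j in range(1, n_layers + 1)]


KERNEL_HEIGHT = 3
MAX_EPOCHS = 150

# Depth L in 2..6, two channel patterns, kernel width 8 * tau with tau in {1, 3, 5}.
architectures = [
    {"channels": channel_pattern(L, inc), "kernel": (KERNEL_HEIGHT, 8 * tau)}
    for L, inc, tau in product(range(2, 7), (True, False), (1, 3, 5))
]

n_architectures = len(architectures)
search_space_size = n_architectures * MAX_EPOCHS
```

For example, the decreasing pattern with five layers gives the channel sequence 128, 64, 32, 16, 8, and the increasing pattern gives 8, 16, 32, 64, 128.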
The above set of assumptions sets the cardinality of the hyperparameter space to 4500 = 150 (epochs) × 10 (patterns of $c_{Y,j}$) × 3 (possible values of $w_K$); the latter two hyperparameters define 30 possible ConvNet architectures. In addition, all ConvNets are based on a stack of standard convolutional layers; that is to say, each layer includes a convolutional operation (Section III-A), batch normalization, the rectified linear unit activation function, and the max-pooling operation. Using the defined hyperparameter space, we deployed a two-stage systematic model selection.
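A ConvNet of this form might be assembled in PyTorch roughly as follows. This is only a sketch: the text does not specify the padding or the pooling size, so `padding="same"` and max-pooling over time with a (1, 2) window are assumptions made here so that the stack also runs on three-channel input, and the class name `ConvNet` and its arguments are hypothetical.

```python
import torch
from torch import nn


class ConvNet(nn.Module):
    """Stack of Conv2d -> BatchNorm2d -> ReLU -> MaxPool2d blocks plus one linear layer."""

    def __init__(self, channels, kernel, n_eeg_channels, n_samples, n_classes=2):
        super().__init__()
        layers, c_in = [], 1
        for c_out in channels:
            layers += [
                nn.Conv2d(c_in, c_out, kernel_size=kernel, padding="same"),  # assumed padding
                nn.BatchNorm2d(c_out),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=(1, 2)),  # assumed: pool along time only
            ]
            c_in = c_out
        self.features = nn.Sequential(*layers)
        # Infer the flattened feature size with a dummy forward pass.
        with torch.no_grad():
            self.features.eval()
            n_flat = self.features(torch.zeros(1, 1, n_eeg_channels, n_samples)).numel()
            self.features.train()
        # For p = 0.5 the retention and drop probabilities coincide.
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Dropout(p=0.5), nn.Linear(n_flat, n_classes)
        )

    def forward(self, x):
        return self.classifier(self.features(x))


# Example: decreasing five-layer pattern, 3 x 8 kernels, 3 EEG channels, 120 samples.
model = ConvNet([128, 64, 32, 16, 8], (3, 8), n_eeg_channels=3, n_samples=120)
model.eval()
with torch.no_grad():
    logits = model(torch.randn(4, 1, 3, 120))
```

Training such a model with the settings listed above would pair it with `torch.nn.CrossEntropyLoss` and `torch.optim.Adam`; that loop is omitted here.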
In the first stage, we used the validation set within each dataset to estimate the optimal set of algorithmic hyperparameters within the defined hyperparameter space (here, the epoch size only), given all structural hyperparameters fixed. In this regard, for each dataset and each combination of structural hyperparameters, we trained a ConvNet and recorded its accuracy at the epoch that led to the highest accuracy on the validation set within that dataset. In the second stage, the optimal combination of structural hyperparameters was estimated as the one that led to the highest average accuracy across all datasets used for model selection. In order to maximize the number of subjects that were used in the model selection stage, we used the three datasets with the highest number of subjects for model selection (Physionet, Weibo2014, and BCI-DataSet2A) and set aside BCI-DataSet2B for testing (BCI-DataSet2A and BCI-DataSet2B have the same number of subjects, and the assignment between them was done arbitrarily). The entire process was conducted on a Linux workstation with an Intel Core i9-9900K (3.6 GHz) processor, 32 GB of RAM, and an Nvidia GeForce RTX 2080 Ti (11 GB of memory, 4352 CUDA cores). The entire workflow was implemented in the PyTorch deep learning environment.
$^{1}$ Given a sampling rate of $f_s = 80$ Hz, a temporal window of $t = 100$ ms is covered by a kernel width of $w_K = 8$, and $t = 300$ ms and $t = 500$ ms are covered by kernel widths of $w_K = 24$ and $w_K = 40$, respectively.
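The two-stage procedure reduces to two array reductions once the validation accuracies are available. In the sketch below, `val_acc` is a hypothetical array filled with synthetic placeholder values (not results from this study), indexed by dataset, architecture, and epoch.

```python
import numpy as np


def two_stage_selection(val_acc):
    """val_acc: (n_datasets, n_architectures, n_epochs) validation accuracies.

    Stage 1: for each dataset and architecture, keep the best epoch.
    Stage 2: pick the architecture with the highest mean accuracy over datasets.
    """
    best_epoch = val_acc.argmax(axis=2)      # zero-based epoch index
    best_acc = val_acc.max(axis=2)
    mean_acc = best_acc.mean(axis=0)
    best_arch = int(mean_acc.argmax())
    return best_arch, mean_acc, best_epoch


# Synthetic placeholder accuracies: 3 datasets, 30 architectures, 150 epochs.
rng = np.random.default_rng(0)
val_acc = rng.uniform(0.5, 0.7, size=(3, 30, 150))
val_acc[:, 12, 100] = 0.9                    # plant a clear winner for illustration

best_arch, mean_acc, best_epoch = two_stage_selection(val_acc)
```

Here the planted architecture (index 12) is selected, with its best epoch at index 100 on every dataset.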

V. RESULTS
In this section, we present the results of the systematic model selection described in Section IV. Table 4 shows the classification accuracies achieved at the epoch that led to the highest accuracy on the validation set within each dataset used for model selection (Physionet, Weibo2014, and BCI-DataSet2A); this was the first stage of model selection as described in Section IV. In this table, $C[c_{Y,1}, \ldots, c_{Y,L}]$ and $K(h_K \times w_K)$ are used to represent the number of output channels in each layer and the rectangular shape of the kernels that was in common across all layers, respectively. The rightmost column in this table is used to estimate the optimal combination of structural hyperparameters within the defined hyperparameter space (i.e., the second stage of the model selection). The ConvNet structure that led to the highest average accuracy across all three datasets used in model selection is identified in bold in the first column. Hereafter, we refer to this structure (C[128, 64, 32, 16, 8]_K(3 × 8)) as ConvNet_opt. Fig. 1 shows a schematic representation of the ConvNet_opt architecture.
Having estimated the structural hyperparameters, we now compare the performance of ConvNet_opt with EEGNet on a set of subjects within a dataset that was entirely unseen during the model selection (i.e., BCI-DataSet2B). Note that the aforementioned model selection led to a fixed set of structural hyperparameters (i.e., the structure of ConvNet_opt), but the algorithmic hyperparameter (here, the epoch size) is left undetermined. Although we could have estimated all hyperparameters, including the epoch size, using a similar strategy, we left the estimation of the optimal epoch size to the retraining of ConvNet_opt on each new dataset (similar to EEGNet, for which we only use the structure, and for each new dataset, the weights and the epoch size are estimated). As a result, to compare the performance of ConvNet_opt and EEGNet on BCI-DataSet2B, we first randomly split the observations for each subject into training (70%), validation (15%), and test (15%) sets and use the validation set to estimate the optimal epoch size for each model. Tables 5 and 6 show the subject-specific classification accuracy achieved using EEGNet and ConvNet_opt on the training, validation, and test sets for each subject within BCI-DataSet2B. Of particular importance in these tables are the classification accuracies achieved on the test data for each subject. Fig. 2 shows a subject-specific comparison between these accuracies. As seen in this figure, there is no significant difference between the classification accuracies achieved by ConvNet_opt and EEGNet; a two-sided Wilcoxon rank-sum test does not reject the null hypothesis of no difference between these two sets of accuracies (p = 0.51). In other words, the classification accuracies achieved by ConvNet_opt and EEGNet are comparable on BCI-DataSet2B. Fig. 3 shows the total training time of the selected ConvNet_opt architecture on all four datasets.
The longer training time was naturally associated with the larger datasets (see Table 3). For instance, training on BCI-DataSet2B took 5352 seconds, the longest among all datasets.
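The per-subject split and the statistical comparison can be sketched as follows. The function name `split_indices` is hypothetical, and the two accuracy vectors are random placeholders used only to show the call to `scipy.stats.ranksums`; they are not the accuracies reported in Tables 5 and 6.

```python
import numpy as np
from scipy.stats import ranksums


def split_indices(n, rng, frac=(0.70, 0.15, 0.15)):
    """Randomly split range(n) into training, validation, and test indices."""
    perm = rng.permutation(n)
    n_train = int(round(frac[0] * n))
    n_val = int(round(frac[1] * n))
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]


rng = np.random.default_rng(0)
train_idx, val_idx, test_idx = split_indices(200, rng)

# Placeholder test accuracies for nine subjects and two models (synthetic).
acc_model_a = rng.uniform(0.6, 0.9, size=9)
acc_model_b = rng.uniform(0.6, 0.9, size=9)

# Two-sided Wilcoxon rank-sum test; a large p-value means the null
# hypothesis of no difference between the two samples is not rejected.
statistic, p_value = ranksums(acc_model_a, acc_model_b)
```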

VI. DISCUSSION
Deep learning technologies have offered unprecedented opportunities to construct remarkably accurate classifiers by integrating the process of feature extraction into the classifier training. However, this integration process comes at the price of tuning a large number of (algorithmic and structural) hyperparameters. This has partially led many studies to rely on existing well-known architectures such as AlexNet or ResNet, use the domain knowledge to construct the final architecture, or have an unclear ad hoc strategy deployed for model selection. This raises the question of whether training accurate deep learning models using a principled model selection is possible, or, alternatively, experiencing and developing high intuition regarding the collected BCI data is the most prudent way to go about tuning the hyperparameters.
To address this question, in this work, we sought to show the efficacy of using standard convolutional neural networks within a systematic model selection for the classification of sensorimotor rhythms that arise during motor imagery tasks. In this regard, we used a two-stage systematic model selection. In the first stage, the validation set within each dataset was used to estimate the optimal set of algorithmic hyperparameters within the defined hyperparameter space, given all structural hyperparameters fixed. The validation set used in this stage was a portion of observations randomly set aside from the pooled sample of subjects within each dataset (a pooled design per se has been a feasible solution for reducing the extensive calibration time of BCI systems for individual subjects [54]). In the second stage, the optimal combination of structural hyperparameters was estimated as the one that led to the highest average accuracy across all datasets used for model selection.
FIGURE 2. Classification accuracy (%) of EEGNet and ConvNet_opt achieved on test data that was set aside for each subject within BCI-DataSet2B. The left pane shows the boxplot for the model performances on nine subjects, where each star represents the subject-specific accuracy. The right pane shows the pairwise comparison between EEGNet and ConvNet_opt for each subject as well as the average accuracy obtained across all subjects.
The aforementioned model selection led to a fixed set of structural hyperparameters. However, the only variable algorithmic hyperparameter (i.e., the epoch size) was left free and is estimated in the training process of the selected architecture on new data (similar to training the EEGNet architecture, which was used as the benchmark for comparison). Training the EEGNet and the ConvNet_opt architectures for nine additional subjects from BCI-DataSet2B, which was entirely unseen during the model selection, led to two sets of (statistically) comparable accuracies. Moreover, ConvNet_opt is invariant to the spatial dimensionality of the input data and is able to capture discriminative spatial features for accurate classification. Going back to the main question of this study, this observation shows the possibility of using standard CNNs within a systematic brute-force model selection to achieve classification accuracy comparable to that of the state-of-the-art deep learning architectures used in the classification of motor imagery tasks.
In other words, our study compares the use of prior knowledge versus data in the context of model selection. Naturally, if ''good'' prior knowledge about the nature of data and a mechanism for encoding this knowledge into the structure of a classification rule are available, we may expect to train highly accurate predictive models; however, in the absence of such prior knowledge or encoding mechanism, we may look into conducting a data-driven brute-force model selection as a viable option. Nevertheless, the performance of the selected structure, which is the outcome of this brute-force model selection, depends on the pre-specified hyperparameter space. Here, we showed that insofar as classification of sensorimotor rhythms is concerned, a pre-specified space that was restricted by our computing power and defined based on common values of hyperparameters can lead to accurate classifiers.
The systematic model selection strategy used here is not, of course, the only way to go about estimating the hyperparameters; for example, various other performance metrics or network pruning techniques could be used. Depending on computing power capacity, one may even define a much larger hyperparameter space or may use some suboptimal search strategies (as opposed to the exhaustive search used here). Furthermore, one may estimate both the algorithmic and structural hyperparameters in model selection. Nonetheless, we believe that the results obtained here would suffice to verify the efficacy of conducting a brute-force CNN model selection within a limited hyperparameter space.

VII. CONCLUSION
In this work, we examined whether a brute-force search within a limited space of hyperparameters for standard convolutional neural networks (CNNs) would possibly lead to classification accuracy comparable to that of the state-of-the-art deep learning architectures for classification of motor imagery tasks. In this regard, we conducted the first large-scale analysis in which we used EEG data collected from 128 subjects across three different datasets to determine the single architecture that achieved the highest average accuracy on pooled-sample validation sets across these datasets. Retraining the identified architecture on a set of nine additional subjects within another dataset that was entirely unseen during the model selection stage led to a set of classification accuracies comparable to those of EEGNet. This result of our investigation does not undermine the capacity and efficacy of EEGNet, which was used as the benchmark for comparison. EEGNet has shown the state-of-the-art performance in various EEG classification tasks across various BCI paradigms; however, its structure is partially inspired by domain knowledge, which is, of course, not possessed by non-experts. The findings of the current study on systematic CNN model selection may serve as a guide for deep learning practitioners with minimal domain knowledge to propose robust BCI models for the classification of motor imagery tasks.