Multidomain Convolution Neural Network Models for Improved Event-Related Potential Classification

Two convolution neural network (CNN) models are introduced to accurately classify event-related potentials (ERPs) by fusing frequency, time, and spatial domain information acquired from the continuous wavelet transform (CWT) of the ERPs recorded from multiple spatially distributed channels. The multidomain models fuse the multichannel Z-scalograms and the V-scalograms, which are generated from the standard CWT scalogram by zeroing-out and by discarding the inaccurate artifact coefficients that are outside the cone of influence (COI), respectively. In the first multidomain model, the input to the CNN is generated by fusing the Z-scalograms of the multichannel ERPs into a frequency-time-spatial cuboid. The input to the CNN in the second multidomain model is formed by fusing the frequency-time vectors of the V-scalograms of the multichannel ERPs into a frequency-time-spatial matrix. Experiments are designed to demonstrate (a) customized classification of ERPs, where the multidomain models are trained and tested with the ERPs of individual subjects for brain-computer interface (BCI)-type applications, and (b) group-based ERP classification, where the models are trained on the ERPs from a group of subjects and tested on single subjects not included in the training set for applications such as brain disorder classification. Results show that both multidomain models yield high classification accuracies for single trials and small-average ERPs with a small subset of top-ranked channels, and the multidomain fusion models consistently outperform the best unichannel classifiers.


Introduction
The accurate classification of event-related potentials (ERPs) is crucial in brain activity research and in brain-related clinical studies, evaluations, and diagnostics. Classification of ERPs can be improved by fusing frequency, time, and spatial domain information acquired from the scalograms of the continuous wavelet transforms (CWTs) of multichannel recordings of these signals. The frequency-time information is captured in the wavelet scalograms of each channel's ERPs, and the spatial information is contributed by the spatially distributed channels across the scalp. Convolution neural network (CNN) classifiers are ideal for inputs presented in matrix and cuboid formats [1][2][3][4][5][6][7][8] and are, therefore, a good choice for processing single-channel matrix scalograms and multichannel scalograms combined into a cuboid. Furthermore, unlike traditional classifiers, which typically require extracting "hand-engineered" features from the three domains and selecting a set of features through trial and error, CNNs can be trained to extract intertwined frequency-time-spatial features directly. The shapes and dimensions of the filters in the convolution layers as well as the number of convolution layers offer flexibility in the type of multidomain intertwining desired. Classification models designed for single subjects, which is the primary focus of this study, can take two forms: customized classifier design for brain-computer interface (BCI)-type applications and group-based design for classifying brain activity and disorders in applications related to clinical studies, evaluations, and diagnostics. Customized classifiers are designed using the individual's own ERPs, whereas group-based classifiers are typically trained on ERPs from a group of subjects having a similar disorder but tested on the ERPs of an individual not included in the training group.
In a recent study related to the classification of unichannel (single-channel) ERPs of single subjects [1], we introduced the Z-scalogram and the V-scalogram derived from the cone of influence (COI) of the standard scalogram of the CWT. The COI is a boundary that is superimposed on the wavelet scalogram to delineate the coefficients that are accurate from those that are inaccurate due to edge effects [1,[9][10][11][12].
The standard S-scalogram is obtained through "same" convolutions, the Z-scalogram is obtained from the S-scalogram by zeroing-out the inaccurate coefficients outside the COI, and the V-scalogram is obtained by completely cropping out the inaccurate coefficients outside the COI. The scalogram containing only the accurate coefficients corresponds to the scalogram that would be obtained through "valid" convolutions; therefore, it is referred to as the V-scalogram. Details of the various types of convolutions involved in defining the COI can be found in [1]. Figure 1, extracted from [1], shows the S-, Z-, and V-scalograms of an ERP. The S-scalogram shown in Figure 1b is the standard Morlet wavelet transform scalogram. In the study, feature vectors derived from the three scalograms were used to design support vector machines (SVMs), random forests (RFs), k-nearest neighbor (k-NN), multilayer perceptron neural networks (MLPs), and CNNs to classify synonymous and non-synonymous ERPs. It was shown that the Z-scalogram classifiers outperformed the S-scalogram classifiers, and the V-scalogram classifiers outperformed the S- and Z-scalogram classifiers. It was also shown that the CNN classifiers yielded the best performance. However, the study focused on customized classifier design for single subjects, did not involve group-based classification, and involved only unichannel classification with no attempt to fuse information from multiple channels. In another recent study related to the classification of multiaxial human movement, various CNN models were developed to fuse "information" from multiple multiaxial sensors at the input-level (early fusion) and the output-level (late fusion) [13].
The information in input-level fusion is the input data or features extracted from the input data, whereas information in output-level fusion is the decisions of multiple classifiers or some measures at the outputs of the multiple classifiers [13][14][15][16][17][18][19]. In the multiaxial human movement classification study, it was shown that the fusion classifiers outperformed the uniaxial classifiers. However, the multiaxial temporal signals were used directly without any form of time-frequency transformations. Furthermore, the movement signals of all individuals were mixed prior to generating the training and test sets. That is, the experiments did not involve designing customized classifiers for each individual, nor did they involve designing group-based classifiers to test the movement signals of individuals not included in the training set.
The present study builds upon these studies and improves the classification of ERPs by using CNNs which fuse frequency, time, and spatial information acquired from the COI-modified scalograms of the multichannel ERPs. Two CNN-based multidomain classification models which fuse the COI-modified scalograms of multiple channels are introduced. In the first model, referred to as the Z-CuboidNet, the input to the CNN is a frequency-time-spatial cuboid formed by fusing the Z-scalograms of the multichannel ERPs, and in the second model, referred to as the V-MatrixNet, the CNN input is a frequency-time-spatial matrix formed by fusing the V-scalogram frequency-time vectors of the multichannel ERPs. CNNs are selected for the classifier component of the multidomain models because they are ideal for inputs presented in matrix and cuboid formats. Additionally, this study focuses on the design (training and testing) of both customized and group-based ERP classifiers. To the best of our knowledge, no other study has reported the development of multidomain classification models similar to the Z-CuboidNet and V-MatrixNet.

Materials and Methods
This section describes the following: (a) subsample averaging to facilitate the design of ERP classifiers, (b) formulations of the Z-CuboidNet and V-MatrixNet multidomain classifier models, (c) the formulations of the unichannel Z-MatrixNet and V-VectorNet models which are special cases of the multidomain models, (d) the ERP data set used to demonstrate the application and the performance of the classifier models, (e) cross-validation for group-based classifier design, (f) selection of subsets of channels for single subjects and the subjects combined into a group, (g) the Morlet wavelet transform, which is the CWT used to generate the scalograms, and (h) the architectures, hyperparameters, and training options selected for the CNN models.

Subsample Averaging
An important issue related to ERP classification is the poor signal-to-noise ratio (SNR) of single trials; therefore, this issue is discussed before describing the formulations of the multidomain classification models. The SNR is improved for analysis and classification by averaging single trials from several repetitions of stimulus presentation [20][21][22]. The m-Subsample Averaging algorithm [1], which generates small-sample ERPs by repeated averaging of a small number of single trials drawn without replacement, can be employed to improve the performance over single trials. An ERP formed by averaging m single trials is referred to as an m-ERP. Subsample averaging enables the generation of a large ensemble of m-ERPs to facilitate classifier training. For consistency, single trials are referred to as 1-ERPs. The generalized formulations of the multidomain models in the following subsection assume m-ERPs. The experiments in this study are designed to evaluate the performances of the multidomain classifiers for m = 1, 2, and 4.
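The subsample-averaging procedure described above can be sketched as follows; this is a minimal illustration assuming trials are stored as rows of an array, and the function name and layout are illustrative rather than taken from [1]:

```python
import numpy as np

def m_subsample_average(trials, m, rng=None):
    """Generate m-ERPs by repeatedly averaging m single trials (1-ERPs)
    drawn without replacement from the trial ensemble.

    trials : (n_trials, n_samples) array of single trials
    m      : number of single trials averaged per m-ERP
    Returns an (n_trials // m, n_samples) array of m-ERPs.
    """
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(trials))        # draw without replacement
    n_erps = len(trials) // m                 # discard any remainder trials
    groups = idx[: n_erps * m].reshape(n_erps, m)
    return trials[groups].mean(axis=1)        # average each group of m trials

# Example: 8 noisy repetitions of a common waveform; 2-ERPs improve the SNR.
rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, np.pi, 100))
trials = signal + rng.normal(0, 1.0, size=(8, 100))
erps2 = m_subsample_average(trials, m=2, rng=0)
```

Repeating the draw-and-average step over reshuffled permutations yields the large ensemble of m-ERPs used for classifier training.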

Multidomain Classifier Models
The most general formulations of the Z-CuboidNet and V-MatrixNet multidomain models involving multiple channels and polychotomous ERP classes are presented in this section. The channels are represented by d, d = 0, 1, . . . , (D − 1), where D is the number of selected channels, and the polychotomous classes are represented by ω_j, j = 1, 2, . . . , Ω, where Ω is the number of classes. Using array (row-column) representations and zero-based indexing, a scalogram of an m-ERP is denoted by G(f, t); f = 0, 1, . . . , (F − 1), t = 0, 1, . . . , (T − 1), where F is the number of frequency bands and T is the duration of the ERP. Furthermore, if an m-ERP in the ensemble of each channel is indexed by q, q = 1, 2, . . . , Q, where Q is the number of m-ERPs, the S-, Z-, and V-scalograms of the qth m-ERP of channel d belonging to class ω_j are represented by G^(d,q)_{S,j}(f, t), G^(d,q)_{Z,j}(f, t), and G^(d,q)_{V,j}(f, t), respectively; the index q is dropped in the following when it is not needed. Classification involving unichannel and dichotomous classes are special cases of the general formulations.
Without loss of generality, the CNNs in the multidomain models consist of two convolution layers followed by a max pooling layer and a fully connected network (FCN). The activation functions are ReLU in the convolution layers and tanh in the intermediate layers of the FCN. The output of the FCN is a softmax layer which has one output for each ERP class, and each output is interpreted as the posterior probability of the input class. The CNNs are trained with the gradient descent backpropagation algorithm using the cross-entropy loss function. The formulations of the 4 models, which follow next, describe the operations in the first convolution layer of each model in detail because the key differences between the models occur in this layer. The remaining operations are described briefly to point out additional differences and for the sake of completeness. For convenience, it will be assumed that all convolutions are "same convolutions" through zero-padding so that the filtered outputs have the same dimensions as the input to the convolution layer. The equations describe the cross-correlations of the input and filters in the convolution layers because the shifting, multiplication, and summing operations performed in CNNs are cross-correlation and not convolution.
The block diagram of the Z-CuboidNet is shown in Figure 2. In the first convolution layer, the cuboid input G_{Z,j}(f, t, d) is convolved with N1 cuboid filters W_n(f, t, d); f = −α, . . . , 0, . . . , α; t = −β, . . . , 0, . . . , β; d = 0, 1, . . . , (D − 1); n = 1, 2, . . . , N1, and the output of the nth filter in the first convolution layer is given by the cuboid convolution Y_{n,j}(f, t) = Σ_{r=−α..α} Σ_{u=−β..β} Σ_{d=0..(D−1)} W_n(r, u, d) G_{Z,j}(f + r, t + u, d); f = 0, 1, . . . , (F − 1), t = 0, 1, . . . , (T − 1). It is important to note that cuboid filters are selected to extract intertwined frequency, time, and spatial features. Additionally interesting to note is that the result of convolving two cuboids is a matrix. After a bias is added to the filtered outputs and passed through the nonlinear ReLU activation function, the N1 filtered outputs are combined into a (F × T × N1) cuboid feature map. The convolutions in the second layer are also selected to be cuboid convolutions to extract more complex cross-intertwined features across the cuboid feature map. The resulting matrices are combined into a cuboid after adding biases and passing through ReLU activations. The height and width of the cuboid are shrunk by the following max pooling layer. The max-pooled cuboid is flattened and presented as the input to the FCN. The output of the FCN is represented by the posterior class probability vector P = (P_{ω1}, P_{ω2}, . . . , P_{ωΩ}), and the test input is assigned to a class using the maximum response rule. That is, the test input is assigned to class ω_i if P_{ωi} ≥ P_{ωj} for all j ≠ i.

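The Z-CuboidNet pipeline described above can be sketched in PyTorch (the library used for the implementation in this study). The filter counts, kernel sizes, and FCN width below are illustrative placeholders, not the values from Table 2; treating the channel axis d as the convolution's input-channel axis makes each first-layer filter a cuboid spanning frequency, time, and all D channels, so each filter's output is a matrix:

```python
import torch
import torch.nn as nn

class ZCuboidNet(nn.Module):
    """Sketch of the Z-CuboidNet: an (F x T x D) cuboid of fused multichannel
    Z-scalograms -> two convolution layers -> max pooling -> FCN -> softmax.
    N1, N2, and the FCN width are illustrative, not the paper's values."""
    def __init__(self, F=108, T=200, D=4, n_classes=2, N1=8, N2=16):
        super().__init__()
        # A cuboid filter spanning (2a+1, 2b+1, D) summed over d is a 2-D
        # convolution with D input channels: convolving two cuboids -> matrix.
        self.conv1 = nn.Conv2d(D, N1, kernel_size=3, padding=1)   # "same"
        self.conv2 = nn.Conv2d(N1, N2, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2)            # shrink height and width
        self.fcn = nn.Sequential(
            nn.Linear(N2 * (F // 2) * (T // 2), 64), nn.Tanh(),
            nn.Linear(64, n_classes),
        )
    def forward(self, x):                      # x: (batch, D, F, T)
        h = torch.relu(self.conv1(x))
        h = torch.relu(self.conv2(h))
        h = self.pool(h).flatten(1)
        return torch.softmax(self.fcn(h), dim=1)   # posterior probabilities

# Maximum response rule: assign the class with the largest posterior.
x = torch.randn(2, 4, 108, 200)
p = ZCuboidNet()(x)
pred = p.argmax(dim=1)
```

The Z-MatrixNet special case below corresponds to constructing this network with D = 1.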

Z-MatrixNet
The Z-MatrixNet is a special case of the Z-CuboidNet for unichannel classification. The input cuboid reduces to a frequency-time matrix when D = 1. A classifier is developed independently for the m-ERPs of each channel. This approach can be used to compare the performances of each channel classifier and to select the best unichannel classifier. Most importantly, the performance of the multidomain Z-CuboidNet classifier can be compared against the best unichannel Z-MatrixNet classifier. The input of channel d to the CNN is simply the Z-scalogram G^(d)_{Z,j}(f, t). In the first layer, the input matrix is convolved with N1 matrix filters W_n(f, t); f = −α, . . . , 0, . . . , α; t = −β, . . . , 0, . . . , β; n = 1, 2, . . . , N1. The output of filter n is given by the matrix convolution Y^(d)_{n,j}(f, t) = Σ_{r=−α..α} Σ_{u=−β..β} W_n(r, u) G^(d)_{Z,j}(f + r, t + u). The matrix filters extract features that are coupled across frequency and time. Note that convolving two matrices results in a matrix. The N1 filtered outputs are combined into a (F × T × N1) cuboid feature map after the biases are added and passed through the ReLU activation function. Cuboid convolution filters are selected in the second layer to extract more complex cross-coupled features across the cuboid feature map. The cuboid convolutions result in matrices which are combined into cuboids after adding biases and passing through ReLU activation. The cuboid is passed through the max pooling layer, flattened, and passed through the FCN. The output of the FCN of the unichannel classifier of channel d is the posterior class probability vector P_d = (P_{d,ω1}, P_{d,ω2}, . . . , P_{d,ωΩ}), and the test input is assigned to class ω_i if P_{d,ωi} ≥ P_{d,ωj} for all j ≠ i.

V-MatrixNet
The V-scalogram was introduced to overcome the inclusion of zeroed-out coefficients in the Z-scalogram. However, the resulting non-rectangular matrix presents problems for CNN classifier development because the inputs to each convolution and pooling layer are expected to be a rectangular matrix or a rectangular cuboid. The approach developed in [1] circumvents this problem by concatenating the rows of the V-scalogram, which contain only the accurately computed coefficients inside the COI, into a feature vector. The dimension of the resulting frequency-time feature vector is given by N_V = Σ_{f=0..(F−1)} T_f, where T_f is the duration of frequency band f inside the COI. The feature vector extracted from the V-scalogram G^(d)_{V,j}(f, t) of channel d is given by G^(d)_{V,j}(ft) = ∇_f [G^(d)_{V,j}(f, t)]^T, where ∇ represents the row concatenation operation and T is the transpose operation. The subscript j indicates that the V-scalograms are generated from the class ω_j m-ERPs.
The V-MatrixNet, illustrated in Figure 3, fuses the V-scalogram frequency-time feature vectors of the D channels into the columns of an N_V × D input matrix, which is given by G_{V,j}(ft, d) = ∆_d G^(d)_{V,j}(ft), where ∆ in this case is the operator for fusing column vectors into a matrix and ft indexes the elements within each column. The index ft indicates that each element of the feature vector is a frequency-time element of the V-scalogram G^(d)_{V,j}(f, t).

In the first convolution layer, the matrix G_{V,j}(ft, d) is convolved with N1 matrix filters W_n(ft, d); ft = −α, . . . , 0, . . . , α; d = −β, . . . , 0, . . . , β; n = 1, 2, . . . , N1; the output of filter n is given by the matrix convolution. A bias is added to the filtered output and passed through the ReLU activation function to generate the matrix feature map. The N1 filtered outputs are combined into a (N_V × D × N1) feature cuboid. The feature cuboid is convolved with cuboid filters, and the resulting feature maps are combined into a more complex feature cuboid which is passed through the pooling layer, flattened, and passed through the FCN. The softmax layer computes the posterior class probabilities, which are stored in a probability vector, and the input is assigned to the class given by the rule in Equation (3).
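The construction of the N_V-dimensional feature vector and the N_V × D input matrix can be sketched as follows; the triangular COI mask used here is a toy stand-in for the actual Morlet COI, and the function names are illustrative:

```python
import numpy as np

def v_feature_vector(scalogram, coi_mask):
    """Concatenate, row by row, the coefficients inside the COI to form the
    N_V-dimensional frequency-time feature vector of one channel.

    scalogram : (F, T) CWT scalogram of an m-ERP
    coi_mask  : (F, T) boolean array, True inside the cone of influence
    """
    rows = [scalogram[f, coi_mask[f]] for f in range(scalogram.shape[0])]
    return np.concatenate(rows)     # N_V = sum of per-band COI durations T_f

def v_matrix(scalograms, coi_mask):
    """Fuse the feature vectors of D channels into the N_V x D input matrix."""
    return np.stack([v_feature_vector(s, coi_mask) for s in scalograms], axis=1)

# Toy example: 5 bands x 12 samples; a hypothetical COI that narrows per band.
F, T, D = 5, 12, 3
coi = np.zeros((F, T), dtype=bool)
for f in range(F):
    coi[f, f : T - f] = True
scalos = np.random.default_rng(1).normal(size=(D, F, T))
X = v_matrix(scalos, coi)           # shape (N_V, D), N_V = coi.sum()
```

Because only coefficients inside the COI are kept, no zeroed-out artifact coefficients enter the classifier input.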

V-VectorNet
The V-MatrixNet reduces to the V-VectorNet for unichannel classification. In order to determine the best channel classifier and to compare the performance against the multichannel V-MatrixNet fusion classifier, a V-VectorNet classifier is developed independently for the m-ERPs of each channel. In the first convolution layer, the column feature vector G^(d)_{V,j}(ft) is convolved with N1 column filters W_n(ft); ft = −α, . . . , 0, . . . , α; n = 1, 2, . . . , N1, and the column output of the nth filter is given by the vector convolution. A bias is added to the filtered output, which is a column vector, and passed through the ReLU activation function to yield the column feature map. The N1 column feature maps are combined into a (N_V × N1) matrix that goes through the second convolution layer consisting of matrix filters. The matrix outputs are combined into a cuboid and passed through the max pooling layer. The output of the max pooling layer is flattened and fed into the FCN, which outputs the posterior class probability vector. The rule in Equation (5) is used to assign the input to a class.
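The V-VectorNet path above can be sketched in PyTorch; as with the earlier sketch, the layer sizes are illustrative placeholders rather than the values from Table 2. A 1-D convolution over the N_V-dimensional vector plays the role of the column filters, and its N1 output channels form the (N_V × N1) matrix processed by the second layer:

```python
import torch
import torch.nn as nn

class VVectorNet(nn.Module):
    """Sketch of the unichannel V-VectorNet: the N_V-dimensional V-scalogram
    feature vector -> two convolution layers -> max pooling -> FCN -> softmax.
    Layer sizes are illustrative."""
    def __init__(self, N_V=17056, n_classes=2, N1=4, N2=8):
        super().__init__()
        self.conv1 = nn.Conv1d(1, N1, kernel_size=7, padding=3)   # column filters
        self.conv2 = nn.Conv1d(N1, N2, kernel_size=7, padding=3)  # matrix filters
        self.pool = nn.MaxPool1d(4)
        self.fcn = nn.Sequential(
            nn.Linear(N2 * (N_V // 4), 32), nn.Tanh(),
            nn.Linear(32, n_classes),
        )
    def forward(self, x):                 # x: (batch, 1, N_V)
        h = torch.relu(self.conv1(x))     # (batch, N1, N_V): the (N_V x N1) map
        h = torch.relu(self.conv2(h))
        h = self.pool(h).flatten(1)
        return torch.softmax(self.fcn(h), dim=1)

# Smaller N_V for a quick shape check.
p = VVectorNet(N_V=1024)(torch.randn(2, 1, 1024))
```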
In summary, the outputs of the convolution layers take on different shapes depending on the shapes of the inputs and the choice of convolution features. The desired fusion of the frequency, time, and spatial domains is controlled by the shapes and sizes of the filters. More complex discriminatory features can be extracted by making the networks deeper; that is, by increasing the number of network layers.

ERP Data Set
The EEG/ERP data used in this study to demonstrate the application and evaluate the performance of the multidomain models was the same as used in [1]. The data was downloaded from: https://eeglab.org/tutorials/10_Group_analysis/study_creation.html# description-of-the-5-subject-experiment-tutorial-data (accessed on 1 October 2022).
This data set was selected because it is compact and serves the purpose of demonstrating the application of the multidomain models for both single-subject-customized and group-based classification. Furthermore, given that the multidomain CNN models are extensions of the unichannel CNN models in [1], the performance trends can be compared to determine consistencies and/or discrepancies between the two studies. The design of the multidomain models, however, is not restricted to this particular data set or to any other data set. Details of the data relevant to this study are as follows:
Task: Auditory binary semantic task. Subjects distinguished between synonymous and non-synonymous word pairs.
Number of ERP classes: Two (synonymous, non-synonymous).
Complete details of the EEG data can be found on the referenced website, and the details of the m-ERPs extracted from the EEG can be found in [1].

Group-Based Cross-Validation
The goal of group-based cross-validation is to train the multidomain classifiers with the m-ERPs of a set of subjects and test the m-ERPs of the individual subjects not included in the training set. In order to do so in a systematic fashion, k-fold cross-validation is modified so that the folds are defined with respect to subjects. In this cross-validation approach, which we refer to as "k-subject-fold cross-validation," each fold consists of the m-ERPs of (B/k) subjects, where B and k are the number of subjects and folds, respectively. The classifier is trained with the m-ERPs in (k−1) folds and validated (tested) on the m-ERPs of each subject in the left-out fold. As in regular k-fold cross-validation, the process is repeated k times so that the ERPs of all subjects are tested. The final result is obtained by averaging the classification accuracies within and across the k repetitions. The process can be repeated several times and averaged by first shuffling the order of the subjects so that the subjects fall in different folds. For the special case (k = B), that is, each fold contains the m-ERPs of only one subject, the procedure reduces to leave-one-subject-out cross-validation.
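The k-subject-fold procedure described above can be sketched as follows; the function and variable names are illustrative:

```python
import random

def k_subject_folds(subjects, k, shuffle_seed=None):
    """Yield (train_subjects, test_subjects) splits for k-subject-fold
    cross-validation: folds are defined over subjects, not trials, so a test
    subject's m-ERPs never appear in the training set. With k equal to the
    number of subjects B, this reduces to leave-one-subject-out.
    """
    subjects = list(subjects)
    if shuffle_seed is not None:
        # Reshuffle between repetitions so subjects fall in different folds.
        random.Random(shuffle_seed).shuffle(subjects)
    folds = [subjects[i::k] for i in range(k)]      # ~B/k subjects per fold
    for i in range(k):
        test = folds[i]
        train = [s for s in subjects if s not in test]  # the other k-1 folds
        yield train, test

# Leave-one-subject-out (k = B = 5): train on four subjects, test on the fifth.
splits = list(k_subject_folds(["B1", "B2", "B3", "B4", "B5"], k=5))
```

Classification accuracies are then averaged within and across the k repetitions as described above.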

Channel Selection
Although fusing ERPs from multiple channels is likely to improve performance, designing multichannel ERP fusion classifiers is a challenging problem because the following issues have to be addressed: (a) Should all channels be used in the design? The answer is generally no because including channels that elicit ERPs that do not carry useful discriminatory information is equivalent to adding noise, and the additional dimensions lead to overfitting, which has a negative impact on performance. (b) If all channels are not used, how should a subset of channels be selected? Temporospatial PCA is one method that can be used to identify channels and time windows that capture effects elicited by stimuli [23,24]. Alternatively, channel selection algorithms can be applied to select a subset of useful channels [1,[25][26][27]. In general, the answers to the questions raised above are not straightforward and are often application dependent. The generalized rank-of-rank-sum channel selection strategy described in [1] was used to select the top 12 channels for each subject and across all 5 subjects. The ranking for the 5 subjects pooled into a group is obtained by summing the single subject ranks and ranking the rank-sums. The selected sets of channels are listed in Table 1, in which rank 1 is the best channel and rank 12 is the 12th best channel. Observe that the channel rankings vary across the 5 subjects.
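The group ranking step can be sketched on hypothetical per-subject channel ranks (the per-subject ranking criterion itself is described in [1] and is not reproduced here):

```python
def rank_of_rank_sum(subject_ranks, top=12):
    """Group channel ranking by summing each channel's per-subject ranks and
    ranking the rank-sums (rank 1 = best, so the smallest sum wins).
    `subject_ranks` maps subject -> {channel: rank}. A sketch of the strategy
    described in [1]."""
    channels = sorted(next(iter(subject_ranks.values())).keys())
    sums = {c: sum(r[c] for r in subject_ranks.values()) for c in channels}
    return sorted(channels, key=lambda c: sums[c])[:top]

# Toy example with 4 channels and 2 subjects (hypothetical ranks).
ranks = {
    "B1": {"P3": 1, "PO4": 2, "O2": 3, "C2": 4},
    "B2": {"P3": 3, "PO4": 1, "O2": 2, "C2": 4},
}
print(rank_of_rank_sum(ranks, top=2))   # -> ['PO4', 'P3']
```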

Morlet Wavelet Transform
The analytic Morlet CWT [1,28,29] is selected in this study because it is a good choice for analyzing the oscillatory behavior of ERPs and EEGs. The analytic Morlet mother wavelet, a product of a complex exponential signal of frequency f_0 and a zero-mean Gaussian window with variance σ², is given by ψ(t) = A e^{j2πf_0 t} e^{−t²/(2σ²)}, where the constant A ensures that the wavelet energy is equal to one. The analytic wavelet coefficients are complex, and the scalogram, represented by G(f, t), is a plot of the CWT amplitude |G(f, t)| or power |G(f, t)|² as a function of discrete frequency and discrete time.
Scalograms of other CWTs, such as Morse [30] and Bump [31], can also be used for the development of the Z-CuboidNet and V-MatrixNet models if they are more suited for a given application.
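A minimal numpy sketch of the analytic Morlet amplitude scalogram described above; the fixed-cycles bandwidth rule (`f0_cycles`) is one common parameterization assumed here, not necessarily the one used in this study:

```python
import numpy as np

def morlet_scalogram(x, freqs, fs, f0_cycles=6.0):
    """Amplitude scalogram |G(f, t)| from 'same' convolutions of the signal
    with unit-energy analytic Morlet wavelets (complex exponential times a
    zero-mean Gaussian window)."""
    G = np.empty((len(freqs), len(x)))
    for i, f in enumerate(freqs):
        sigma = f0_cycles / (2 * np.pi * f)          # window width per band
        tt = np.arange(-4 * sigma, 4 * sigma + 1 / fs, 1 / fs)
        psi = np.exp(2j * np.pi * f * tt) * np.exp(-tt**2 / (2 * sigma**2))
        psi /= np.sqrt(np.sum(np.abs(psi) ** 2))     # unit wavelet energy
        # np.convolve flips its second argument, so reversing the conjugated
        # wavelet yields the cross-correlation with the wavelet.
        G[i] = np.abs(np.convolve(x, np.conj(psi)[::-1], mode="same"))
    return G

# A 10 Hz test tone should respond most strongly in the 10 Hz band.
fs = 200
x = np.sin(2 * np.pi * 10 * np.arange(2 * fs) / fs)
G = morlet_scalogram(x, freqs=[5, 10, 20], fs=fs)
```

The "same"-mode convolution mirrors the S-scalogram construction; the edge regions of each band are exactly the coefficients that the COI flags as inaccurate.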

CNN Architectures and Hyperparameters
The reason for selecting CNNs was explained in the Introduction. For consistency, the CNNs used in all 4 classification models had 2 convolution layers followed by a max pooling layer and an FCN. This section describes the shape, dimensions, and number of filters in the convolution layers; dimensions and strides of the max pooling filters; the number of layers in the FCN and the activations in the FCN; and the training hyperparameters.
The scalogram dimensions of each m-ERP were (F = 108) × (T = 200). The top D out of the 64 channels were selected; therefore, the dimensions of the input to the Z-CuboidNet models were 108 × 200 × D. The input to Z-MatrixNet had dimensions 108 × 200. The feature vector extracted from the V-scalogram was a (N_V = 17,056)-dimensional column vector (see Section 2.3); therefore, the input dimension for V-MatrixNet was 17,056 × D. The input to the V-VectorNet was a 17,056-dimensional column vector. The number of filters in the first and second convolution layers are denoted by N1 and N2, respectively. The FCN had 3 layers of neurons. The architectures, hyperparameters, and training options of the 4 classification models are summarized in Table 2. Table 2. Architectures, hyperparameters, and training options of the 4 classification models.

Experiments and Results
This section describes the extensive set of experiments designed to evaluate the performances of the four classification models using the binary ERP data described in Section 2.3. The experimental hyperparameters were set as follows: the top D = 4, 8, and 12 ranked channels listed in Table 1 were selected to implement the multidomain classifiers in order to demonstrate the improvements that can be expected by increasing the number of channels. The unichannel classifiers were also implemented for the top 12 channels. The unichannel classifier that gave the best result, denoted by D = 1, was used for comparisons against the multidomain classifiers. The number of m-ERPs tested for each subject was (195 m-ERPs/class) × (2 classes) × (50 runs) = 19,500. The classification accuracy, expressed as a percentage, was estimated as the number of correctly classified m-ERPs divided by 19,500. For group-based classification, the classification accuracy was estimated in the same manner because leave-one-subject-out cross-validation was used; that is, the number of m-ERPs in the cross-validation folds was the same for both cases. The classifiers were implemented using the PyTorch library.

Customized Classification Experiments
This set of experiments involved the classification of the ERPs of each individual subject using only the ERPs collected from the individual. As noted in the Introduction, the need for this approach using single trials can typically arise in the design of systems such as customized BCIs for individuals [32][33][34][35]. In order to demonstrate the improvements that can be expected by increasing the averaging parameter m, the experiments were also conducted with 2-ERPs and 4-ERPs.

Customized Unichannel Experiments
For each subject, the Z-MatrixNet and V-VectorNet unichannel classifiers were implemented for their 12 top-ranked channels. The complete set of results is presented in Tables A1-A3 in Appendix A for m taking values 1, 2, and 4. The channels labeled 1-12 in the first column of the three tables are the ranked channels listed in Table 1. For example, the results for the channel labeled CH = 1 contain the classification accuracies of the top-ranked (rank 1) channels P3, PO4, O2, O2, and C2 of Subjects B 1 , B 2 , B 3 , B 4 , and B 5 , respectively.

Customized Multichannel Experiments
The Z-CuboidNet and V-MatrixNet models were implemented for the D = 4, 8, and 12 top-ranked channels listed in Table 1. The classification accuracies for each D were determined for m = 1, 2, and 4. The results are presented in Table 3. Each result in the table is the average of testing 19,500 m-ERPs. Note that m = 2 and 4 are not included for D = 12 because the multidomain models gave 100% for m = 1 across all five subjects; therefore, there was no need to increase m beyond unity. The table also contains results for D = 1, which is the best unichannel result for each subject extracted from Tables A1-A3 in Appendix A. That is, the results for D = 1 are the best Z-MatrixNet and V-VectorNet results. In order to facilitate analyses of the performance trends, the average classification accuracies of the customized classifiers of the five subjects are presented in Figure 4 as functions of D and m. That is, each bar is the average across the five customized classifiers for a given combination of D and m.


Group-Based Classification Experiments
The group-based classification experiments are different from the customized set of experiments in the previous section because these experiments involve classifying the m-ERPs of each subject using the models trained with the m-ERPs of the other subjects in the group. In this set of experiments, the leave-one-subject-out method was used to design the models. For example, the models were trained with the m-ERPs of subjects B 2 , B 3 , B 4 , and B 5 and were tested on the m-ERPs of subject B 1 . The results are summarized in Table 4 and are interpreted in the same manner as the results in Table 3. The 12 unichannel results are presented in Tables A4-A6 in Appendix A. Unlike the customized case where the channels of each subject were ranked independently, the 12 top channels of the group-based classifiers have a common ranking. The average classification accuracies of the group-based classifiers of the five subjects are presented in Figure 5. based classifiers have a common ranking. The average classification accuracies of the group-based classifiers of the five subjects are presented in Figure 5. Table 4. Leave-one-subject-out classification accuracies.
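The leave-one-subject-out protocol described above can be sketched generically. This is a hypothetical helper, not code from the study: it assumes only that each m-ERP carries a subject label, and it yields one train/test split per held-out subject.

```python
import numpy as np

def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx) for each fold.

    subject_ids : sequence giving the subject label of every m-ERP.
    In each fold, all m-ERPs of one subject form the test set and the
    m-ERPs of the remaining subjects form the training set.
    """
    subject_ids = np.asarray(subject_ids)
    for s in np.unique(subject_ids):
        test = np.where(subject_ids == s)[0]
        train = np.where(subject_ids != s)[0]
        yield s, train, test

# Example: 3 m-ERPs per subject for subjects B1..B5
ids = ["B1"] * 3 + ["B2"] * 3 + ["B3"] * 3 + ["B4"] * 3 + ["B5"] * 3
for held_out, train_idx, test_idx in leave_one_subject_out(ids):
    print(held_out, len(train_idx), len(test_idx))  # e.g. B1 12 3
```

The key property, as noted in the text, is that no m-ERP of the test subject ever appears in the training set of its fold.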

Discussion of Results
The four classification models can be ranked according to the classification accuracies in Tables 3 and 4. (a) Table 3 shows that only marginal differences exist between the performances of the customized Z-CuboidNet and V-MatrixNet classifiers across all five subjects, all values of m, and all values of D. The Z-CuboidNet and V-MatrixNet multichannel fusion models outperform the best unichannel (D = 1) Z-MatrixNet and V-VectorNet models, respectively. For a given D, the performance improves by increasing m, and for a given m, the performance improves by increasing D. (b) Tables A1-A3 show that there is no single channel that is best across the five subjects. (c) Table 4 shows that the performance trends of the group-based classifiers are similar to those of the customized classifiers in Table 3. That is, the Z-CuboidNet and V-MatrixNet performances differ marginally, the multichannel multidomain models outperform the best unichannel models, and the performance improves when m and D are increased. (d) Tables A4-A6 show that there is no single channel that is best across the five subjects. (e) Table 4 shows that the accuracies of the group-based classifiers are slightly lower than those of the corresponding customized classifiers (Table 3) for small values of D and m. The drop in accuracies can be attributed to the fact that none of the m-ERPs of a test subject are included in the group training set.
Both customized multidomain models yielded single trial classification accuracies exceeding 90% for eight channels across all five subjects. Classification accuracies of 100% were obtained for all subjects when 12 channels were used. The group-based Z-CuboidNet models yielded single trial classification accuracies exceeding 90% for eight channels and yielded 100% for 12 channels across all five subjects. The single trial (m = 1) average classification accuracy of the five subjects improved dramatically, from 81.06% to 100%, when the number of channels (D) was increased from one to 12. Moreover, the ability to obtain higher accuracies by increasing the number of channels is especially noteworthy because it is a clear indication that the ERPs of the spatially distributed channels carry complementary information that can be exploited to improve performance.
There is no clear winner between the Z-CuboidNet and V-MatrixNet models for both customized and group-based classification. The zeroed-out region in the input cuboid of the Z-CuboidNet is common to all ERP classes; therefore, it is likely to be ignored by the convolution filters in the process of extracting discriminatory features during training. The Z-CuboidNet offers the most direct way of fusing the scalograms. In the previous study involving unichannel customized m-ERP classification, the results for the CNN implementations of the classifiers were mixed with no clear winner between the V-scalogram and Z-scalogram classifiers. The results of the extensions of the unichannel models to multichannel models are, therefore, in agreement with the previous study. It was also established in [1] that the V- and Z-scalogram CNN classifiers outperform the V- and Z-scalogram implementations of the other classifiers (SVM, RF, k-NN, and MLP). Undoubtedly, the Z-CuboidNet and V-MatrixNet would outperform the multichannel implementations of these other classifiers. Furthermore, the other classifiers do not offer any elegant manner to fuse the V- and Z-scalograms because they require vector inputs. Vector inputs can be generated by concatenating the rows of the scalogram of each channel into a unichannel vector, followed by the concatenation of the unichannel vectors of the multiple channels into a super-sized multichannel vector. The resulting multichannel vector input will have spatial components separated by the length of the unichannel vector, which makes it difficult to design convolution filters to locally couple the spatial information.
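The contrast between cuboid fusion and vector concatenation drawn above can be made concrete with a small shape sketch. This is an illustration under assumed dimensions (64 frequencies, 256 time samples, 8 channels), not the study's actual preprocessing code:

```python
import numpy as np

F, T, D = 64, 256, 8                     # frequencies, time samples, channels (assumed)
scalograms = [np.random.randn(F, T) for _ in range(D)]

# Cuboid-style fusion: stack the D channel scalograms along a third axis,
# so convolution filters can couple nearby frequency, time, and channel bins.
cuboid = np.stack(scalograms, axis=-1)   # shape (F, T, D)

# Vector-input classifiers (SVM, RF, k-NN, MLP) instead require flattening:
# the rows of each scalogram are concatenated into a unichannel vector, and
# the unichannel vectors are concatenated into one multichannel vector,
# placing corresponding bins of adjacent channels F*T elements apart.
flat = np.concatenate([s.ravel() for s in scalograms])  # shape (F*T*D,)

print(cuboid.shape, flat.shape)  # (64, 256, 8) (131072,)
```

The distance of F*T = 16,384 elements between spatially adjacent channels in the flattened vector illustrates why local convolution filters cannot couple the spatial information in that representation.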
Finally, it is important to note that high classification accuracies were obtained with no attempts to optimize the performance of the multidomain classifiers. In general, the multidomain classifiers have the potential for further improvements in performance by increasing the number of channels, increasing the network depth, tweaking the hyperparameters, and, when applicable, by increasing the averaging parameter m. Increasing the network depth, however, requires an increase in the training set size, which could be a limitation in practical applications in which collecting large ERP ensembles is problematic. Additionally noteworthy is that although the primary focus was on ERP classification, the generalized formulations of the two multidomain models make them easily adaptable to other problems involving multisensor signal classification.

Conclusions
The Z-CuboidNet and V-MatrixNet multidomain models were introduced to improve ERP classification by fusing frequency, time, and spatial information from multichannel ERP recordings. Both CNN-based models do not require expert extraction of hand-engineered features for the input, which not only facilitates classifier design but also avoids the problems associated with selecting feature sets through trial and error. An extensive set of experiments involving customized and group-based classification of ERPs was conducted to demonstrate the application and performance of the multidomain and unichannel classifiers. The results clearly showed that the multidomain classifiers consistently outperformed the unichannel classifiers. Additionally, high classification accuracies were obtained from the multidomain models trained and tested on the single trials of single subjects, which is an especially important contribution for the design of customized classifiers for BCI-type applications in which the interface is expected to be controlled by a single stimulus presentation. It was also shown that the multidomain models trained on the ERPs from a group of subjects can accurately classify ERPs of single subjects not included in the training group, which is a notable contribution for many applications related to classifying brain activity and brain disorders. The multidomain models are highly scalable to large subject groups and to more ERP classes, which are crucial requirements for practical deployment. Future efforts will focus on deploying the models for real BCI applications and large group-based brain disorder classification.
Funding: This research received no external funding.

Data Availability Statement:
Publicly available data were used in this study. The data can be found here: https://eeglab.org/tutorials/10_Group_analysis/study_creation.html#description-of-the-5subject-experiment-tutorial-data (accessed on 1 October 2022).

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
The customized and group-based unichannel classification accuracies of the 12 top-ranked channels are presented in this appendix. The channels, Z-MatrixNet, and V-VectorNet are represented by CH, Z-M, and V-V, respectively. Each result in the tables is the average of testing 19,500 m-ERPs. The best result across the 12 channels is in boldface.
Table A1. Unichannel accuracies of the customized classifiers for single trials (1-ERPs).
Table A2. Unichannel accuracies of the customized classifiers for 2-ERPs.
Table A4. Leave-one-subject-out unichannel accuracies for 1-ERPs.
Table A5. Leave-one-subject-out unichannel accuracies for 2-ERPs.