MultiResUNet3+: A Full-Scale Connected Multi-Residual UNet Model to Denoise Electrooculogram and Electromyogram Artifacts from Corrupted Electroencephalogram Signals

Electroencephalogram (EEG) signals immensely suffer from several physiological artifacts, including electrooculogram (EOG), electromyogram (EMG), and electrocardiogram (ECG) artifacts, which must be removed to ensure EEG’s usability. This paper proposes a novel one-dimensional convolutional neural network (1D-CNN), i.e., MultiResUNet3+, to denoise physiological artifacts from corrupted EEG. A publicly available dataset containing clean EEG, EOG, and EMG segments is used to generate semi-synthetic noisy EEG to train, validate and test the proposed MultiResUNet3+, along with four other 1D-CNN models (FPN, UNet, MCGUNet, LinkNet). Adopting a five-fold cross-validation technique, all five models’ performance is measured by estimating temporal and spectral percentage reduction in artifacts, temporal and spectral relative root mean squared error, and average power ratio of each of the five EEG bands to whole spectra. The proposed MultiResUNet3+ achieved the highest temporal and spectral percentage reduction of 94.82% and 92.84%, respectively, in EOG artifacts removal from EOG-contaminated EEG. Moreover, compared to the other four 1D-segmentation models, the proposed MultiResUNet3+ eliminated 83.21% of the spectral artifacts from the EMG-corrupted EEG, which is also the highest. In most situations, our proposed model performed better than the other four 1D-CNN models, evident by the computed performance evaluation metrics.


Introduction
The electrophysiological activity of the cerebral cortex of the human brain is represented by electroencephalogram (EEG) signals, which are non-invasively recorded at the scalp [1]. EEG is crucial for various therapeutic applications as well as neurological research. Long-duration human epileptic seizure episodes are routinely detected with EEG [2,3]. EEG is also widely used for other purposes, such as Alzheimer's disease diagnosis [4,5], sleep stages measurement [6], assessment of cognitive workload [7], recognition of human emotion [8], establishing brain-computer interfaces (BCIs) [9], biometric systems [10], etc. EEG recordings, nevertheless, greatly suffer from a range of physiological and non-physiological artifacts, such as ocular/electrooculogram (EOG) artifacts [11], myogenic/electromyogram

•
The proposed MultiResUNet3+ can effectively denoise EOG, EMG, and concurrent EOG and EMG artifacts from corrupted EEG waveforms.

•
We have created a diverse and representative semi-synthetic EEG dataset closely resembling real-world corrupted EEG signals. The proposed 1D-segmentation model was trained and evaluated using 5-fold cross-validation, which ensured the reliability and robustness of the proposed model.

•
We used five well-established performance metrics to comprehensively assess and compare the denoising performance of each of the five 1D-segmentation models.

•
Our developed model may be used for denoising multi-channel, actual EEG data as the model was trained with diverse artifactual data.
The remainder of this manuscript is structured as follows: Section 2 explains the architecture of the novel MultiResUNet3+ segmentation network, followed by an overview of the EEG dataset employed and the semi-synthetic EEG data generation and normalization approaches adopted in this study. Section 3 provides the experimental details and the formulae of the performance metrics. Section 4 presents the quantitative and qualitative performance of the proposed model compared with four other 1D-CNN models, together with an analysis of the results. Section 5 discusses the outcomes, including the limitations of our study, and Section 6 concludes the paper.

Materials and Methods
Figure 1 shows the end-to-end framework proposed in this study, which uses recent 1D-CNN-based segmentation networks to efficiently denoise EEG signals semi-synthetically corrupted by EOG, EMG, or concurrent EOG and EMG artifacts. The figure is intended to be a self-explanatory visual summary of our approach.
In the subsequent sub-sections, we detail our proposed MultiResUNet3+ segmentation network, which forms the core of our denoising framework. We also present a concise overview of the EEG dataset employed in this work and discuss the semi-synthetic data generation and normalization techniques adopted in our study, which are crucial in achieving reliable and robust denoising results.

Proposed Novel MultiResUNet3+ Model Description
Our proposed MultiResUNet3+ segmentation model combines the concepts of the MultiResUNet [58] and UNet3+ [59] networks inside a single framework. The building blocks of the 1D-MultiResUNet3+ model used in this study (depth of 5) are depicted in Figure 2. UNet3+ consists of full-scale skip connections that combine inter-connections between the encoders and the decoders with intra-connections between the decoder sub-networks. Unlike UNet [55] and UNet++ [60], UNet3+ can incorporate smaller- and larger-scale feature maps in each decoder layer, allowing it to extract fine- and coarse-grained semantics at full scale. It was employed in MultiResUNet3+ for this application to effectively minimize semantically different EOG and EMG artifacts in EEG waveforms: EMG requires a finer approach, while EOG-induced impurities are coarser. Nevertheless, UNet3+ still uses direct skip connections for its inter- and intra-connections, so the semantic gap issue of the basic UNet architecture persists; this issue was addressed by more advanced architectures, such as UNet++ and MultiResUNet. To reduce the semantic gap, MultiResUNet and similar architectures proposed replacing the skip connections with various formations of convolutional blocks. Reducing the semantic gaps in this application helps the model efficiently learn and generate EEG features, especially when EOG and EMG noises are mixed to create a more realistic but challenging scenario.
Therefore, contrary to UNet3+, our proposed MultiResUNet3+ contains full-scale Residual Paths (ResPaths in short; Figure 3) instead of direct skip connections for inter- and intra-connections. Instead of combining the encoder and decoder features straightforwardly (e.g., concatenation in UNet and addition in LinkNet [57]), the encoder features are passed through several convolutional layers with residual connections (Figure 4). It is worth mentioning that residual connections [35] have been beneficial to the learning process in several deep learning applications. The depth of the network determines how many residual-convolutional blocks are used in the inter- and intra-ResPaths of MultiResUNet3+. For example, for the model drawn in Figure 2 with a depth of 5, the number of residual-convolutional blocks along the inter-ResPaths is 4, 3, 2, and 1, respectively, from shallower to deeper layers. On the other hand, for the intra-ResPaths densely connecting the decoders, the number of residual-convolutional blocks is 1, 2, 3, and 4, respectively, from deeper to shallower layers. More residual-convolutional blocks are placed along the ResPaths generated from a shallower layer for improved processing of the coarser features.
Figure 4. Residual Path (ResPath) [58] used for inter- and intra-connections in MultiResUNet3+, which is expected to reduce the semantic gaps by replacing direct skip connections. This particular ResPath represents the one produced from $X_{En}^{1}$ and $X_{De}^{2}$ among the inter- and intra-ResPaths, respectively, in Figure 2.
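The block counts quoted for the depth-5 model follow a simple pattern, which the short sketch below makes explicit. The function name `respath_block_counts` is hypothetical, and extending the pattern to depths other than 5 is an assumption on our part; the text only states the depth-5 values.

```python
def respath_block_counts(depth):
    """Return (inter, intra) residual-convolutional block counts per ResPath.

    inter: ordered from the shallowest encoder level to the deepest one.
    intra: ordered from the deepest decoder level to the shallowest one.
    Generalising beyond depth 5 is an assumption, not stated in the text.
    """
    if depth < 2:
        raise ValueError("depth must be at least 2")
    inter = list(range(depth - 1, 0, -1))  # e.g. 4, 3, 2, 1 for depth 5
    intra = list(range(1, depth))          # e.g. 1, 2, 3, 4 for depth 5
    return inter, intra


inter_counts, intra_counts = respath_block_counts(5)
print("inter-ResPaths:", inter_counts)
print("intra-ResPaths:", intra_counts)
```

In both lists, the ResPath that starts from the shallowest layer carries the most blocks, in line with the stated rationale of giving coarser features more processing.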
Figure 3 represents the full-scale aggregated feature map creation process of MultiResUNet3+ for the decoder $X_{De}^{3}$. Similar to MultiResUNet, the feature map from the same-scale encoder layer $X_{En}^{3}$ is received by the decoder through a ResPath. However, contrary to MultiResUNet, a set of encoder-decoder Residual Paths delivers the semantically enhanced low-level detailed information from the smaller-scale encoder layers $X_{En}^{1}$ and $X_{En}^{2}$ through non-overlapping max-pooling operations.
On the other hand, higher-level semantic information is conveyed from the larger-scale decoder layers, such as $X_{De}^{4}$ and $X_{De}^{5}$, through intra-ResPath connections by utilizing nearest interpolation. Now, as in Figures 2 and 3, there are five inter- and intra-Residual Paths in total (Figure 4) containing same-resolution feature maps waiting to be unified before reaching $X_{De}^{3}$. To properly merge the shallow and deep semantic information, we perform feature aggregation on the ResPaths by placing a Multi-Residual or MultiRes Block (Figure 5) containing $W = n \times d$ input kernels or filters of size 3 × 3 after concatenating the five feature maps. Here, n denotes the input filter number, and d represents the depth of the model. For example, if n is 64 and d is 5, W will be 320.
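To make the full-scale aggregation concrete, the sketch below lists, for one decoder node, which feature maps arrive and how each would be resized to a common resolution. It is only an illustration: the function name `aggregation_plan` is hypothetical, and the resizing factors assume that every level halves the temporal resolution (as in UNet3+), which the text does not state explicitly.

```python
def aggregation_plan(target, depth=5):
    """List the sources feeding decoder node X_De^target.

    Each entry is (source name, resizing operation, factor).
    Factors assume the resolution halves at every level (an assumption).
    """
    if not 1 <= target < depth:
        raise ValueError("target must be a decoder level between 1 and depth - 1")
    plan = []
    for level in range(1, depth + 1):
        if level < target:
            # Shallower encoder maps are larger, so they are max-pooled down.
            plan.append((f"X_En^{level}", "max-pool", 2 ** (target - level)))
        elif level == target:
            # Same-scale encoder map needs no resizing.
            plan.append((f"X_En^{level}", "none", 1))
        else:
            # Deeper decoder maps are smaller, so they are upsampled (nearest).
            plan.append((f"X_De^{level}", "nearest-upsample", 2 ** (level - target)))
    return plan


for source, operation, factor in aggregation_plan(target=3):
    print(f"{source}: {operation} x{factor}")
```

For `target=3` the plan contains five entries, matching the five ResPaths that are concatenated before the MultiRes Block.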
The MultiRes Block used in MultiResUNet3+ is similar to "Module A" of the InceptionV4 network [61]. It consists of a succession of 3 × 3 filters whose outputs are concatenated and then added element-wise to a residual connection passed through a 1 × 1 convolutional block to squeeze dimensions. This helps us to aggregate spatial features from various context sizes while avoiding expensive filters with larger kernels (e.g., 5 × 5 or 7 × 7, primarily used in earlier Inception networks). For the MultiRes block, instead of keeping the number of filters the same, it is more efficient and less expensive to gradually increase them while keeping the total number of filters the same as if it were a single convolutional block, as explained in [58]. For instance, if the number of input filters for the MultiRes block used for feature aggregation is $W = n \times d = 320$, the widths of the three convolution blocks inside the MultiRes block will be $W/6 = \mathrm{round}(320/6) \approx 53$, $W/3 = \mathrm{round}(320/3) \approx 107$, and $W/2 = 160$, respectively. Each MultiRes block is followed by batch normalization and ReLU activation layers.
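The width arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function name `multires_widths` is hypothetical, and Python's built-in `round` (which rounds exact halves to the nearest even integer) is assumed to be an acceptable stand-in for the rounding used by the authors.

```python
def multires_widths(n_filters, depth):
    """Return the total width W = n * d and the three convolution widths.

    The widths follow the W/6, W/3, W/2 split described in the text.
    """
    total = n_filters * depth
    widths = (round(total / 6), round(total / 3), round(total / 2))
    return total, widths


total_width, conv_widths = multires_widths(n_filters=64, depth=5)
print("W =", total_width)
print("convolution widths:", conv_widths, "sum:", sum(conv_widths))
```

For n = 64 and d = 5, the three widths add up to W again, so the concatenated output of the block keeps the same number of channels as a single convolutional block of width W would have.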

Dataset Description
A publicly available dataset, namely EEGdenoiseNet [50], was found to be suitable for this work as it contained pre-processed, well-structured physiological segments gathered from various sources, including 4514 clean EEG segments, 3400 pure EOG segments, and 5598 clean EMG segments. We have used these readily available segments to generate semi-synthetic noisy EEG, which were used to train and test the five distinct 1D-segmentation models.
Several signal-processing techniques were adopted by the EEGdenoiseNet [50] authors. They processed the clean EEG signals (collected from several sources) by applying a bandpass filter with lower and upper cutoff frequencies of 1 and 80 Hz, removing the power line noise using a notch filter, and then down-sampling to 256 Hz. The clean EEG signals were then segmented, and 4514 clean EEG segments were generated, each containing 512 data points. The clean EOG signals (horizontal and vertical) were bandpass filtered with a passband of 0.3-10 Hz and resampled to 256 data points per second. These processed clean EOG signals were segmented into 3400 segments, each with 512 data points, the same as the EEG segments. The EMG signals were filtered using a bandpass filter with a lower cutoff frequency of 1 Hz and an upper cutoff frequency of 120 Hz, and the power line frequency component was removed using a notch filter. The EMG signals were then resampled to 512 data points per second, and 5598 clean muscle/EMG segments were generated, each containing 1024 data points [50]. As EMG signals have essential features in the high-frequency band, the clean EMG signals were sampled at 512 Hz to retain those features and morphology. Figure 6 depicts one clean EEG, one clean vertical EOG, one clean horizontal EOG, and one clean EMG segment sample. Noticeably, EMG has a much higher magnitude compared to the clean EEG and EOG segments. Since the authors of [50] adopted all the necessary pre-processing techniques, no further pre-processing steps were undertaken in this study. Instead, the publicly available segments are directly used to produce a large and diverse semi-synthetic corrupted EEG dataset.

Semi-Synthetic Electroencephalogram Segment Generation and Normalization
For this study, we generate semi-synthetic EOG-, EMG-, and EOG-EMG-contaminated EEG segments from the true signals in the EEGdenoiseNet dataset. By linearly mixing one clean EEG segment with one clean EOG and/or EMG segment using Equation (1), a semi-synthetic corrupted EEG segment can be produced [50]:
$$y = x + \lambda \cdot n \tag{1}$$
Here, $x$ denotes the ground truth or clean EEG segment, $y$ is the generated semi-synthetic noisy EEG, and $n$ characterizes the EOG and/or EMG artifacts. By solving Equation (2) and altering the scaling factor $\lambda$, the signal-to-noise ratio (SNR) of the semi-synthetically generated contaminated EEG segment can easily be adjusted to different levels as follows [50]:
$$\mathrm{SNR} = 10 \log_{10} \frac{\mathrm{RMS}(x)}{\mathrm{RMS}(\lambda \cdot n)} \tag{2}$$
For the computation of the root mean square (RMS) value, Equation (3) is utilized [50]:
$$\mathrm{RMS}(w) = \sqrt{\frac{1}{m} \sum_{i=1}^{m} w_i^{2}} \tag{3}$$
Here, $m$ stands for the total number of data points of segment $w$, and $w_i$ stands for the $i$th sample point of $w$. The scaling factor $\lambda$ is pivotal in determining the SNR of the semi-synthetic, corrupted EEG segments. As a rule of thumb, a lower value of $\lambda$ corresponds to a higher SNR, whereas a higher $\lambda$ leads to a poorer SNR. The SNR of EEG solely corrupted by EOG artifacts typically falls within the range of −7 to 2 dB, as reported in previous studies [23]. On the other hand, EEG signals corrupted by EMG artifacts exhibit a comparatively wider range of SNR, typically ranging from −7 to 4 dB [62].
The availability of ground truth and noisy EEG segments allowed us to train, test, and validate the deep learning models for EEG denoising. However, rather than directly feeding the $(x, y)$ pairs into the 1D-CNN models, we considered the standard deviation of the corrupted EEG segments ($\sigma_y$) and divided the clean and contaminated EEG signals by this value (Equation (4)):
$$\hat{x} = \frac{x}{\sigma_y}, \qquad \hat{y} = \frac{y}{\sigma_y} \tag{4}$$
The result was a pair of rescaled or normalized segments $(\hat{x}, \hat{y})$, fed to the 1D-segmentation models for training, testing, and validation.
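The mixing and normalization steps can be illustrated with a small, dependency-free sketch. It assumes the EEGdenoiseNet [50] conventions, i.e., linear mixing $y = x + \lambda n$ and $\mathrm{SNR} = 10\log_{10}(\mathrm{RMS}(x)/\mathrm{RMS}(\lambda n))$, from which $\lambda = \mathrm{RMS}(x) / (\mathrm{RMS}(n) \cdot 10^{\mathrm{SNR}/10})$. The function names, the random stand-in signals, and the use of the population standard deviation are our assumptions, not the authors' code.

```python
import math
import random


def rms(w):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(v * v for v in w) / len(w))


def population_std(w):
    """Standard deviation with 1/m normalisation (an assumption)."""
    mean = sum(w) / len(w)
    return math.sqrt(sum((v - mean) ** 2 for v in w) / len(w))


def mix_at_snr(clean, artifact, snr_db):
    """Scale the artifact so the mixture reaches snr_db, then add it."""
    if len(clean) != len(artifact):
        raise ValueError("segments must have the same length")
    lam = rms(clean) / (rms(artifact) * 10 ** (snr_db / 10))
    noisy = [x + lam * n for x, n in zip(clean, artifact)]
    return noisy, lam


def normalise_pair(clean, noisy):
    """Divide both segments by the standard deviation of the noisy one."""
    sigma_y = population_std(noisy)
    return [v / sigma_y for v in clean], [v / sigma_y for v in noisy]


# Random stand-ins for one clean EEG segment and one artifact segment.
random.seed(0)
clean = [random.gauss(0, 1) for _ in range(512)]
artifact = [random.gauss(0, 5) for _ in range(512)]

noisy, lam = mix_at_snr(clean, artifact, snr_db=-7)
clean_hat, noisy_hat = normalise_pair(clean, noisy)
print("lambda:", lam)
print("std of normalised noisy segment:", population_std(noisy_hat))
```

Because both segments are divided by the same $\sigma_y$, the amplitude relationship between the clean and noisy signals is preserved while the network input has unit standard deviation.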
Since deep learning models are sensitive to variations in the magnitude of input data, a model trained without scaling or normalization may be unable to make useful inferences. Normalization keeps all input features of equal importance, which is especially helpful when dealing with large-scale data variations. By reducing the variance of the input data, it also helps the model converge faster, which accelerates training and facilitates learning. Finally, normalization aids in avoiding overfitting.
In this study, the EOG-contaminated EEG segments were generated by linearly mixing 3400 randomly chosen clean EEG segments (out of the 4514 available) with the 3400 available EOG segments. This process was repeated for ten different integer SNR levels (−7 to 2 dB), producing 34,000 semi-synthetic EOG-contaminated EEG segments. The sampling frequency of both EOG and EEG segments was kept at 256 Hz. It is worth mentioning that, as 4514 clean EEG segments were available, we could easily have produced 45,140 semi-synthetic corrupted EEG segments for the ten SNR levels by using some clean EOG segments more than once during linear mixing. However, this was avoided to prevent data leakage.
Similarly, EMG-contaminated EEG segments were created by combining the EEG segments with randomly selected clean EMG segments for ten different SNR levels. Before linearly mixing EEG and EMG segments, all the clean EEG segments were up-sampled by a factor of 2 to match the number of data points in each EMG segment. The up-sampling of EEG (from 256 to 512 Hz) did not cause any morphological change in the EEG segments, as the bandwidth of EEG is 1-80 Hz [63]. In total, we semi-synthetically generated 45,140 EMG-contaminated EEG segments for the ten different SNR levels. The simultaneous EOG- and EMG-contaminated EEG segments were generated by mixing 3400 clean EEG segments (randomly drawn from the 4514 clean EEG segments) with 3400 EOG segments and 3400 EMG segments (the latter randomly drawn from the 5598 EMG segments) for ten different SNR levels. The clean EEG and EOG segments were up-sampled to 512 Hz before linear mixing to match the sampling frequency of the EMG segments. This process generated 34,000 corrupted EEG segments across all ten SNR levels. Figure 7a displays a randomly picked EOG-contaminated EEG segment, Figure 7b illustrates one arbitrary EMG-contaminated EEG segment, and Figure 7c represents one random sample of the simultaneous EOG- and EMG-contaminated EEG segments. The corresponding ground truth EEG segments are also superimposed in Figure 7a-c.
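The segment counts reported above can be reproduced with a small bookkeeping sketch. It only builds index triples (EEG index, artifact index, SNR level) rather than signals. The helper names are hypothetical, and two points are assumptions: that the same EEG-artifact pairs are reused at every SNR level, and that the EMG case uses the same −7 to 2 dB grid as the EOG case.

```python
import random

SNR_LEVELS_DB = list(range(-7, 3))  # ten integer levels: -7, ..., 2 dB


def pair_segments(n_eeg, n_artifact, seed=0):
    """Pair segments one-to-one so that no segment is used twice."""
    rng = random.Random(seed)
    n_pairs = min(n_eeg, n_artifact)
    eeg_ids = rng.sample(range(n_eeg), n_pairs)
    artifact_ids = rng.sample(range(n_artifact), n_pairs)
    return list(zip(eeg_ids, artifact_ids))


def build_index(n_eeg, n_artifact, seed=0):
    """Repeat every pair once per SNR level."""
    pairs = pair_segments(n_eeg, n_artifact, seed)
    return [(e, a, snr) for snr in SNR_LEVELS_DB for (e, a) in pairs]


eog_index = build_index(n_eeg=4514, n_artifact=3400)  # EOG-contaminated EEG
emg_index = build_index(n_eeg=4514, n_artifact=5598)  # EMG-contaminated EEG
print(len(eog_index), len(emg_index))
```

Sampling without replacement (`random.Random.sample`) mirrors the stated goal of never mixing the same clean segment with more than one artifact segment.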

Experimental Setup
To denoise EEG signals corrupted by physiological (EOG, EMG, or concurrent EOG and EMG) artifacts, the novel MultiResUNet3+ model was trained alongside four other 1D-CNN models. The networks were trained by feeding normalized contaminated EEG segments (generated using Equation (4)) as input and the corresponding normalized ground truth EEG segments as the target output. This process made it easier for each DL-based model to learn a nonlinear function that maps the noisy EEG to its equivalent ground truth. The mean squared error (MSE) was employed as the loss function and was minimized using the Adam optimizer with a learning rate of 0.0005. Eighty percent of the data was used as the training set and the remaining 20% as the test set. Ten percent of the training set was used as the validation set. It is worth mentioning that each EEG, EOG, and EMG segment is used only once to produce the semi-synthetic dataset. Therefore, there is no data leakage between the train, test, and validation sets. All five networks were trained, validated, and tested independently using five-fold cross-validation in the Google ColabPro environment with Python 3.10 to ensure robustness and reliability in the evaluation process. In this work, the experimentation followed a twofold approach, which is described below:
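The data partitioning described above can be sketched as follows. This is an illustration only: the function name `five_fold_splits`, the shuffling seed, and the way the validation subset is carved out of each training fold are assumptions, since the text specifies only the proportions (80/20 split, 10% of the training data for validation, five folds).

```python
import random


def five_fold_splits(n_samples, n_folds=5, val_fraction=0.1, seed=0):
    """Yield (train, validation, test) index lists for each fold."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Every n_folds-th shuffled index goes to the same fold.
    folds = [indices[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train_full = [i for j, fold in enumerate(folds) if j != k for i in fold]
        n_val = round(len(train_full) * val_fraction)
        # The first n_val training indices are held out for validation.
        yield train_full[n_val:], train_full[:n_val], test


# Example: the 3400 EOG-contaminated segments available at one SNR level.
for fold, (train, val, test) in enumerate(five_fold_splits(3400), start=1):
    print(f"fold {fold}: train={len(train)} val={len(val)} test={len(test)}")
```

With 3400 segments, each fold holds out 680 segments for testing, which matches the 2720/680 split quoted for Experiment A; the 2720 remaining segments are further divided into training and validation subsets.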

Experiment A
As mentioned earlier, there were in total 34,000 EOG-contaminated EEG segments at ten integer SNR values ranging from −7 to +2 dB, with 3400 EEG segments contaminated with ocular artifacts generated for each SNR value. For EMG-contaminated EEG, the total number of contaminated segments was 45,140, and for simultaneous EOG and EMG corrupted EEG, this number was 34,000, again for ten different SNR levels ranging from −7 to +2 dB. For each of these three types of artifacts, we trained each of the five 1D-CNN models ten times, once per SNR value, using the corrupted segments and their corresponding ground truth segments. For example, for EOG-corrupted EEG segments at an SNR level of −7 dB, 80% of the 3400 segments (2720) were used for training the models, and the remaining 20% (680 segments) were used for testing. The same process was carried out separately for the nine other integer SNR levels (−6 to +2 dB).
Similarly, the EMG-contaminated EEG segments and the simultaneous EOG and EMG-contaminated EEG segments at each of the ten SNR levels were used to train and test all five networks ten times separately. Employing a deep supervision technique as described in [64], we computed three established performance metrics in this experiment, namely the correlation coefficient (CC) in the time domain and the temporal and spectral relative root mean squared error (RRMSE). These metrics were used to evaluate the effectiveness of the five DL-based models in denoising contaminated EEG. The five-fold cross-validation technique was employed, and the metrics were computed for each of the five folds to ensure robustness and reliability in the evaluation process.

Experiment B
In Experiment B, a comprehensive approach was taken to generate train and test sets so that a more robust DL-based model could be trained that would perform well across a wide range of SNR levels, which is crucial for real-world applications where the exact SNR of the input signal may vary due to factors, such as electrode placement, the patient's movements, or equipment quality. For each of the 10 different SNR levels, 80% of the EOG-contaminated EEG segments and their corresponding ground truth EEG segments were extracted to produce a more extensive dataset (2720 pairs in each SNR level). After merging these pairs, a training dataset of 27,200 pairs was created. The remaining 20% of pairs from each of the ten distinct SNR levels were used to generate the test set (6800 pairs). Unlike in Experiment A, where models were trained and evaluated for specific SNR level segments, the models in Experiment B were trained using combined segments of all ten distinct SNR levels. This improved model generalization, making them more resilient when evaluated with noisy EEG segments with varying SNR levels. In essence, Experiment B took a holistic approach to train DL models that could perform well across the noisy EEG segments having a range of SNR levels, thereby increasing their utility in real-world scenarios where the SNR of the input signal may not be precisely known. Following a similar protocol, a total of 36,110 EMG-contaminated EEG segments and 27,200 pairs of simultaneous EOG and EMG-contaminated EEG segments were generated separately to form the train set, and the remaining 20% contaminated EEG segments were used as test set (9030 pairs of EMG-contaminated EEG segments, and 6800 pairs of simultaneous EOG and EMG contaminated EEG segments) for the segmentation models. 
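The pooled dataset sizes quoted above follow from applying the 80/20 split at each SNR level and then merging the levels. The short sketch below reproduces that arithmetic; the helper name `pooled_split_sizes` is hypothetical, and rounding the per-level training count down is an assumption that happens to be consistent with the reported totals.

```python
def pooled_split_sizes(segments_per_snr, n_snr_levels=10, train_percent=80):
    """Return (train_pairs, test_pairs) after pooling all SNR levels.

    The per-level training count is floored (an assumption); the rest of
    each level goes to the test set.
    """
    train_per_level = segments_per_snr * train_percent // 100
    test_per_level = segments_per_snr - train_per_level
    return train_per_level * n_snr_levels, test_per_level * n_snr_levels

# 3400 segments per SNR level (EOG, and simultaneous EOG and EMG)
eog_train, eog_test = pooled_split_sizes(3400)   # (27200, 6800)
# 4514 segments per SNR level (EMG), i.e., 45,140 in total
emg_train, emg_test = pooled_split_sizes(4514)   # (36110, 9030)
```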
The efficacy of the trained models was quantitatively measured using five performance metrics, i.e., the percentage reduction in artifacts in the time and frequency domain, temporal and spectral RRMSE, and the average power ratio of each of the five different EEG bands (Alpha, Beta, Gamma, Delta, Theta bands) to the whole band separately.

Performance Evaluation Metrics
The proper quantitative assessment of any DL model is paramount in determining its ability to accomplish the intended task, in our case, effectively denoising EEG signals. In this study, careful attention was devoted to selecting appropriate performance parameters for evaluating the chosen segmentation models. The efficacy of the five 1D-CNN models was quantitatively evaluated by calculating several temporal and spectral performance metrics, such as the correlation coefficient (CC), the percentage reduction in artifacts, and the relative root mean squared error (RRMSE). These metrics were critical in determining the best-performing models. The relevant equations for these measures are given in Equations (5)-(9), as collected from [50,65]. Here, the time domain correlation coefficient is represented by CC temporal, while covariance is denoted by Cov. The predicted EEG segments are represented by ẑ, whereas x̂ denotes the normalized ground truth EEG segments. Variance is denoted by Var, η represents the temporal percentage reduction of EOG/EMG/simultaneous EOG and EMG artifacts from the corrupted EEG, and γ represents the percentage reduction of the same artifacts in the frequency domain. Furthermore, CC temporal(af) denotes the time domain correlation coefficient between the predicted and the ground truth EEG segments, while CC temporal(bf) denotes the time domain correlation coefficient between the contaminated and the ground truth EEG segments. Similarly, CC spectral(af) denotes the frequency domain correlation coefficient between the predicted and the ground truth EEG segments, whereas CC spectral(bf) denotes the frequency domain correlation coefficient between the contaminated and the ground truth EEG segments.
RMS is the abbreviation for root mean square, which can be computed using Equation (3), and PSD is the power spectral density computed using the Periodogram method. The Periodogram method involves calculating the discrete Fourier transform (DFT) of the signal and then taking the square of the absolute value of the DFT to obtain the power spectral density.
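For orientation, the quantities defined above can be written in the following standard forms. This is a hedged sketch assembled from the verbal definitions, not a verbatim copy of Equations (5)-(9); in particular, the form of the percentage reduction is assumed from its common usage in the artifact-removal literature and should be checked against [50,65].

```latex
\begin{align*}
CC_{temporal} &= \frac{\mathrm{Cov}(\hat{z}, \hat{x})}{\sqrt{\mathrm{Var}(\hat{z})\,\mathrm{Var}(\hat{x})}} \\
\eta &= 100\left(1 - \frac{1 - CC_{temporal(af)}}{1 - CC_{temporal(bf)}}\right) \\
\gamma &= 100\left(1 - \frac{1 - CC_{spectral(af)}}{1 - CC_{spectral(bf)}}\right) \\
RRMSE_{temporal} &= \frac{\mathrm{RMS}(\hat{z} - \hat{x})}{\mathrm{RMS}(\hat{x})} \\
RRMSE_{spectral} &= \frac{\mathrm{RMS}\big(\mathrm{PSD}(\hat{z}) - \mathrm{PSD}(\hat{x})\big)}{\mathrm{RMS}\big(\mathrm{PSD}(\hat{x})\big)}
\end{align*}
```

Under this assumed form, η and γ reach 100% when the prediction correlates perfectly with the ground truth and fall to 0% when denoising does not improve the correlation at all.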
We selected these performance metrics to evaluate the 1D-CNN-based segmentation models for the following reasons. The correlation coefficient in the time domain (CC temporal) measures the degree of similarity between two variables. A higher correlation coefficient between the predicted and ground truth EEG indicates that the predicted EEG is more similar to the ground truth EEG, and vice versa. Hence, the correlation coefficient is a valuable tool for quantifying how well a model denoises.
The metric temporal percentage reduction in artifacts (η) measures the proportion of EOG/EMG/simultaneous EOG and EMG artifacts removed from the noisy EEG signal in the time domain. A higher temporal percentage reduction in artifacts indicates that more of the artifact has been removed from the EEG signal, resulting in a clean and more accurate EEG signal.
In contrast, the spectral percentage reduction in EOG/EMG/simultaneous EOG and EMG artifacts (γ) measures the proportion of EOG/EMG artifacts removed from the EEG signal in the frequency domain. A higher spectral percentage reduction indicates that more of the artifact has been removed from the EEG signal across all frequency bands and vice-versa.
The RRMSE measured in the time domain (RRMSE temporal ) can provide insights into the temporal dynamics of the denoised EEG signals. A low temporal RRMSE value indicates that the predicted individual EEG segment closely approximates the corresponding ground truth EEG segment.
The RRMSE measured in the frequency domain (RRMSE spectral ) can help assess the ability of the predictor model to capture important spectral features of the EEG signals in the whole synthetic dataset, such as alpha, beta, gamma, delta, and theta bands. A low spectral RRMSE value indicates that the predicted EEG signal accurately captures the power distribution across different frequency bands. This is important for ensuring that the predicted EEG signal is not biased towards or against any particular frequency band, which could have unintended effects on subsequent analyses.
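The metrics discussed above can be illustrated with a short NumPy sketch. It is a minimal example under stated assumptions rather than the evaluation code used in this study: the function names are hypothetical, the PSD is an unnormalized one-sided periodogram, the percentage reduction uses the commonly used form 100(1 − (1 − CC after)/(1 − CC before)), and the test signal is synthetic.

```python
import numpy as np

def periodogram_psd(x):
    """Periodogram PSD estimate: squared magnitude of the DFT (unnormalized)."""
    return np.abs(np.fft.rfft(x)) ** 2

def rms(x):
    return np.sqrt(np.mean(np.square(x)))

def cc(a, b):
    """Pearson correlation coefficient between two 1D arrays."""
    return np.corrcoef(a, b)[0, 1]

def rrmse_temporal(pred, truth):
    return rms(pred - truth) / rms(truth)

def rrmse_spectral(pred, truth):
    p, t = periodogram_psd(pred), periodogram_psd(truth)
    return rms(p - t) / rms(t)

def percent_reduction(cc_after, cc_before):
    """Assumed form of the percentage reduction in artifacts."""
    return 100.0 * (1.0 - (1.0 - cc_after) / (1.0 - cc_before))

def evaluate(noisy, pred, truth):
    """Collect temporal and spectral metrics for one segment."""
    psd_n, psd_p, psd_t = map(periodogram_psd, (noisy, pred, truth))
    return {
        "cc_temporal": cc(pred, truth),
        "eta": percent_reduction(cc(pred, truth), cc(noisy, truth)),
        "gamma": percent_reduction(cc(psd_p, psd_t), cc(psd_n, psd_t)),
        "rrmse_temporal": rrmse_temporal(pred, truth),
        "rrmse_spectral": rrmse_spectral(pred, truth),
    }

# Synthetic illustration: a 10 Hz tone stands in for clean EEG.
rng = np.random.default_rng(0)
t = np.arange(512) / 256.0              # 2 s at an assumed 256 Hz
truth = np.sin(2 * np.pi * 10 * t)
noise = rng.standard_normal(t.size)
noisy = truth + noise                   # contaminated segment
pred = truth + 0.1 * noise              # hypothetical, imperfect model output
metrics = evaluate(noisy, pred, truth)
```

Because every metric here is a ratio or a correlation, the normalization constant of the periodogram cancels out, which is why the unnormalized squared DFT magnitude suffices in this sketch.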
The average power ratio measures the power of a given frequency band against the power of the EEG signal over the whole spectrum. It is computed by dividing the power of a particular frequency band by the total power of the signal. By calculating the average power ratio for each frequency band separately, we can better understand how power is distributed across the different frequency ranges. Specifically, when the average power ratios of the ground truth and the predicted denoised signal are closely matched for a given frequency band, the signal has been predicted more accurately in that band. In contrast, the power ratios of the noisy and ground truth signals are expected to lie far apart. For this reason, the performance of the 1D-CNN models for five different frequency bands of EEG is also reported in this study. The five EEG frequency bands are delta (1-4 Hz), theta (4-8 Hz), alpha (8-13 Hz), beta (13-30 Hz), and gamma (30-80 Hz). The average power ratio of each of these bands to the entire band (1-80 Hz) is calculated separately for each predicted, corresponding ground truth, and contaminated EEG segment. The average power ratio was computed following the Periodogram method [65].
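A band power ratio of this kind can be sketched as below. This is an illustrative example only: the function name `band_power_ratios`, the 256 Hz sampling rate, and the half-open band edges (lower edge included, upper edge excluded, so that adjacent bands do not double-count a frequency bin) are assumptions.

```python
import numpy as np

# Band limits in Hz, as listed in the text.
EEG_BANDS_HZ = {
    "delta": (1, 4),
    "theta": (4, 8),
    "alpha": (8, 13),
    "beta": (13, 30),
    "gamma": (30, 80),
}

def band_power_ratios(segment, fs, bands=EEG_BANDS_HZ, full_band=(1, 80)):
    """Ratio of each band's periodogram power to the power in full_band."""
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(segment)) ** 2   # periodogram (unnormalized)

    def power(lo, hi):
        mask = (freqs >= lo) & (freqs < hi)   # half-open interval [lo, hi)
        return psd[mask].sum()

    total = power(*full_band)
    return {name: power(lo, hi) / total for name, (lo, hi) in bands.items()}

# Example: a pure 10 Hz tone should place almost all power in the alpha band.
fs = 256                                      # assumed sampling rate in Hz
t = np.arange(2 * fs) / fs
ratios = band_power_ratios(np.sin(2 * np.pi * 10 * t), fs)
```

With half-open edges the five bands tile the 1-80 Hz range exactly, so the five ratios sum to one for any segment with non-zero power in that range.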

Results
This section details the outcomes of Experiments A and B and analyses the results concisely.

Experiment A Outcomes
For the quantitative performance evaluation of Experiment A, three performance metrics are computed, namely (i) time domain correlation coefficient, (ii) temporal RRMSE, and (iii) spectral RRMSE utilizing the ground truth EEG and the denoised EEG predicted by all five models in three different denoising scenarios, i.e., for EOG-artifacts removal, EMG-artifacts removal, and concurrent EOG and EMG artifacts removal from corrupted EEG, separately.
In Tables 1-3, the numerical values of the three performance metrics (CC temporal, RRMSE temporal, and RRMSE spectral) are presented for the denoised EOG, EMG, and simultaneous EOG and EMG-contaminated EEG segments, respectively. Figure 8 illustrates the evaluation of the five 1D-CNN models in the context of EOG-contaminated EEG denoising. Specifically, the figure presents the CC temporal values obtained across ten integer SNR levels, spanning from −7 to +2 dB, together with the RRMSE temporal and RRMSE spectral values plotted against the same ten SNR levels. Similarly, Figures 9 and 10 illustrate the same performance metrics plotted against the ten SNR levels for the predicted denoised EMG-contaminated EEG segments and simultaneous EOG and EMG-contaminated EEG segments, respectively.
It is apparent from Table 1 and Figure 8 that our proposed MultiResUNet3+ model outperformed the remaining four 1D-segmentation models in removing EOG artifacts from noisy EEG segments. This superiority is consistently observed across all ten SNR levels. Specifically, the MultiResUNet3+ model achieved the highest CC values, indicating a higher correlation between the predicted and ground truth EEG segments. Furthermore, the RRMSE values obtained from the MultiResUNet3+ model were the lowest in both the temporal and spectral domains, again demonstrating its superior accuracy in removing EOG artifacts. Overall, these findings highlight the effectiveness and robustness of the proposed MultiResUNet3+ model in removing EOG artifacts from EOG-contaminated EEG, even under challenging conditions characterized by low SNR levels.
As can be observed from Figure 9, the five 1D-CNN models displayed similar performance in removing EMG artifacts from the EMG-contaminated EEG segments, as indicated by the almost overlapping curves of all three performance metrics. Given the difficulty of determining the best-performing model solely based on Figure 9, Table 2 is included to provide the numerical values of CC temporal, RRMSE temporal, and RRMSE spectral. The tabulated data highlight the inconsistent performance of the 1D-segmentation networks in predicting EMG-artifacts-free EEG. Notably, the proposed MultiResUNet3+ and MCGUNet demonstrated relatively superior performance compared to the other three models.
During the denoising of simultaneous EOG and EMG artifacts from corrupted EEG segments, the greatest temporal correlation coefficient (a little above 0.95) is obtained from MultiResUNet3+ for the segments having the lowest noise level (SNR level of +2 dB). The smallest RRMSE in the time and frequency domains on the test segments is also achieved by our proposed MultiResUNet3+ model compared to the other four 1D-CNN models (refer to Table 3 and Figure 10).
Overall, for EOG/EMG/simultaneous EOG and EMG artifacts removal, the efficacy of the deep learning (DL) models improves as the SNR increases, as anticipated. An increase in SNR level corresponds to a reduction in the noise content (EOG, EMG, or simultaneous EOG and EMG), thereby possibly reducing the complexity of the nonlinear mapping function that the DL-based segmentation models must learn to predict denoised EEG segments. Ultimately, this leads to an improvement in the models' performance in predicting noise-free EEG.

Experiment B Outcomes
The objective of all five 1D-CNN models is to estimate artifacts-free EEG segments. To determine the most effective 1D-segmentation model in reducing physiological artifacts from corrupted EEG, five different performance metrics have been computed, namely (i) temporal percentage reduction in artifacts (η), (ii) spectral percentage reduction in artifacts (γ), (iii) RRMSE in the time domain (RRMSE temporal), (iv) RRMSE in the frequency domain (RRMSE spectral), and (v) the average power ratio for five distinct EEG bands (Delta, Theta, Alpha, Beta, Gamma) for ground truth, noisy, and predicted artifacts-free EEG segments. Moreover, plots of a sample corrupted, ground truth, and predicted clean EEG segment are provided for visual or qualitative assessment. Figure 11a depicts an EEG segment that has been contaminated with EOG artifacts, while Figure 11b-f illustrates EOG artifact-free EEG segments (estimated clean EEG) for all five models. Table 4 provides the average numerical values of the four performance metrics (η, γ, RRMSE temporal, and RRMSE spectral) achieved by the five distinct models, and Table 5 presents the computed average power ratios for five separate EEG bands for ground truth, noisy, and predicted EEG segments.
From Table 4, our proposed MultiResUNet3+ performed best in reducing EOG artifacts from EOG-contaminated EEG segments in the temporal and spectral domains (94.82% and 92.84%, respectively). Although the UNet model produced the lowest RRMSE temporal (~0.11) and RRMSE spectral (~0.12), MultiResUNet3+ was also very close. UNet produced the closest delta and alpha power ratios compared to the ground truth EEG. For the theta, beta, and gamma band power ratios, our proposed MultiResUNet3+ performed best (refer to Table 5).
An example EMG-contaminated EEG segment is illustrated in Figure 12a, and to provide a qualitative overview of the five denoising 1D-CNN models, Figure 12b-f shows EMG artifacts-free EEG (predicted EEG) segments. Table 6 summarizes the four performance metrics (η, γ, RRMSE temporal , and RRMSE spectral ) obtained for the predicted EMG artifacts-free EEG segments by all the five 1D-CNN models separately, and Table 7 contains the average power ratio calculated for five different EEG bands before and after the removal of myogenic artifacts.
For EMG-corrupted EEG segments denoising, the proposed MultiResUNet3+ eliminated 83.21% of spectral EMG artifacts, whereas the UNet eliminated 89.59% of time domain EMG artifacts from the corrupted EEG segments. In terms of η, UNet performed best, and in terms of γ, our proposed model performed best among all the models (refer to Table 6). Again, UNet and MultiResUNet3+ produced the lowest RRMSE temporal (~0.22) and RRMSE spectral (~0.19) values, respectively, among the five 1D-CNN networks. Moreover, for the delta, beta, and gamma bands, the power ratios computed from the MultiResUNet3+ model's predicted segments were the closest to those of the ground truth EEG. For the theta and alpha band power ratios, LinkNet performed best (refer to Table 7).
In Figure 13a, an example EEG segment contaminated with simultaneous EOG and EMG artifacts is presented, and in Figure 13b-f, the denoised EEG segment (simultaneous EOG and EMG artifacts-free segment) is shown along with the ground truth EEG separately for the five 1D-CNN models. The four performance metrics for the five 1D-CNN models' predictions on the simultaneous EOG and EMG-contaminated EEG segments can be found in Table 8, whereas Table 9 provides the average power ratios for five different EEG bands before and after removing simultaneous EOG and EMG artifacts.
As observed from Table 8, the proposed MultiResUNet3+, UNet, and LinkNet models produced very close denoising performance for simultaneous EOG and EMG corrupted EEG segments. Artifacts are reduced by 89.77% in the time domain and 83.39% in the frequency domain when LinkNet and UNet models are used, respectively. When compared among the five 1D-CNN networks, the LinkNet model yielded the lowest RRMSE temporal (~0.22) and RRMSE spectral (~0.18) values. Finally, when compared to the ground truth EEG, the MultiResUNet3+ model excels in producing the nearest average power ratios for the delta band. However, the LinkNet model performed best in the theta, beta, and gamma band power ratio, whereas the FPN model showed superior performance in the alpha band (Table 9).

Discussion
The limitations of traditional single-stage and two-stage signal processing-based methods for denoising EEG signals, including low correlation coefficient, potential loss of crucial neural information, and poor performance in dynamic situations, are mentioned in the introduction section of this study. Although a few deep learning-based approaches have been presented for EEG denoising to remedy these drawbacks, with notable performance gains over signal processing-based methods, these DL-based approaches still have some shortcomings, chief among which are deficiencies in model robustness, reliability, and generalizability.
To address these limitations, five distinct DL models are utilized in this extensive study, including the proposed novel MultiResUNet3+ and four additional 1D-CNN models (FPN, UNet, MCGUNet, and LinkNet), to denoise EOG/EMG/simultaneous EOG and EMG artifacts from corrupted EEG. A publicly available dataset, namely EEGdenoiseNet, was deemed suitable for this work, as it features pre-processed, well-structured EEG segments, including 4514 clean EEG segments, 3400 pure electrooculogram segments, and 5598 clean electromyogram segments. We semi-synthetically generated a large set of noisy EEG segments using linear mixing techniques, where the clean EOG and EMG segments were used as noise signals and the clean EEG segments as ground truth to train the DL-based models. This process provided a broad range of signal types for the DL models to learn from. Furthermore, due to linear mixing, ground truth EEG segments were also available to provide a reliable reference for evaluating the performance of the models. To prevent any data leakage between the train and test sets, the clean EEG, EOG, and EMG segments were used only once during the semi-synthetic data generation process.
Further, the quantitative measurement of five performance metrics (Experiment B) for all five deep learning networks (FPN, UNet, MCGUNet, LinkNet, and the proposed novel MultiResUNet3+) showed that it is possible to remove artifacts from EEG signals using DL-based techniques and that robust DL models have great potential to eliminate multiple artifacts simultaneously from EEG signals. It should be added here that, for DL networks, high-frequency artifacts, such as EMG artifacts and simultaneous EOG and EMG artifacts, are more challenging to handle than low-frequency artifacts, such as EOG artifacts. This can be explained by the F-principle of neural networks [66], which states that DL networks tend to learn low-frequency information at the beginning of training and high-frequency information as training iterations increase. A similar phenomenon is observed in this study. From our extensive experiments, we have observed that removing EOG artifacts (low-frequency noise) from corrupted EEG is less challenging than denoising EMG-contaminated (high-frequency noise) EEG and simultaneous EOG and EMG-contaminated (high-frequency noise) EEG. For instance, the temporal percentage reduction in EOG artifacts (94.82%) is higher than the temporal percentage reduction in EMG artifacts (89.59%). Similarly, the spectral percentage reduction in EOG artifacts (92.84%) is higher than that in EMG artifacts (83.21%). Although simultaneous EOG and EMG contamination in EEG is very unlikely in a real-life scenario and challenging to remove reliably, our proposed MultiResUNet3+ provided strong performance, which is clear evidence of the efficacy and superiority of our proposed model in denoising complex artifacts as well.
Although the DL-based denoising approaches need a vast amount of ground truth EEG data during the training phase, once the model is trained, it can be utilized reliably to eradicate artifacts from unseen EEG signals corrupted with physiological artifacts. Another benefit of DL models is the ability to handle complicated artifact mixes, such as nonlinear and stationary ones. In contrast to the traditional signal processing approaches, which typically linearly attenuate artifacts, DL models can directly learn the fundamental pattern of neural activities from training data in the hidden space and then synthesize/predict the clean EEG from corrupted EEG. Therefore, DL-based techniques perform better than conventional methods in removing physiological noises from contaminated EEG.
Some constraints of this study are worth mentioning. Even though the semi-synthetic EEG dataset generated in this study includes a very high number of clean EEG, EOG, and EMG segments, the limited variability in EEG type and the lack of diversity in artifact types are matters of concern, since EEG data can be recorded when the subject is at rest or while doing various tasks. Moreover, artifacts in EEG recordings are not limited to ocular and myogenic ones. Motion artifacts, one of the inevitable sources of non-physiological noise, become dominant when EEG data are captured via wearable sensors. Therefore, a more diverse and dynamic dataset needs to be curated, which will help the DL-based models to be more efficient in removing physiological and non-physiological artifacts from corrupted EEG signals.

Conclusions
The elimination of artifacts is a crucial aspect of EEG data analysis. In this paper, we presented a novel 1D-CNN model, i.e., MultiResUNet3+, for EEG denoising and demonstrated its superiority over four other 1D-CNN models, namely FPN, UNet, MCGUNet, and LinkNet. Apart from proposing a novel 1D-CNN model for denoising EEG reliably, we also introduced a set of benchmark metrics to facilitate the quantitative evaluation of DL-based EEG denoising models. The proposed MultiResUNet3+ model reduced the EOG artifacts from EOG-corrupted EEG by 94.82% and 92.84% in the time and frequency domains, respectively. Moreover, while removing the EOG artifacts, MultiResUNet3+ produced the nearest average power ratios for the theta, beta, and gamma bands (0.5295, 0.076, 0.0197) compared with the ground truth EEG (0.5184, 0.0749, 0.0195), meaning our proposed model is capable of denoising EEG in those bands most accurately. For EMG artifacts removal from contaminated EEG, our proposed novel MultiResUNet3+ performed best by reducing 83.21% of artifacts in the frequency domain and achieved a closely comparable 89.33% artifacts reduction in the time domain (the best is 89.59%, produced by UNet). While denoising EMG-contaminated EEG segments, MultiResUNet3+ had the lowest spectral RRMSE (0.1893) as well. Once EMG artifacts were removed from the corrupted EEG segments, the proposed MultiResUNet3+ provided the nearest average power ratios in the delta, beta, and gamma bands (0.4453, 0.0658, and 0.0165, respectively) compared with the ground truth EEG (0.4421, 0.0756, and 0.0197, respectively). In denoising simultaneous EOG and EMG artifacts from corrupted EEG, three 1D-CNN models, namely UNet, LinkNet, and the novel MultiResUNet3+, performed almost equally well, which is evident from the computed performance evaluation metrics.
Overall, the results obtained from the proposed MultiResUNet3+ showed its robustness and reliability in denoising physiological artifacts (EOG/EMG/simultaneous EOG and EMG) from corrupted EEG signals. Our findings showed that DL techniques could potentially eliminate EOG/EMG artifacts from EEG data even at high noise levels. Following a similar framework, the proposed MultiResUNet3+ segmentation network may be used for multi-channel EEG noise reduction by applying the proposed model to each channel of a multi-channel EEG signal separately, which should facilitate EEG-based BCI applications.