Identification and Removal of Physiological Artifacts From Electroencephalogram Signals: A Review

Electroencephalogram (EEG), boasting the advantages of portability, low cost, and high temporal resolution, is a non-invasive brain-imaging modality that can be used to measure different brain states. However, EEG recordings are always contaminated with artifacts from sources other than neurons, which renders EEG data analysis more difficult and potentially results in misleading findings. Therefore, it is essential for many medical and practical applications to remove these artifacts in the preprocessing stage before analyzing EEG data. In the last thirty years, various methods have been developed to remove different types of artifacts from contaminated EEG data; still, there is no standard method that can be used optimally, and therefore the research remains attractive as well as challenging. This paper presents an extensive overview of the existing methods for ocular, muscle, and cardiac artifact identification and removal, with their comparative advantages and limitations. We also review the schemes developed for validating the performance of algorithms with simulated and real EEG data. In future studies, researchers should focus not only on combining different methods with multiple processing stages for efficient removal of artifactual interference but also on the development of standard criteria for validation of recorded EEG signals.

brain-computer interface (BCI) devices [16]. Furthermore, artifacts can also affect diagnosis and analysis in clinical research, such as research on sleep disorders, Alzheimer's disease, and schizophrenia [6], [7]. It is therefore mandatory, in either clinical or practical research, to deal with these artifacts prior to the analysis of EEG signals. To do so, a method is required that can not only remove artifacts efficiently but also preserve the true, distortion-free neuronal activity present in EEG signals.
For these purposes, several manual and automated methodologies have been developed and utilized. One straightforward approach is to record EEG with many appropriate precautions; but requiring this and achieving it are two very different things. Another commonly used technique is to remove the epochs having extensive artifacts from the EEG data, though this can also cause the removal of useful EEG information. Alternatively, many semi-automatic and automatic methods have been developed to remove or reduce artifacts in EEG data [12]-[14], [17]-[29]. Generally speaking, these methods can be divided into two main categories: regression-based methods and blind source separation (BSS)-based methods. It is also important to mention that, due to the diverse sources and characteristics of artifacts, most of the studies conducted thus far have considered the removal of only one type of artifact. However, recent studies have shown more interest in removing multiple types of artifacts. Moreover, in the last few years, only a few new algorithms have been developed in addition to the classical regression and BSS approaches. Instead, researchers have focused on improving previous methods by combining different algorithms, by making algorithms automatic, and by using more appropriate performance metrics. However, to date, researchers in this area have not agreed on an optimal method for artifact removal that does not also distort the actual EEG signal [30]-[33].
In order to spur efforts in that direction, this paper presents a comprehensive review of the existing state-of-the-art techniques that have been used to remove/reduce artifacts from EEG data. First, we briefly discuss EEG signals and the kinds of artifacts present therein. Next, we survey the existing artifact removal techniques and their advantages and limitations. Then, we present the most commonly employed performance metrics, which are the most basic means of evaluating algorithm performance. Additionally, we briefly discuss the importance and implementation of artifact removal in practical and clinical applications. Finally, we conclude this review by discussing future directions and making recommendations. We believe that this review article can help researchers to choose more appropriate methods for their applications and to develop new methods to deal with artifacts.

II. BACKGROUND
In this section, we will endeavor to provide an overview of the characteristics of EEG signals and the different types of artifacts present in them.

A. EEG CHARACTERISTICS
The frequency content of a recorded EEG lies within the 0.01 Hz to 100 Hz range and can be divided into five major bands known as delta, theta, alpha, beta, and gamma [34]. Details on the frequencies associated with these bands are provided in Table 1.

B. TYPES OF ARTIFACTS
Basic knowledge of the different types of artifacts is necessary in order to develop or select suitable algorithms for removal of artifacts from EEG signals. Broadly, artifacts in EEG can be classified into two types, physiological and non-physiological [31]. Non-physiological artifacts include electrode displacement, interference from the environment, and movement artifacts. These artifacts can be reduced by proper subject instruction and experimental setup [30]. On the other hand, physiological artifacts include ocular artifacts, muscle artifacts, and cardiac artifacts. In contrast to non-physiological artifacts, removal or reduction of these artifacts requires the use of a suitable handling algorithm. Another obstacle in EEG signal processing, specifically in source localization and connectivity studies, is volume conduction: the activity of a single brain region can be recorded at multiple electrodes, and the activity of multiple brain regions can be recorded at a single electrode [35]. Several studies in the literature have developed techniques to deal with the problem of volume conduction and superposition; references [35]-[37] can be consulted for more details. Table 2 summarizes the types and origins of all commonly known artifacts. Since ocular, muscle, and cardiac artifacts are extensively handled in the literature, in this review we survey only the commonly used methods that deal with them. Figure 1 shows the contamination of EEG signals by different physiological artifacts.
TABLE 2. List of different types of artifacts and their origins in EEG signals.
VOLUME 6, 2018

III. SURVEY OF ARTIFACT REMOVAL ALGORITHMS
This section provides a detailed overview of the well-known methods that are used to remove or reduce artifacts in EEG data. We summarize the main processing steps of these methods and highlight some of their advantages and limitations.

A. ARTIFACT AVOIDANCE
The most straightforward way to reduce artifacts in EEG signals is to avoid the movements that cause them. For example, with regard to ocular artifacts produced by blinking and eye movements, experimentalists can instruct subjects to avoid unnecessary eye movements, blinks, and body movements and to remain as still as possible. However, achieving this seemingly simple solution can be difficult. For instance, a person has no control over their pulse; therefore, cardiac artifacts cannot be reduced by artifact avoidance. Moreover, it is next to impossible to control eye movements and blinking for relatively long periods of time. Furthermore, this type of solution is often unrealistic in applications such as BCI.

B. ARTIFACT SEGMENT REJECTION
Another common solution used in early artifact removal studies was to remove all epochs that are highly affected by signals from non-neuronal sources. The most difficult part of this method is to identify artifactual epochs from large EEG datasets, as it requires much expertise in analysis of EEG data as well as a significant amount of time, making it unsuitable for applications like BCI. A major drawback of using this method, moreover, is the loss of important neuronal information present in artifactual epochs, which might lead to erroneous conclusions. In any case, due to the recent development of automatic artifact removal algorithms, the use of epoch rejection these days is not preferred.

1) REGRESSION METHODS
Regression algorithms are the simplest and most commonly used methods to remove artifactual contamination from EEG data [38]-[40]. To identify artifacts in EEG signals, one or more reference channels are used. Regression methods are based on a simple methodology entailing the subtraction of artifactual signals from EEG signals after estimation of artifact propagation coefficients [41]. These propagation coefficients can be estimated using a measured reference signal for the particular type of artifact, i.e., electrooculography (EOG) signals for ocular artifacts and electrocardiography (ECG) signals for cardiac artifacts. In the case of ocular artifacts, these propagation coefficients can be calculated as follows [27]:
$$\alpha = \frac{\sum_{i=1}^{N} (\mathrm{VEOG}_i - \overline{\mathrm{VEOG}})(\mathrm{EEG}_i - \overline{\mathrm{EEG}})}{\sum_{i=1}^{N} (\mathrm{VEOG}_i - \overline{\mathrm{VEOG}})^2}, \qquad \beta = \frac{\sum_{i=1}^{M} (\mathrm{HEOG}_i - \overline{\mathrm{HEOG}})(\mathrm{EEG}_i - \overline{\mathrm{EEG}})}{\sum_{i=1}^{M} (\mathrm{HEOG}_i - \overline{\mathrm{HEOG}})^2} \tag{1}$$
where $\alpha$ and $\beta$ represent the propagation coefficients for vertical EOG (VEOG) and horizontal EOG (HEOG), respectively, $N$ and $M$ represent the sample sizes for vertical and horizontal EOG, respectively, and overbars denote sample means. According to [27], samples with high vertical and horizontal EOG should be used to calculate these propagation coefficients. Finally, the corrected EEG can be obtained as
$$\mathrm{EEG}_{out} = \mathrm{EEG}_{con} - \alpha\,\mathrm{VEOG} - \beta\,\mathrm{HEOG} \tag{2}$$
where $\mathrm{EEG}_{out}$ is the corrected EEG and $\mathrm{EEG}_{con}$ is the contaminated EEG. Due to their need for a reference channel, which limits their application mainly to EOG and ECG artifacts, regression methods have been replaced by more advanced methodologies [31], [32], [42]. Furthermore, when removing artifacts using EOG signals as the reference, this method relies on an invalid assumption, namely that the neuronal activity in EEG and EOG signals is uncorrelated [32], [43]. As a result, regression analysis eliminates from the EEG signals the neuronal activity common to both EEG and EOG. Regression methods are computationally simple, but their outcomes are highly affected by this bidirectional contamination [12], [21]. More advanced regression methods address the issue. Filtering the EOG signals with a low-pass filter is the most straightforward way to overcome it [28], [44], [45]. The argument used to validate this approach is that most of the high-frequency content in recorded EOG belongs to neuronal activity, and that filtering out that part will therefore greatly reduce the bidirectional contamination effect [44]. There is no consensus in the literature on the optimal low-pass filtering of EOG signals, and it is therefore an open problem for future research. In contrast, some authors argue that all frequency bands are contaminated with neuronal activity [46]. Nevertheless, regression methods are still used as the gold standard for comparison of the performance of newly developed methods. Figure 2 and Table 3 provide a schematic and a list of studies on regression methods, respectively.
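The regression idea described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the procedure of any cited study: the function name `regress_out_eog`, the simulated signals, and the coefficient values 0.8 and 0.3 are all hypothetical, and the two propagation coefficients are estimated jointly by ordinary least squares rather than channel by channel.

```python
import numpy as np

def regress_out_eog(eeg, veog, heog):
    """Remove the EOG contribution from one EEG channel by least-squares regression."""
    # Mean-removed reference channels form the design matrix.
    refs = np.column_stack([veog - veog.mean(), heog - heog.mean()])
    # Joint least-squares estimate of the propagation coefficients (alpha, beta).
    coeffs, *_ = np.linalg.lstsq(refs, eeg - eeg.mean(), rcond=None)
    # Subtract the estimated artifact from the contaminated EEG.
    return eeg - refs @ coeffs, coeffs

# Simulated example (all values are assumptions chosen for illustration).
rng = np.random.default_rng(0)
n = 5000
veog = rng.standard_normal(n)
heog = rng.standard_normal(n)
brain = 0.5 * rng.standard_normal(n)          # stand-in for neuronal activity
contaminated = brain + 0.8 * veog + 0.3 * heog
cleaned, coeffs = regress_out_eog(contaminated, veog, heog)
```

In this toy setting the reference channels contain no neuronal activity, so the estimate is nearly unbiased; with real EOG, the bidirectional contamination discussed above would cause part of the neuronal signal to be subtracted as well.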

2) FILTERING ALGORITHMS
In this section, we will summarize different filtering approaches used to reduce/remove artifacts in EEG signals. A simple classical filtering approach can be used but only when dealing with a specific frequency band, for instance 50/60 Hz interferences. However, for efficient removal of major artifacts, alternative filtering techniques should be adopted.
The recorded contaminated EEG signal $\mathrm{cEEG}_i \in \mathbb{R}^{1 \times N}$ is a combination of the true EEG signal $\mathrm{tEEG}_i \in \mathbb{R}^{1 \times N}$ and artifactual contamination $v_i$. Mathematically,
$$\mathrm{cEEG}_i = \mathrm{tEEG}_i + v_i \tag{3}$$
where $i$ represents the channel and $N$ the sample size. The purpose of filtering is to minimize the mean square error between the output EEG $\mathrm{oEEG}_i \in \mathbb{R}^{1 \times N}$ and the true EEG by estimating the optimal filtering parameter $\beta$, i.e.,
$$\min_{\beta} \; E\left[(\mathrm{oEEG}_i - \mathrm{tEEG}_i)^2\right] \tag{4}$$
There are a number of filtering approaches available that can be used to deal with artifacts in EEG signals, though adaptive filtering is the most commonly employed.
Adaptive filtering assumes that there is no correlation between the true EEG signal and artifactual activities [27]. A reference signal that is correlated with the artifact is used to estimate the artifactual signal. Then, the estimated signal is subtracted from the recorded EEG signal to obtain the artifact-free EEG signal. Achieving the best results using adaptive filtering is highly dependent on the choice of the reference signal [31]. For instance, EOG signals can be used as the reference to remove ocular artifacts from EEG data [47], and ECG signals can be used as the reference to remove cardiac artifacts [48]. Finally, an optimization algorithm can be used to obtain the set of parameters that best estimates the artifacts present in the EEG signals. The least mean squares (LMS) algorithm is the most commonly employed adaptive algorithm for adjustment of the weight vector [49]. Another commonly used algorithm is recursive least squares (RLS)-based adaptive filtering [47], [50]. RLS algorithms perform better than LMS-based filters but incur a higher computational cost. Online implementation, no need for preprocessing or calibration, and ease of use are among the advantages of adaptive filters, whereas the requirement of a reference signal recorded with extra sensors is their main limitation.
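As an illustration of the LMS scheme just described, the following sketch cancels a reference-driven artifact from a single simulated channel. It is a toy example under stated assumptions: the function name `lms_cancel`, the step size `mu`, the number of taps, and the simulated signals are all hypothetical choices, and the weight update is the textbook form w ← w + 2μ·e·x.

```python
import numpy as np

def lms_cancel(primary, reference, n_taps=4, mu=0.01):
    """LMS noise canceller: subtract the reference-driven artifact estimate from primary."""
    w = np.zeros(n_taps)
    cleaned = np.zeros(len(primary))
    for n in range(len(primary)):
        # Newest-first window of reference samples, zero-padded at the start.
        x = np.zeros(n_taps)
        k = min(n_taps, n + 1)
        x[:k] = reference[n::-1][:k]
        e = primary[n] - w @ x        # error = current estimate of the clean EEG sample
        w += 2.0 * mu * e * x         # LMS weight update
        cleaned[n] = e
    return cleaned, w

# Simulated example: the artifact is a two-tap filtered copy of the reference.
rng = np.random.default_rng(0)
n_samples = 4000
t = np.arange(n_samples) / 250.0                  # assumed 250 Hz sampling rate
brain = 0.3 * np.sin(2 * np.pi * 10 * t)          # stand-in for neuronal activity
reference = rng.standard_normal(n_samples)        # stand-in for an EOG/ECG channel
delayed = np.concatenate(([0.0], reference[:-1]))
artifact = 0.9 * reference + 0.4 * delayed
primary = brain + artifact
cleaned, w = lms_cancel(primary, reference)
```

After the initial adaptation period the first two weights should settle near the simulated propagation values; a larger `mu` converges faster but leaves more residual weight noise, which is the usual LMS trade-off.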
There are other filters, such as Kalman, Wiener, and Bayes filters, that can be used for artifact removal; however, these techniques have not been explored extensively in the EEG artifact removal literature [32], [51]-[54]. Figure 3 illustrates a general schematic of the filtering approaches to the removal of artifacts from EEG signals. Table 3 lists the studies on filtering techniques.

3) BLIND SOURCE SEPARATION
BSS is one of the most popular and widely used techniques for removal of artifacts from EEG data by separating source signals of neuronal activity from artifacts [30]-[32]. One of the major advantages of BSS is that it requires no prior information (or, in some cases, very limited information) about the mixing of the different sources. Let $X$ be the multichannel EEG signals, formed by linear mixing of the sources $S$; then, mathematically,
$$X = AS \tag{5}$$
where $A$ is the mixing matrix. BSS can be used to generate an un-mixing matrix $W$ to separate the original sources,
$$\hat{S} = WX \tag{6}$$
where $\hat{S}$ is the estimate of the sources. Once all of the neuronal and artifactual sources are known, the latter can be removed to obtain artifact-free EEG. Figure 4 shows the general schematic of artifact removal using BSS algorithms.
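The linear algebra behind this remove-and-reconstruct step can be sketched as follows. Note the important simplification: here the un-mixing matrix is computed from a known mixing matrix purely to illustrate the bookkeeping, whereas a real BSS algorithm would have to estimate it blindly from the data. The source waveforms, the 250 Hz sampling rate, and the matrix sizes are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 1000
t = np.arange(n_samples) / 250.0   # assumed 250 Hz sampling rate

# Three hypothetical sources: two "neuronal" rhythms and one blink-like pulse train.
S = np.vstack([
    np.sin(2 * np.pi * 10 * t),
    np.sin(2 * np.pi * 6 * t + 1.0),
    5.0 * (np.arange(n_samples) % 250 < 25),
])
A = rng.standard_normal((4, 3))    # mixing matrix: 4 channels, 3 sources
X = A @ S                          # recorded channels, X = A S

# Un-mixing matrix. Computed from the known A for illustration only;
# ICA, CCA, etc. would estimate W from X alone.
W = np.linalg.pinv(A)
S_hat = W @ X                      # estimated sources

S_kept = S_hat.copy()
S_kept[2, :] = 0.0                 # discard the artifactual source
X_clean = A @ S_kept               # back-project the remaining sources
```

Zeroing a source and back-projecting removes that source's contribution from every channel at once, which is why the quality of the separation (and of the choice of which sources to discard) determines how much neuronal activity is lost.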
There are many BSS algorithms developed to remove artifacts from EEG signals, including independent component analysis (ICA), principal component analysis (PCA), canonical correlation analysis (CCA), and morphological component analysis (MCA).

a: INDEPENDENT COMPONENT ANALYSIS
ICA is the most commonly employed BSS technique in EEG artifact removal studies [22], [23], [25], [81]-[83]. In general, ICA decomposes multichannel EEG data from different sources into independent components (ICs). ICA is applied under the assumption that the signals from different sources are independent and linearly mixed. ICA has emerged as a valuable tool for removal of artifacts from EEG data because it does not suffer from the limitations that afflict parametric methods such as adaptive filtering. For instance, ICA does not require any prior information or additional reference channel for removal of artifacts. The effectiveness of ICA is based on the statistical independence of the sources and the mixing matrix. ICA has shown promising results in removing artifacts from EEG data, even in cases where the neuronal and artifactual sources are not completely independent [84]. Since ICA is a statistical approach, the reliability of its results depends strongly on the amount of data provided to the algorithm [32], [85]. To achieve the best results with ICA, the maximum amount of data should be used when the sources are reasonably spatially stationary. Different authors have suggested different amounts of data for best results; for instance, [85] suggested the use of 10 s of data, while [14] argued that the sample size should be several times the square of the number of channels. In contrast, a few authors have reported that AMUSE and SOBI also work well with short durations of data, since they are based on minimization of the correlation between signals [32], [85].
Although the performance of ICA is promising, it should be employed with care [86]. Most ICA-based studies have focused extensively on the removal of artifacts from EEG signals [87], while the effects of the method on the neuronal part of the signal have been neglected [17]. Additionally, the selection of artifactual components has typically been performed by visually inspecting the topographic maps and time series of the ICs, and thus is highly dependent on the expertise of the researcher [88]. Manual identification of this sort usually leads to divergent results. However, in recent years, researchers have proposed different features that can be used to automatically identify artifactual components [23], [82], [88]-[92]. These automations have proved to be effective in terms of computational cost and artifact reduction, though the problem of the loss of neuronal information caused by completely rejecting artifactual ICs remains unaddressed. Another disadvantage of ICA is that it cannot be applied to a single channel (or a few channels), as it assumes that the number of channels is equal to or greater than the number of sources. The complex iterative procedure of ICA is another drawback, as it limits its use in online/real-time applications. Many variants of ICA have been proposed in the literature, for instance JADE [15], FastICA, SOBI, InfoMax [93], constrained ICA [94], AMICA [95], and AMUSE [20]. In [96], the authors discuss fifteen different variants of ICA methods for removal of artifacts from EEG signals. Table 4 lists the studies that have utilized ICA algorithms.

b: PRINCIPAL COMPONENT ANALYSIS
PCA is a statistical method that converts time-domain observations of possibly correlated variables into a set of values of linearly uncorrelated variables using an orthogonal transformation. These linearly uncorrelated variables are called principal components (PCs), and their number is less than or equal to the number of channels used in the EEG recording. The transformation is designed such that each PC has the highest variance possible under the constraint of being orthogonal to the preceding PCs.
In EEG analysis, the spatial distribution of eye activity was first determined using PCA in 1991 [97]; since that time, many authors have used PCA to remove artifacts from EEG data [19], [98]-[100]. It has notably been reported that PCA performs better than regression-based artifact removal [97]. The major drawback of PCA, though, is its assumption of orthogonality, which generally does not hold for neuronal activity and artifacts. Moreover, whenever the amplitudes of the neuronal and artifactual activity are similar, PCA fails to determine the artifactual components [93], [101]. Extensions of PCA include robust PCA [102] and kernel PCA [103]. Even though PCA has performed better in removing certain types of artifacts, most researchers prefer alternative methods such as ICA [32]. Table 4 lists the studies using PCA algorithms.
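A minimal sketch of PCA-based artifact removal, assuming the artifact is much larger than the neuronal activity so that it dominates the first PC, is given below. The function name `pca_remove`, the blink-like pulse train, and the channel topography are hypothetical; the decomposition is done with a singular value decomposition of the mean-removed data.

```python
import numpy as np

def pca_remove(X, n_remove=1):
    """Remove the n_remove highest-variance principal components from X (channels x samples)."""
    mean = X.mean(axis=1, keepdims=True)
    # Columns of U are the spatial PCs, ordered by decreasing variance.
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    s_kept = s.copy()
    s_kept[:n_remove] = 0.0           # zero out the strongest components
    return (U * s_kept) @ Vt + mean   # reconstruct from the remaining PCs

# Simulated example: a large blink-like artifact with a fixed topography.
rng = np.random.default_rng(2)
n_ch, n = 8, 2000
topo = np.linspace(1.0, 0.1, n_ch)
topo /= np.linalg.norm(topo)                       # hypothetical artifact topography
blink = 40.0 * (np.arange(n) % 500 < 50)           # hypothetical blink waveform
brain = rng.standard_normal((n_ch, n))             # stand-in for neuronal activity
X = brain + np.outer(topo, blink)
cleaned = pca_remove(X, n_remove=1)
```

The sketch also makes the drawback visible: everything lying along the removed spatial direction is discarded, including the neuronal activity that happens to share that direction, and the approach only works here because the artifact amplitude is far larger than that of the background signal.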

c: CANONICAL CORRELATION ANALYSIS
CCA is a statistical method developed to investigate the underlying relationship between two datasets by finding the correlation between them. Many studies in the literature have shown the feasibility and potential of CCA for removing artifacts from EEG signals [104], [105]. CCA finds basis vectors for two sets of variables such that the correlation between the projections of the variables onto these basis vectors is mutually maximized. Let $X(t)$ be the recorded multi-channel EEG signal and $Y(t)$ a temporally delayed version of the data such that $Y(t) = X(t-1)$, and consider their linear combinations $x = w_x^T X$ and $y = w_y^T Y$. After removing the mean of each row from $X$ and $Y$, CCA finds the weight vectors $w_x$ and $w_y$ that maximize the correlation $\rho$ between $x$ and $y$ by solving the problem [104]
$$\rho = \max_{w_x, w_y} \frac{w_x^T C_{xy} w_y}{\sqrt{(w_x^T C_{xx} w_x)(w_y^T C_{yy} w_y)}} \tag{7}$$
where $C_{xx}$ and $C_{yy}$ are the auto-covariance matrices of $X$ and $Y$, respectively, and $C_{xy}$ is the cross-covariance matrix of $X$ and $Y$ (with $C_{yx} = C_{xy}^T$). An eigenvalue problem can be obtained by setting the derivatives of equation (7) with respect to $w_x$ and $w_y$ to zero, as follows:
$$C_{xx}^{-1} C_{xy} C_{yy}^{-1} C_{yx} w_x = \rho^2 w_x, \qquad C_{yy}^{-1} C_{yx} C_{xx}^{-1} C_{xy} w_y = \rho^2 w_y \tag{8}$$
where $\rho$ is the canonical correlation coefficient. The components with minimum auto-correlation correspond most closely to artifacts. CCA, moreover, is a BSS method that uses second-order statistics, with less computational cost than ICA [30]. Whereas ICA seeks statistically independent sources, CCA seeks sources that are mutually uncorrelated [31]. Additionally, CCA, unlike PCA and ICA, does not require the assumptions of orthogonality and Gaussian distributions. Previous artifact removal studies have demonstrated CCA's superior performance over ICA [26], [104], [106], [107]. In particular, CCA has been successfully applied to remove muscle artifacts from EEG signals, with improved performance over ICA [26]. This might be due to the fact that muscle artifacts do not have a stereotyped topography, and consequently, ICA does not separate muscle artifacts efficiently. Table 4 lists the studies using CCA algorithms.
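The CCA-based separation can be sketched as follows, with the data and a one-sample delayed copy serving as the two datasets. This is an illustrative implementation under assumptions, not the code of the cited studies: the function name `bss_cca` is hypothetical, and the eigenvalue problem is solved through Cholesky whitening followed by an SVD, which is a numerically convenient equivalent. The simulated "muscle" source is plain white noise.

```python
import numpy as np

def bss_cca(X):
    """CCA between X(t) and X(t-1). Returns un-mixing rows W and correlations rho,
    both sorted by decreasing lag-1 canonical correlation. X is channels x samples."""
    Xc = X - X.mean(axis=1, keepdims=True)
    Xt, Yt = Xc[:, 1:], Xc[:, :-1]            # X(t) and its delayed copy Y(t) = X(t-1)
    n = Xt.shape[1]
    Cxx, Cyy, Cxy = Xt @ Xt.T / n, Yt @ Yt.T / n, Xt @ Yt.T / n
    Lx, Ly = np.linalg.cholesky(Cxx), np.linalg.cholesky(Cyy)
    # Whitened cross-covariance; its singular values are the canonical correlations.
    K = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T
    U, rho, _ = np.linalg.svd(K)
    W = np.linalg.solve(Lx.T, U).T            # rows are the weight vectors w_x
    return W, rho

# Simulated example: two rhythms plus a white-noise "muscle" source.
rng = np.random.default_rng(3)
fs, n = 250, 2000                              # assumed sampling rate and length
t = np.arange(n) / fs
S = np.vstack([np.sin(2 * np.pi * 10 * t),
               np.sin(2 * np.pi * 3 * t),
               rng.standard_normal(n)])
A = np.array([[1.0, 0.5, 0.3],
              [0.4, 1.0, 0.6],
              [0.2, 0.7, 1.0]])                # hypothetical mixing matrix
X = A @ S
W, rho = bss_cca(X)
mean = X.mean(axis=1, keepdims=True)
components = W @ (X - mean)

# Drop the component with the lowest autocorrelation and back-project.
A_est = np.linalg.inv(W)
X_clean = A_est[:, :-1] @ components[:-1] + mean
```

Because the components come out ordered by autocorrelation, the broadband noise-like source ends up last, which mirrors the statement above that the least autocorrelated components correspond most closely to artifacts.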

d: MORPHOLOGICAL COMPONENT ANALYSIS
MCA is a method used to decompose a signal into components that have different morphological aspects. Each component is sparsely represented in an over-complete dictionary made up of different waveforms, and can be used to describe different source signals. A dictionary $\Phi$ is a collection of waveforms or atoms, such as the columns of wavelet, Fourier, and Dirac bases [108]. A signal is sparse in $\Phi$ if it can be represented using a linear combination of only a few atoms. By merging several complete dictionaries, an over-complete dictionary is constructed. Although the signal representation is then no longer unique, the class of signals that can be sparsely represented using the dictionary is much larger. MCA assumes that a signal $S \in \mathbb{R}^N$ can be represented as a linear combination of $m$ morphological components [108], [109]
$$S = \sum_{i=1}^{m} S_i = \sum_{i=1}^{m} \Phi_i \alpha_i \tag{9}$$
where each $S_i$ represents a signal type that has a different morphological structure and $\alpha_i$ is its coefficient vector in the dictionary $\Phi_i$. A morphological structure that is sparse in a particular dictionary $\Phi_i$ will generally not be sparse in the other dictionaries. Therefore, $\Phi_i$ can play an important role in discriminating between different signal contents. The problem of finding the sparsest representation can be formulated as
$$\min_{\alpha_1, \dots, \alpha_m} \sum_{i=1}^{m} \|\alpha_i\|_0 \quad \text{subject to} \quad S = \sum_{i=1}^{m} \Phi_i \alpha_i \tag{10}$$
Because this problem is inherently combinatorial, and therefore intractable, the basis pursuit method suggests the substitution of the $l_0$-norm by the $l_1$-norm, which also promotes sparsity in the solutions. In EEG analysis, signals can be represented as a linear combination of three morphological components using MCA theory [108]. For instance, the spikes in an EEG signal can be represented by the Dirac basis, background EEG and ERPs can be represented by the discrete cosine transform basis, and artifacts having transient properties, such as ocular and muscle artifacts, can be represented by the Daubechies wavelet basis. MCA has been used to remove ocular and muscle artifacts from EEG data, and has been reported to be a better method than the stationary wavelet transform [108]-[110]. The major limitation of this method is that it always requires a database containing the morphologies of different types of artifacts, and therefore, its performance is highly dependent on the available artifact templates. Table 4 lists the studies using MCA algorithms.

4) WAVELET TRANSFORM
Wavelet transform (WT) is a method that decomposes a time-domain EEG signal into specific time-frequency representations obtained by dilations and shifts of a unique function $\psi$ called the mother wavelet [105]. WT is the inner product of the time-domain signal and the basis wavelet function. When the signals are discrete, the discrete WT (DWT) can be applied, and a set of basis functions is defined on a dyadic grid in the time-scale plane as
$$\psi_{j,k}(t) = 2^{-j/2}\, \psi\left(2^{-j} t - k\right) \tag{11}$$
where $j$ governs the amount of scaling and $k$ represents the amount of time shifting. In the DWT algorithm, the discrete time-domain signal is decomposed into high-frequency (detail) components and low-frequency (approximation) components through successive high-pass and low-pass filters [32]. The step-wise process to remove artifacts is as follows [105]:
1. Decompose the EEG signal into a number of detail components.
2. Threshold the detail coefficients to denoise the signals from artifacts.
3. Reconstruct the artifact-free EEG signal after removal of the thresholded components.
WT is well suited to biomedical applications due to its robustness and versatility, and it has been widely used to remove artifacts from EEG data [105], [139], [140]. Even though this method has been used as a valuable tool to denoise EEG signals on its own, recently many researchers have combined it with other methods for more efficient artifact removal. One major drawback of wavelet-based methods is that they cannot remove artifacts completely if the spectral properties of the measured signal overlap with the spectral properties of the artifacts [30], [32]. Table 5 lists the studies using WT algorithms.
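The decompose, threshold, and reconstruct steps above can be sketched with a hand-written Haar DWT so that the example depends only on NumPy. Several choices here are assumptions rather than anything prescribed by the cited studies: the Haar wavelet itself, the per-level universal threshold with a median-based noise estimate, the convention that coefficients above the threshold are treated as artifact and set to zero, and the toy high-frequency burst standing in for a muscle artifact.

```python
import numpy as np

def haar_dwt(x, levels):
    """Multi-level Haar DWT; len(x) must be divisible by 2**levels."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))   # high-pass (detail) branch
        approx = (even + odd) / np.sqrt(2.0)          # low-pass (approximation) branch
    return approx, details

def haar_idwt(approx, details):
    """Inverse of haar_dwt."""
    for d in reversed(details):
        out = np.empty(2 * len(approx))
        out[0::2] = (approx + d) / np.sqrt(2.0)
        out[1::2] = (approx - d) / np.sqrt(2.0)
        approx = out
    return approx

def wavelet_clean(x, levels=3):
    """Zero detail coefficients that exceed a per-level universal threshold."""
    approx, details = haar_dwt(x, levels)
    for d in details:
        sigma = np.median(np.abs(d)) / 0.6745         # robust scale estimate
        threshold = sigma * np.sqrt(2.0 * np.log(len(x)))
        d[np.abs(d) > threshold] = 0.0                # treat large coefficients as artifact
    return haar_idwt(approx, details)

# Simulated example: a 10 Hz rhythm with a short high-frequency burst.
fs, n = 256, 1024                                     # assumed sampling rate and length
t = np.arange(n) / fs
clean = np.sin(2 * np.pi * 10 * t)
artifact = np.zeros(n)
artifact[400:464] = 5.0 * (-1.0) ** np.arange(64)     # toy muscle-like burst
x = clean + artifact
cleaned = wavelet_clean(x)
```

The burst is removed almost entirely because it lives in the finest detail level while the rhythm does not; if the two overlapped in the time-scale plane, thresholding would either leave artifact behind or distort the rhythm, which is the limitation noted above.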

5) EMPIRICAL-MODE DECOMPOSITION
Empirical-mode decomposition (EMD) is a data-driven method that decomposes a time-domain signal into a set of intrinsic mode functions (IMFs) with the advantages of adaptivity and flexibility [141]. More precisely, each of these IMFs must satisfy the following two conditions [142]:
i. In the whole dataset, the number of extrema and the number of zero crossings must be equal or differ at most by one.
ii. At all points, the mean value of the envelopes defined by the local minima and local maxima must be zero.
The procedure of the EMD method can be summarized as follows [142]:
1. Identify all local maxima and local minima of the given signal.
2. Interpolate between the maxima to estimate the upper envelope and between the minima to estimate the lower envelope. This can be done by using cubic spline interpolation.
3. Calculate the mean of the two envelopes and subtract it from the given signal.
4. Repeat steps 1-3 until the stopping criteria are fulfilled.
The sifting process stops when the final residue $r(t)$ is a constant, a monotonic function, or a function with only one maximum or one minimum, from which no more IMFs can be extracted. The original signal $x(t)$ can then be expressed as
$$x(t) = \sum_{i=1}^{p} d_i(t) + r(t) \tag{12}$$
where $p$ is the total number of IMFs and $d_i$ represents the $i$th IMF. In general, EMD is reported to perform better than Fourier or wavelet transforms because the basis of its decomposition is adaptively derived from the data rather than set manually. EMD has been successfully used to remove artifacts from EEG data [141], [143], both on its own and in combination with other methods (see Section III-D). However, EMD is very sensitive to noise and suffers from mode-mixing complications. To deal with these, ensemble EMD (EEMD), a noise-assisted data analysis method, was developed; it takes the average of the IMFs obtained by applying EMD to an ensemble of noise-added copies of the signal as the final IMFs [144]. Table 5 lists the studies using EMD algorithms.
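The sifting procedure listed above can be sketched compactly as follows. This is a deliberately simplified illustration and departs from the description in two labeled ways: the envelopes use linear interpolation (`np.interp`) instead of cubic splines, and each IMF is obtained with a fixed number of sifting passes (`n_sift`) rather than a formal stopping criterion. The function names and the test signal are hypothetical.

```python
import numpy as np

def local_extrema(x):
    """Indices of interior local maxima and minima of x."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima

def sift_once(x):
    """One sifting pass: subtract the mean of the upper and lower envelopes."""
    maxima, minima = local_extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return None                                   # too few extrema to build envelopes
    idx = np.arange(len(x))
    upper = np.interp(idx, maxima, x[maxima])         # linear envelope (simplification)
    lower = np.interp(idx, minima, x[minima])
    return x - (upper + lower) / 2.0

def emd(x, max_imfs=6, n_sift=5):
    """Simplified EMD: returns a list of IMFs and the final residue."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        h = sift_once(residue)
        if h is None:
            break
        for _ in range(n_sift - 1):
            h_next = sift_once(h)
            if h_next is None:
                break
            h = h_next
        imfs.append(h)
        residue = residue - h
    return imfs, residue

# Simulated example: fast and slow oscillations on a linear trend.
t = np.linspace(0.0, 2.0, 1000, endpoint=False)
fast = np.sin(2 * np.pi * 20 * t)
x = fast + np.sin(2 * np.pi * 2 * t) + 0.5 * t
imfs, residue = emd(x)
```

By construction the IMFs and the residue sum back to the input signal, and the first IMF captures the fastest oscillation; artifact removal would then amount to discarding or filtering selected IMFs before summing.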

6) SIGNAL SPACE PROJECTION
Signal space projection (SSP) is a method in which a signal-optimized subspace is defined from measurement data and the data are projected into the signal subspace [145]. This method can be used to improve the signal-to-noise ratio and source localization of EEG and MEG signals [146], [147]. SSP relies on the assumption that the subspace of the neuronal signals is orthogonal to, or sufficiently different from, the subspace of the artifactual activities. Generally, PCA is used to determine the subspace of the artifactual data, and the projection operator is then estimated using the strongest PCs. This operator can be estimated using data that are heavily contaminated with ECG or EOG artifacts. In MEG signal analysis, the artifactual subspace can be constructed using data acquired in an empty room to reduce environmental artifacts. A few studies have successfully removed artifacts from EEG and MEG datasets using the SSP algorithm. For instance, Nolte and Hämäläinen performed a theoretical analysis of a partial SSP algorithm to remove artifacts from MEG data [148]. Taulu and Hari developed an algorithm using SSP theory to remove artifacts from MEG data [149]. Recently, a study proposed an SSP-based method to remove muscle artifacts from TMS-evoked EEG data [150]. References [151]-[154] can be consulted for a more detailed understanding of the use of SSP algorithms to remove artifacts from EEG and MEG data. SSP is also implemented in open-source software packages that can be used to visualize, analyze, and remove artifacts from EEG and MEG datasets [155]-[157].
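The construction of the projection operator can be illustrated as follows: the strongest spatial PCs of artifact-dominated calibration data span the artifact subspace, and the operator projects the recording onto its orthogonal complement. The function name `ssp_projector`, the artifact topography, and the simulated data are assumptions made for the sketch.

```python
import numpy as np

def ssp_projector(artifact_data, n_components=1):
    """Build P = I - U U^T from the strongest PCs of artifact-dominated data
    (channels x samples)."""
    centered = artifact_data - artifact_data.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(centered, full_matrices=False)
    U = U[:, :n_components]                        # basis of the artifact subspace
    return np.eye(artifact_data.shape[0]) - U @ U.T

# Simulated example with a hypothetical artifact topography.
rng = np.random.default_rng(4)
n_ch = 6
topo = np.array([1.0, 0.8, 0.5, 0.3, 0.2, 0.1])
topo /= np.linalg.norm(topo)

# Calibration segment dominated by the artifact (e.g. strong blinks).
calib = np.outer(topo, 10.0 * rng.standard_normal(1000))
calib += 0.1 * rng.standard_normal((n_ch, 1000))
P = ssp_projector(calib)

# Apply the projector to a new contaminated recording.
brain = rng.standard_normal((n_ch, 500))
recording = brain + np.outer(topo, 10.0 * rng.standard_normal(500))
cleaned = P @ recording
```

The projector is fixed once estimated, so applying it is a single matrix product per recording; the price is that any neuronal activity lying in the removed subspace is projected out together with the artifact.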

7) BEAMFORMING
In sensor array signal processing, beamforming or spatial filtering is a technique used for directional transmission or reception of signals [158]. This technique has been widely used in communications and signal processing applications, and more recently it has also been employed to analyze and process brain signals. Generally speaking, it has mainly been used for source localization in MEG and EEG studies. Beamformers can be designed to pass the neuronal activity from a specific source while attenuating the activity from all other external or internal sources (see [159] and references therein). Beamforming-based methods have also been used to extract and remove artifacts from EEG and MEG signals using the same principle. For instance, Nazarpour et al. [159] developed a methodology based on space-time-frequency (STF)-time/segment modelling and beamforming to remove eye blink artifacts from EEG data. Another study used beamformers to reject artifacts in simultaneous EEG-fMRI recordings [160]. Hipp and Siegel showed that beamformer-based analysis not only maps the EEG signal to the cortical space of interest but also efficiently removes muscle artifacts from the signals [161]. In another study, a beamforming-based methodology was used to remove transcranial alternating current stimulation artifacts from MEG signals [162]. Recently, beamforming was combined with ICA to analyze the effects of microsaccadic artifacts in EEG signals [163]; the authors showed that beamforming can be used to validate the successful removal of artifacts from the data. For more detailed insight, we recommend that readers consult references [36] and [164]-[170].

D. HYBRID METHODS
Since each method discussed earlier has advantages as well as limitations, researchers have recently developed methods that combine two or more of them. The idea is to use the advantageous features of each method to develop an approach that removes artifacts from EEG signals more completely. In this section, we discuss some of these methods. Table 6 lists the studies using hybrid algorithms.

1) ADAPTIVE FILTERING AND BLIND SOURCE SEPARATION
This hybrid method combines adaptive filtering with BSS (here, ICA). ICA is used to decompose EEG signals into ICs. Since artifactual ICs have been shown to also contain weak neuronal signals, removing these ICs could distort the EEG signals [17]. Hence, in this method, artifactual ICs are further processed by an adaptive filter to retain the neuronal information present in them. Klados et al. [34] developed a hybrid method by combining adaptive filtering and ICA for efficient removal of artifacts from EEG data. A similar method was developed in [24], combining adaptive filtering and BSS for removal of ocular artifacts. One of the limitations of these methods is that there is no criterion for automatic selection of artifactual components; accordingly, they apply adaptive filtering to all ICs, which can cause loss of neuronal information from non-artifactual ICs as well as increased computational cost. To overcome this issue, Mannan et al. [12], [13] developed a hybrid AF-BSS method that automatically identifies artifactual components and processes only those ICs to remove ocular artifacts from EEG data. A similar method was developed by combining the auto-regressive exogenous model with ICA for removal of ocular artifacts from EEG data [177].
The general schematic of combined BSS and adaptive filtering to remove artifacts from EEG data is shown in Figure 5.
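As a rough illustration of the adaptive-filtering stage in such a hybrid, the sketch below applies a least-mean-squares (LMS) noise canceller to a single contaminated component, using a recorded artifact signal as reference. LMS is chosen here only for brevity and is not necessarily the filter used in the cited studies; the function name `lms_cancel`, the filter order, the step size `mu`, and the toy signals are all illustrative assumptions.

```python
import math

def lms_cancel(primary, reference, order=3, mu=0.001):
    """Adaptive noise cancellation with an LMS filter (illustrative sketch).

    primary   : contaminated signal (e.g. an artifactual IC), list of floats
    reference : artifact reference (e.g. an EOG channel), same length
    Returns the error signal, which serves as the cleaned estimate.
    """
    weights = [0.0] * order
    cleaned = []
    for n in range(len(primary)):
        # Most recent `order` reference samples, zero-padded at the start.
        taps = [reference[n - k] if n - k >= 0 else 0.0 for k in range(order)]
        artifact_estimate = sum(w * x for w, x in zip(weights, taps))
        error = primary[n] - artifact_estimate
        # LMS update: adjust the weights in proportion to error * input.
        weights = [w + 2.0 * mu * error * x for w, x in zip(weights, taps)]
        cleaned.append(error)
    return cleaned

# Toy data (assumed): a 10 Hz "neuronal" sine plus a scaled slow "ocular" wave.
fs = 200
t = [i / fs for i in range(4 * fs)]
neural = [math.sin(2 * math.pi * 10 * x) for x in t]
ocular = [3.0 * math.sin(2 * math.pi * 1 * x) for x in t]
contaminated = [s + 0.8 * a for s, a in zip(neural, ocular)]
cleaned = lms_cancel(contaminated, ocular)
```

In a hybrid AF-BSS pipeline, a filter of this kind would be applied to the ICs flagged as artifactual before the EEG is reconstructed, rather than to the raw channels.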

2) EMPIRICAL-MODE DECOMPOSITION AND BLIND SOURCE SEPARATION
EMD and BSS also have been combined to remove artifacts from EEG data. In this method, EMD is applied to EEG signals to obtain IMFs, and then the BSS method is applied to IMFs for detection and removal of artifactual components. In some studies, authors have reported on the EMD-BSS method [178], [179] whereas in others, BSS-EMD [180], [181]. The only difference is which method is applied first to EEG signals. Figure 6 illustrates the schematic of BSS-EMD algorithm.

3) WAVELET TRANSFORM AND BLIND SOURCE SEPARATION
In this method, the WT and BSS methods are combined for removal of artifacts from EEG data. Most commonly, this method is applied as follows [182], [183]:
1. Decomposition of EEG signals by ICA or CCA to obtain ICs or CCs.
2. Decomposition of ICs or CCs by WT.
3. Removal of artifactual components by thresholding.
4. Reconstruction of artifact-free EEG signals.
Another version combining WT and BSS applies WT as the first step and then BSS as the second. In the literature, WT's combination with either ICA [182] or CCA [183] has been reported. Figure 6 shows the schematic of this algorithm.

4) ADAPTIVE FILTERING AND EMPIRICAL-MODE DECOMPOSITION
This hybrid method is based on the combination of adaptive filtering and EMD. A step-wise procedure of this hybrid method is as follows [184]:
1. Decompose the EEG signals using EMD to obtain IMFs.
2. Calculate the frequency of each component using the power spectrum density.
3. Find the range of frequencies in which the reference artifactual signal has a non-significant portion of its energy.
4. Construct two signals: one combining the components with frequencies greater than the upper limit of the above range, and a second combining the components with frequencies less than that limit.
5. Remove artifacts by adaptive filtering, with the recorded artifactual signals as reference input.
6. Reconstruct the clean EEG by adding the second signal from step 4 and the cleaned signal from step 5.
Removal of ECG artifacts using this method has been reported in [184].

5) ADAPTIVE FILTERING AND WAVELET TRANSFORM
Peng et al. [185] developed a method by combining adaptive filtering and WT. They used DWT and an RLS-based adaptive noise canceller to remove ocular artifacts from EEG data. This method can be applied as follows [185]:
1. Wavelet decomposition of the recorded EEG signals.
2. Thresholding of the wavelet coefficients.
3. Reconstruction of the reference signal by inverse wavelet transform.
4. Adaptive filtering of the contaminated EEG signals, with the reconstructed reference from step 3 as input.
5. Output of the clean EEG signals.
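Steps 1-3 above (building an artifact reference from the contaminated recording itself) can be sketched as follows. This is only a minimal illustration under stated assumptions: a single-level Haar transform stands in for the DWT, a hard threshold that keeps only large-magnitude coefficients stands in for the thresholding rule, and the helper names and the threshold value are hypothetical. None of these choices is taken from [185].

```python
import math

def haar_dwt(signal):
    """Single-level Haar DWT of an even-length list: (approx, detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt (perfect reconstruction up to rounding)."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out

def hard_threshold(coeffs, thr):
    """Keep coefficients with magnitude >= thr, zero the others."""
    return [c if abs(c) >= thr else 0.0 for c in coeffs]

def artifact_reference(eeg, thr):
    """Steps 1-3: decompose, threshold, and reconstruct a reference signal."""
    approx, detail = haar_dwt(eeg)
    return haar_idwt(hard_threshold(approx, thr), hard_threshold(detail, thr))

# Toy segment (assumed): low-amplitude activity with a large blink-like plateau.
eeg = [0.1, -0.2, 0.15, 0.05, 4.0, 4.2, 3.9, 4.1, 0.0, -0.1, 0.2, 0.1]
reference = artifact_reference(eeg, thr=1.0)
```

The resulting `reference` would then serve as the reference input of the adaptive filter in step 4, which is not shown here.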

IV. PERFORMANCE EVALUATION
Performance evaluation is a means of verifying or checking the ability of an algorithm to remove artifacts from EEG data. Since the underlying neuronal activity in recorded EEG data is unknown, it is a difficult task, therefore, to completely verify an algorithm's performance. In the literature, this problem is overcome through the use of simulated EEG data [12], [34]. In simulated EEG, clean signals (EEG signals recorded and analyzed with care so that there are no major artifacts) and artifacts are mixed using very simple as well as very complex techniques [22], [34], [224]. However, simulated EEG cannot achieve real contamination as in recorded EEG. Therefore, an algorithm should also be verified through the use of experimental EEG data.
In our opinion, an algorithm should go through a three-step verification procedure. First, evaluation of the algorithm should be done using simulated EEG signals. Next, self-recorded EEG signals should be used to verify the effectiveness of the algorithm. Finally, real EEG signals available at verified EEG databases should be utilized in this regard. This verification procedure will testify as to the true performance, reliability, and reproducibility of any artifact removal approach.
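The simplest way of generating simulated EEG mentioned above, mixing a clean signal with an artifact, can be sketched as follows. The additive model with a scalar contamination factor `lam`, the function name `contaminate`, and the synthetic stand-in signals are assumptions made for illustration; the cited studies also use considerably more complex mixing schemes.

```python
import math

def contaminate(clean, artifact, lam=1.0):
    """Additive mixing: contaminated = clean + lam * artifact (assumed model)."""
    if len(clean) != len(artifact):
        raise ValueError("clean and artifact must have the same length")
    return [c + lam * a for c, a in zip(clean, artifact)]

fs = 100
t = [i / fs for i in range(2 * fs)]
# Stand-in for a clean EEG segment: a 10 Hz oscillation.
clean = [math.sin(2 * math.pi * 10 * x) for x in t]
# Stand-in for an eye blink: a Gaussian-shaped bump centred at t = 1 s.
blink = [5.0 * math.exp(-((x - 1.0) ** 2) / 0.01) for x in t]
contaminated = contaminate(clean, blink, lam=0.5)
```

Because `clean` is known by construction, it can act as the ground truth against which a corrected signal is scored.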

A. EVALUATION METRICS FOR SIMULATED EEG DATA
In this section, we give an overview of the metrics most commonly used to evaluate the performance of artifact removal algorithms on simulated EEG data. One of the advantages of using a simulated EEG signal is that the true EEG signal is known and can be used to assess the performance of an algorithm.

1) MEAN SQUARE ERROR
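In the time domain, the mean square error (MSE) can be used to assess the performance of an algorithm by calculating the differences between the true EEG, $EEG_{in}(i) \in \mathbb{R}^{1 \times N}$, and the corrected EEG, $EEG_{out}(i) \in \mathbb{R}^{1 \times N}$. MSE can be calculated as [12]

$$MSE = \frac{1}{N} \sum_{i=1}^{N} \left( EEG_{in}(i) - EEG_{out}(i) \right)^{2}$$

2) ROOT MEAN SQUARE ERROR
Root mean square error (RMSE) is another commonly employed metric to quantify the amount of information preserved by an algorithm. RMSE can be calculated as [204]

$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( EEG_{in}(i) - EEG_{out}(i) \right)^{2}}$$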
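A minimal sketch of MSE and RMSE for two equal-length signals is given below; the function names and the toy values are illustrative.

```python
import math

def mse(eeg_in, eeg_out):
    """Mean square error between true and corrected EEG samples."""
    n = len(eeg_in)
    return sum((x - y) ** 2 for x, y in zip(eeg_in, eeg_out)) / n

def rmse(eeg_in, eeg_out):
    """Root mean square error: square root of the MSE."""
    return math.sqrt(mse(eeg_in, eeg_out))

eeg_in = [1.0, 2.0, 3.0, 4.0]    # true EEG (toy values)
eeg_out = [1.0, 2.0, 3.0, 2.0]   # corrected EEG (toy values)
print(mse(eeg_in, eeg_out))      # 1.0
print(rmse(eeg_in, eeg_out))     # 1.0
```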

3) NORMALIZED MEAN SQUARE ERROR
Normalized mean square error (NMSE) is also used in EEG artifact removal studies as an evaluation metric. NMSE can be calculated as given in [206].
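Several normalizations are in use for NMSE. The sketch below assumes one common choice, dividing the summed squared error by the energy of the true signal; this may differ from the exact definition in [206], and the function name is illustrative.

```python
def nmse(eeg_in, eeg_out):
    """NMSE, assuming normalization by the energy of the true EEG."""
    squared_error = sum((x - y) ** 2 for x, y in zip(eeg_in, eeg_out))
    energy = sum(x ** 2 for x in eeg_in)
    return squared_error / energy

eeg_in = [1.0, 2.0, 3.0, 4.0]    # true EEG (toy values)
eeg_out = [1.0, 2.0, 3.0, 2.0]   # corrected EEG (toy values)
value = nmse(eeg_in, eeg_out)    # 4 / 30
```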

4) RELATIVE ERROR
Relative error (RE) is another time-domain metric that has been used in several studies to evaluate the effectiveness of algorithms in removing artifacts from EEG data. RE can be calculated as [13]

$$RE = \frac{\lVert EEG_{out} - EEG_{in} \rVert}{\lVert EEG_{in} \rVert}$$

where $\lVert \cdot \rVert$ denotes the norm of a vector.
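The relative error can be sketched as the norm of the error divided by the norm of the true signal. The Euclidean norm is assumed here, since the text does not say which vector norm is meant; the function name is illustrative.

```python
import math

def relative_error(eeg_in, eeg_out):
    """Relative error using the Euclidean norm (assumed)."""
    error_norm = math.sqrt(sum((y - x) ** 2 for x, y in zip(eeg_in, eeg_out)))
    true_norm = math.sqrt(sum(x ** 2 for x in eeg_in))
    return error_norm / true_norm

eeg_in = [3.0, 4.0]    # true EEG (toy values), norm 5
eeg_out = [3.0, 1.0]   # corrected EEG (toy values), error norm 3
re_value = relative_error(eeg_in, eeg_out)
```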

5) SIGNAL-TO-ARTIFACT RATIO
Signal-to-artifact ratio (SAR) is a metric commonly used to evaluate improvements in the corrected EEG signal as compared with the contaminated EEG signal. The signal-to-artifact ratio before artifact removal, $SAR_B$, is calculated for the contaminated EEG signal $EEG_{con}(i) \in \mathbb{R}^{1 \times N}$ as given in [225], where $EEG_{con} = EEG_{in} + noise$. The corresponding ratio for the corrected EEG, $SAR_A$, is the signal-to-artifact ratio after artifact removal. An effective artifact removal algorithm will remove all artifacts and will have higher $SAR_A$ values; consequently, $SAR_A > SAR_B$. The gain in signal-to-artifact ratio, $\gamma$, is positive if the signal-to-artifact ratio is improved, negative if it is decreased, and zero if there is no improvement.
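One way to compute these quantities on simulated data is sketched below. The definitions are assumptions, not the formulas of [225]: the ratio is taken in decibels as the energy of the true EEG over the energy of whatever deviates from it, and the gain is taken as the difference between the ratio after and before removal. The function name `sar_db` is hypothetical.

```python
import math

def sar_db(eeg_in, eeg_x):
    """Signal-to-artifact ratio in dB (assumed definition).

    eeg_in : true EEG
    eeg_x  : contaminated EEG (gives the ratio before removal) or
             corrected EEG (gives the ratio after removal)
    Undefined when eeg_x equals eeg_in exactly (zero residual energy).
    """
    signal_energy = sum(s ** 2 for s in eeg_in)
    artifact_energy = sum((x - s) ** 2 for s, x in zip(eeg_in, eeg_x))
    return 10.0 * math.log10(signal_energy / artifact_energy)

eeg_in = [1.0, -1.0, 1.0, -1.0]          # true EEG (toy values)
eeg_con = [s + 1.0 for s in eeg_in]      # contaminated: artifact of amplitude 1
eeg_out = [s + 0.1 for s in eeg_in]      # corrected: residual of amplitude 0.1

sar_before = sar_db(eeg_in, eeg_con)
sar_after = sar_db(eeg_in, eeg_out)
gain = sar_after - sar_before            # positive when removal helped
```

With these toy values the residual artifact is ten times smaller in amplitude than the original artifact, so the gain is close to 20 dB.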

6) MUTUAL INFORMATION
The amount of mutual information (MI) between EEG corrected by an artifact removal algorithm and true EEG can be calculated to analyze the effectiveness of an algorithm in extracting true EEG signals from contaminated EEG signals. Mathematically, MI can be calculated as [12]

$$MI = \iint f(a, b) \, \log \frac{f(a, b)}{f(a) \, f(b)} \, da \, db$$

where $f(a, b)$ is the joint pdf and $f(a)$ and $f(b)$ are the marginal pdfs. Corrected EEG and true EEG signals are closely related if and only if the MI between them is large.
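In practice the pdfs are unknown and must be estimated from samples. The sketch below uses a simple histogram (plug-in) estimate with equal-width bins and natural logarithms; the estimator, the number of bins, and the function name are assumptions, and other estimators may be preferable for short or noisy recordings.

```python
import math

def mutual_information(a, b, bins=8):
    """Histogram (plug-in) estimate of MI in nats between two signals."""
    def bin_index(v, lo, hi):
        # Map a value to one of `bins` equal-width bins over [lo, hi].
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)

    lo_a, hi_a, lo_b, hi_b = min(a), max(a), min(b), max(b)
    n = len(a)

    # Joint histogram counts, then marginal counts derived from them.
    joint = {}
    for x, y in zip(a, b):
        key = (bin_index(x, lo_a, hi_a), bin_index(y, lo_b, hi_b))
        joint[key] = joint.get(key, 0) + 1
    count_a, count_b = {}, {}
    for (i, j), c in joint.items():
        count_a[i] = count_a.get(i, 0) + c
        count_b[j] = count_b.get(j, 0) + c

    # Sum p(a,b) * log( p(a,b) / (p(a) p(b)) ) over occupied bins.
    mi = 0.0
    for (i, j), c in joint.items():
        mi += (c / n) * math.log(c * n / (count_a[i] * count_b[j]))
    return mi

true_eeg = [float(v) for v in range(64)]     # toy ramp, uniform over the bins
mi_identical = mutual_information(true_eeg, true_eeg)
mi_constant = mutual_information(true_eeg, [0.0] * 64)
```

For identical signals the estimate equals the entropy of the binned signal (here ln 8), while a constant signal shares no information with the true one.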

7) MEAN ABSOLUTE ERROR
In the frequency domain, mean absolute error (MAE) can be used as an evaluation metric to measure the distortion in different frequency bands. It is calculated from the power spectrum densities, denoted by P, of the true and corrected EEG signals [12].
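A band-wise MAE can be sketched as the mean absolute difference between the power spectra of the true and corrected EEG within a frequency band. The exact formula of [12] is not reproduced in the text, so the averaging over frequency bins, the plain periodogram estimate (computed with a direct DFT to stay dependency-free), and the function names are all assumptions.

```python
import cmath
import math

def periodogram(x, fs):
    """Periodogram at non-negative frequencies via a direct DFT (O(N^2))."""
    n = len(x)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coef = sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        freqs.append(k * fs / n)
        power.append(abs(coef) ** 2 / (fs * n))
    return freqs, power

def band_mae(eeg_in, eeg_out, fs, f_lo, f_hi):
    """Mean absolute PSD difference over the bins in [f_lo, f_hi] Hz."""
    freqs, p_in = periodogram(eeg_in, fs)
    _, p_out = periodogram(eeg_out, fs)
    diffs = [abs(a - b) for f, a, b in zip(freqs, p_in, p_out) if f_lo <= f <= f_hi]
    if not diffs:
        raise ValueError("no frequency bins fall inside the requested band")
    return sum(diffs) / len(diffs)

fs = 64
t = [i / fs for i in range(fs)]
eeg_in = [math.sin(2 * math.pi * 10 * x) for x in t]   # true EEG: 10 Hz sine
eeg_out = [0.5 * s for s in eeg_in]                    # corrected EEG: attenuated
alpha_mae = band_mae(eeg_in, eeg_out, fs, 8, 13)       # band containing 10 Hz
delta_mae = band_mae(eeg_in, eeg_out, fs, 1, 4)        # band without signal power
```

Here the attenuation shows up as a non-zero MAE in the 8-13 Hz band only, which is how the metric localizes distortion to particular bands.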

B. EVALUATION METRICS FOR REAL EEG DATA
Since the underlying true EEG is unknown in experimental EEG signals, there is no consensus among researchers on the validation of artifact removal techniques when applied to real EEG signals. However, a number of researchers have proposed schemes for verification and validation of algorithms [32], [226]-[231]. For instance, Croft et al. [229] developed a scheme based on correlation of reconstructed EEG and the EOG reference channel and ERP consistency as associated with eye movements in EOG channels. However, this validation has two limitations: its dependence on the recorded EOG, and the use of an entire epoch for ''standard deviation validation'', which includes irrelevant data. Pham et al. [227] addressed these limitations and proposed a revised, improved version of the validation scheme.
It is important to mention here that those validation schemes are only for ocular artifact correction. Another attempt in this regard was made by McMenamin et al. [228] for muscle artifacts. They proposed to evaluate whether a method successfully removes/reduces artifacts (its sensitivity) and whether it preserves neuronal signals (its specificity) using a region of interest. Although this is an attractive approach, its implementation is not easy.
Finally, experts in EEG signal analysis have been called upon to visually inspect the outcomes of artifact removal algorithms, examining factors such as the time series and frequency spectrum before and after the removal process. The limitation of this validation is that it is highly dependent on the expertise of the researcher in judging whether the artifact removal algorithm improved or degraded the quality of the EEG signal. Several authors have used this scheme to validate and compare the performance of their algorithms with others, for instance [12], [13], [26]-[28], [107], [227], [232].

V. ARTIFACT REMOVAL IN EEG APPLICATIONS
Although the focus of this article is to review the most commonly used artifact removal algorithms for physiological artifacts in EEG signals, it is beneficial to briefly describe application-based studies, specifically for BCI and high-density EEG. It is widely accepted within the BCI research community that in any BCI system, neurological phenomena are the only source of control. Artifacts, unwanted electrical signals that arise from sources other than the brain, can interfere with neurological phenomena. Such artifacts might alter the characteristics of neurological phenomena or even be mistakenly used as the source(s) of control in BCI systems [16]; the latter is the most significant artifact-related problem [233]. As failing to deal with artifacts can result in deterioration of BCI system performance during practical applications, it is necessary to develop automatic methods to handle artifacts or to design BCI systems robust to them. Bashashati et al. [234] showed that dealing with eye artifacts in EEG data can enhance the performance of a self-paced BCI system. Erfanian and Mahmoudi [73] used recurrent-neural-network-based adaptive filtering to automatically suppress ocular artifacts for improved EEG-based BCI performance. Recently, Yong et al. [172] combined stationary wavelet analysis with adaptive thresholding to automatically remove ocular artifacts from EEG data in an EEG- and eye-tracker-based self-paced BCI system. Their method is independent of EOG and can be used for real-time processing. In another study, wavelet decomposition and ICA were combined to remove artifacts from EEG data for BCI applications [203]. This method, termed FORCe, does not require any additional reference channels such as EOG or ECG.
More recently, a study developed a real-time methodology to detect and remove blinking artifacts using digital filtering with an automatic thresholding algorithm [235]. Another study developed an adaptive noise cancelling scheme using H-infinity filtering for removing ocular artifacts and signal drifts; the authors showed that adaptive-filtering-based artifact removal can enhance the decoding accuracy of brain-machine interfaces [77]. Zou et al. [129] developed an ICA-based method in which hierarchical clustering of features extracted from ICs is used to remove physiological and non-physiological artifacts from EEG data for BCI applications. In a recent study, BSS algorithms were used to remove eye blink artifacts for online processing of EEG signals [138]. Although the authors did not demonstrate the performance of their algorithm in a BCI application, online removal of artifacts can serve as a guide for making BSS algorithms usable in BCI applications in future research. In a more recent study, a novel method termed the filter-bank artifact rejection algorithm was developed for real-time removal of artifacts from EEG signals [236]. This method divides the EEG signal into different frequency bands, extracts features, and uses machine learning to remove artifacts. Another advantage of this method is that it can be implemented with only a few channels or even with single-channel EEG data. The authors reported that their algorithm outperformed FASTER [224]. References [16], [80], [170], [187], [237]-[240] and the references therein can be consulted for deeper insight into artifact removal in BCI applications. Moreover, high-density EEG is another important and recent application for gaining more insight into brain functionality during real-world activities [241]. A few studies have also developed algorithms to remove artifacts for the analysis of high-density EEG data.
For instance, one study developed a modified ICA algorithm in which a subset of channels was randomly selected and decomposed with ICA [132]. Subsequently, an artifact relevance index was calculated by a template matching scheme. The authors showed that their method can successfully remove blinking artifacts from high-density EEG data. More recently, Tamburro et al. [222] developed a combined ICA-SVM method to identify and remove all physiological artifacts from high-density EEG. Another study developed an automatic processing pipeline that uses wICA with automatic IC rejection for removing artifacts [223]; the authors showed that their scheme can successfully remove artifacts from high-density EEG signals. We suggest that readers consult the applicable references [211], [242]-[245] for more details on the analysis and removal of artifacts from high-density EEG signals. Furthermore, artifacts can also affect diagnosis and analysis in clinical research such as on sleep disorders, Alzheimer disease, and schizophrenia [6]-[9]. It is therefore mandatory, in either clinical research or practical applications, to deal with these artifacts prior to the analysis of EEG signals.

VI. DISCUSSION
EEG is the most commonly utilized brain-imaging device in medical and application-based research. The major issue of EEG is that it is always contaminated with artifacts from different sources such as eyes, muscles, cardiac noise, electrode misplacement, and movements in the environment [31]. These artifacts have been shown to alter the results of applications such as BCI [16], high-density EEG [222] and disease diagnosis [6]. It is therefore essential to remove these artifacts before analyzing EEG data for the final goal of the application. Figure 7 shows the number of artifact removal research articles that used the described methods from 1991 to 2018. As can be seen in Figure 7, the number of studies on artifact removal has increased almost monotonically every five years. This trend indicates that physiological artifact removal from EEG signals is still an important and challenging research topic. In this paper, we reviewed most of the commonly employed algorithms dealing with physiological artifacts in EEG signals. Figure 8 provides a pie chart showing the percentages of articles published using the various algorithms. Among single-method studies, ICA (24%) is the most frequently used algorithm for removing artifacts. Overall, owing to their high effectiveness, hybrid algorithms were developed and implemented in the largest share of studies (33%). Removal performance, manual/automatic processing, offline/online/real-time implementations, single/multi-channel signals, reference channel requirements, and robustness can be considered important criteria for selecting and comparing artifact removal algorithms. BSS algorithms, especially ICA, are the most frequently used methods for removal of artifacts from EEG data, because they can be implemented without the need for any reference signal [31], but they also suffer from some disadvantages and limitations. For instance, ICA on its own cannot automatically identify artifactual ICs to be removed from data.
It requires visual expertise and a large amount of time to accurately remove artifacts from signals [12], [82], [88]. However, many recent studies combined ICA with other statistical tools to automatically classify artifactual components [12], [23], [92], [115], [198], [216], [222] (see Table 4). Also, it has been shown that artifactual ICs include leaked neuronal activity and that removing these ICs causes a considerable amount of data loss [17]. The requirement of a large amount of data and a large number of channels is a further limitation of ICA [32], [85]; recent studies have tried to overcome these issues [221], but they need more attention in future studies. PCA, on the other hand, assumes the orthogonality of neuronal activity and artifactual signals, which does not hold whenever both have the same amplitudes; consequently, PCA fails to separate artifactual activities from EEG data [93], [101]. Although MCA has recently been used for artifact removal, it has the drawback that it always requires a morphology database of artifacts [108], [109]. In contrast, CCA is comparatively fast and does not impose conditions such as Gaussianity, orthogonality, or a pre-defined database as ICA, PCA, and MCA do [26], [30]-[32]; it should therefore be explored further for all types of artifacts in future research. Overall, BSS algorithms can be used to deal with all types of artifacts present in EEG signals without the need for any extra reference signals. As an alternative to BSS, regression/filtering methods have the limitation of requiring particular reference signals to remove particular types of artifacts from EEG data [32]. Furthermore, regression methods are strongly affected by bidirectional contamination, which causes common neuronal activity to be removed from EEG signals [12], [34].
FIGURE 8. The percentage of published articles using each algorithm discussed in this paper.
However, a few studies have suggested that low-pass filtering of the EOG signal can reduce the bidirectional effect [28], [44], [45]. Moreover, simplicity, speed, the absence of preprocessing, and suitability for online/real-time implementation in BCI-type applications are among the advantages of regression and filtering algorithms. WTs have proven well suited to biomedical applications owing to their robustness and versatility, but they fail to remove artifacts whenever the spectral properties of artifacts and neuronal activities overlap [31], [32]. The EMD method suffers from the limitation of mode mixing, but it has the advantages of adaptivity, robustness and flexibility [144]. It can be concluded from the above discussion that every method has advantages as well as disadvantages and limitations. For this reason, a few studies have combined two or more methods so that the advantages of each method are maximized and the drawbacks minimized. The idea of combining different methods can be used to deal with the problems faced by classic algorithms in EEG signal analysis. Recently, a few researchers who combined two or more methods to remove/reduce artifacts from EEG data have claimed that the combined methods can perform better than single algorithms [12], [17], [178], [197] (see Table 6). Also, it can be seen from Figure 9 that researchers have recently shown very high interest in developing and implementing hybrid algorithms as compared to single methods. Most algorithms, though, regardless of being used as a single modality or in a combined way, deal only with one type of artifact, which limits their utility to particular applications (Tables 3-6). Figure 10 shows a pie chart of the percentages of published articles dealing with a single artifact (67%) and with multiple artifacts (33%). However, recent studies have shown more interest in developing algorithms for processing multiple artifacts (Figure 10b).
Furthermore, there is no standard validation rule applicable to algorithms that remove artifacts from real EEG data [30], [32]. Next, we will discuss and compare methods for removal of particular types of artifacts from EEG data.
Ocular/EOG artifacts are EEG signal contaminations due to eye movements and blinks, and are always present in EEG signals [59]. For this reason, ocular artifacts have been extensively treated by many researchers in the literature (Tables 3-6). Because it is possible to measure reference channel signals for ocular artifacts, regression/filtering methods were the most commonly used for removal of such artifacts from EEG data until the early 1990s [18], [21], [27], [28], [246], [247]. On the other hand, if there is no reference signal available, ICA is the most commonly employed algorithm to remove ocular artifacts [14], [15], [22], [111], [112]. Initially, artifactual ICs were identified by visual inspection of time series and topographies [111], [112], [116]; but later on, many researchers, in order to make the ICA procedure automatic, proposed the use of features based on the temporal and spatial properties of ICs [22], [23], [89], [115]. As can be seen from Table 4, other variants of BSS algorithms such as PCA, CCA and MCA have not been used extensively to remove ocular artifacts from EEG signals; in fact, a few studies have used only PCA [19], [97], [100], [101]. Furthermore, in the literature, only a few authors have used WT and EMD to treat ocular interferences [139], [144], [174]. Since all of these methods have limitations, more recently, many researchers have combined different methods to remove ocular artifacts, their rationale being that the methods thus devised utilize only the advantageous features of each method and thus are more efficient in removing ocular artifacts from EEG data [13], [34], [142], [180], [188]. Unfortunately, most of those studies have determined the efficacy of their methods by visual inspection on experimental data or by use of simulated EEG signals under different conditions and circumstances; therefore, it is very difficult to comment on which methods perform better than others.
However, it is very easy and common to acquire EOG signals as a reference for ocular artifacts. Also, the pattern of ocular artifact is very consistent with specific dynamics and ICA can successfully decompose it into separate ICs. Therefore, in our opinion, adaptive filtering, ICA or their combination could be a good choice for removal of ocular interferences from EEG data, depending upon the specific application (i.e., availability of reference, offline/online/real-time, etc.). The relevant references [18], [21], [28], [45], [175], [229], [246], [247] can be consulted for more details on removal of ocular artifacts.
The presence of unwanted muscle activities in EEG signals is known as muscle/electromyography (EMG) artifact. Generally, it is more difficult to remove muscle artifacts from EEG data as compared with EOG artifacts, because the reference signal for muscle artifacts is rarely available [81]. Even if extra electrodes are used to measure the reference signals for muscle artifacts, it is ineffective, due to the activation of multiple muscles involved in their generation [33]. Therefore, regression methods cannot be used as effective muscle artifact removal tools. Even though ICA is successfully used to remove muscle artifacts [120], [130], it is, unlike the case of EOG artifact removal, very difficult to separate muscle artifacts in different ICs, due to the fact that these artifacts are superimposed onto some ICs [104], [113], [114], [232]. Therefore, disagreement exists in the literature as to whether ICA is an effective tool for removal of muscle interferences [33], [228], [230], [231]. Recently, CCA has been used to remove muscle artifacts, and the authors showed improved performance over ICA [26]. Three other studies also have reported the successful use of CCA to remove muscle interferences from EEG [104], [106], [107]. Furthermore, the combination of CCA and EMD also has been developed to efficiently remove muscle artifacts [178], [179], [205]. Despite the fact that there are many algorithms available for removal of EMG artifacts, there is as yet no standard method for dealing with muscle artifacts. Recording EMG signals for removing muscle artifacts is not as easy as in the case of ocular artifacts, therefore, adaptive filtering cannot be used as an optimal method for removing muscle artifacts. 
However, ICA and CCA have the capabilities of decomposing signals such that source signals for muscle artifacts can be identified and removed, and several studies showed the successful application of these methods in removing muscle artifacts [26], [104]- [106], [120], [178], [183], [231], [232]. Therefore, in our opinion, ICA, CCA and their combination with other methods could be good choices for removal of muscle artifacts, depending on the specific application. For more details on the removal of muscle artifacts from EEG signals, we suggest that readers consult the applicable references [33], [105], [230], [232].
Artifacts due to heartbeat are known as cardiac/ECG artifacts. In the literature, ECG artifacts are the least treated as compared with EOG and EMG artifacts. One reason for this might be the specific temporal dynamics and time-frequency characterization of cardiac artifacts, which do not pose difficulties as great as those that ocular and muscle artifacts do. Also, it is possible to measure reference signals for cardiac artifacts with ECG. The earliest method to deal with ECG artifacts was ensemble average subtraction [57]. Since it is very common practice in clinical environments to measure ECG along with EEG, regression and filtering methods can be used to remove cardiac artifacts [64], [119]. ICA has been reported to remove cardiac interferences from EEG signals by visually identifying ICs related to ECG activity [90], [119]. Furthermore, ICA has been combined with the wavelet transform to enhance the artifact removal process [197]. As discussed earlier, cardiac artifacts have specific dynamics; as such, they can be easily separated into different ICs, and therefore, in our opinion, ICA or methods combined with ICA could be good choices for dealing with cardiac artifacts in EEG.
Next, we will consider studies that deal with two or more types of artifacts. In the literature, ICA is the most common method to deal with multiple artifacts. In 2003, ICA was used for the first time to remove all types of artifacts from eighty EEG signals [15]. The authors showed the efficacy of their results by visual inspection and by analysis of correlation, frequency spectrum and isopotential maps. Since then, many authors have reported the successful use of ICA both manually and automatically to remove all three types of artifacts [89], [92], [127], [129]. Furthermore, a number of authors have proposed the use of features that make the ICA process automatic. In some of those studies, WT was combined with ICA to enhance the performance of the artifact removal process [17], [191], [198], [203], [216]. Other methods, for example BSS-WT [190], adaptive filtering and neural networks [192], [215], ICA and support vector machine [194], also have been reported to successfully remove all types of artifacts. Ocular and muscle artifacts have been treated using ICA [20], [22], [83], [111], [128], [137], WT [172], [173] and hybrid approaches [202], [206], [210] as well. On the other hand, ocular and cardiac artifacts have seen the least attention in the literature [48], [91], [101], [214]. Again, it is very difficult to compare the performances of different algorithms, since all of the pertinent studies have used different measures to validate their algorithms.
In light of the foregoing discussion, there is no single method that can be selected as the optimal choice for removal of all types of artifacts, due to their respective limitations. Although many combined methods have been developed to deal with single and multiple artifacts, there is still no method with standard validation procedures specifically for experimental EEG data. To this end, it is our future plan to perform an empirical evaluation of different methods for comprehensive analysis of the advantages and disadvantages/limitations of each method and of the ways in which the limitations can be overcome. Furthermore, development and implementation of artifact removal algorithms for online/real-time processing with single/few channels is needed for future BCI applications. Moreover, methods for high-density EEG of real-life activities have great room for improvement. For instance, future EEG applications could benefit from dry electrodes, which can greatly reduce the time required for preparing experiments as compared with wet-electrode EEG systems. Finally, we conclude this review by recommending that researchers consider the following aspects in their future studies: i) improvement of different methods for factors such as automation, online/real-time implementation, requirement of reference channel, and computational cost for medical and practical applications, ii) development of hybrid techniques with multiple processing stages to deal with different types of artifacts, iii) development of methods for generation of simulated EEG signals that can truly replicate the effects of real EEG signals for validation purposes, iv) development of standard validation procedures to verify algorithms on real EEG signals, v) development and implementation of artifact removal algorithms for dry-electrode EEG signals.

VII. CONCLUSION
EEG, a portable brain-imaging device, is always contaminated with artifacts from different sources, and these artifacts can alter results. In the past few years, many researchers have focused on developing methods to remove artifacts from EEG data, and this remains an attractive research topic. In this paper, we presented an extensive review of the many existing methods for physiological artifact identification and removal along with a comparison of their advantages and limitations. We also provided an overview of the most commonly used metrics to verify an algorithm for simulated and experimental EEG data. Although there are methods that can be used for particular types of artifacts in a particular scenario, to date there is no single method that can be used optimally to remove artifacts from EEG data. In future studies, researchers should focus not only on combining different methods with multiple processing stages for efficient removal of artifactual interferences but also on developing standard criteria for validation of recorded EEG signals.