Estimation of Number of Levels of Scaling the Principal Components in Denoising EEG Signals

Electroencephalography (EEG) is a standard method for investigating the brain's electrical activity in diverse psychological and pathological states. Analysis of the EEG signal is difficult because of artifacts such as ocular artifacts (OA) and the electromyogram (EMG). EEG signals generally fall in the range of DC to 60 Hz with an amplitude of 1-5 μV. Ocular artifacts have statistical properties similar to those of EEG signals and often interfere with them, making the analysis of EEG signals more complex [1]. In this paper, Principal Component Analysis (PCA) is employed to denoise EEG signals. The paper determines the number of levels to which the principal components must be scaled to obtain a high-quality EEG signal. The work was carried out on different data sets, and the SNR was then estimated.

The study of the electroencephalogram is very helpful in diagnosing different disorders of the nervous system. EEG is the electrical activity recorded from the scalp surface, picked up by conductive media and electrodes [1][2][3]. EEG has played a vital role in investigating brain activity in clinical applications and scientific research for several years [4][5][6]. EEG signals can be contaminated by various artifacts, of which the major noise source is the ocular artifact, which includes eye movements and eye blinks [7]. Artifacts are the main obstacle to high-quality EEG signals: the mixing of ocular artifacts with the EEG signal at the time of recording hampers accurate estimation of the EEG signal. Artifacts fall into one of two categories, technical and physiological. Power line noise (50/60 Hz) is a technical artifact, while artifacts arising from ocular (EOG), cardiac (ECG) and muscular (EMG) activity are physiological artifacts [8].
Regression methods in the time domain and frequency domain [9-11] were proposed for removing eye-blink artifacts. These methods require a reliable reference channel. However, this channel can itself be contaminated by EEG, so the EEG has to be removed from the reference channel by regression techniques. Hence, regression methods are not the best choice for removing EOG artifacts.
Principal Component Analysis is one of the available techniques for extracting information from data and has found applications in a wide range of disciplines [12]. PCA was introduced by Pearson in 1901 [13] and developed by Hotelling [14] in 1933. This paper examines, using multiscale PCA (MSPCA) and wavelets [15], how many levels of scaling of the principal components are needed to obtain a better denoised EEG signal, and then estimates the SNR.

Principal component analysis
Principal Component Analysis (PCA) can be applied to EEG data containing a large number of measured variables to derive a smaller number of artificial variables called principal components (PCs). Reducing a large number of measured variables to a smaller number removes redundancy in the measurements. Here, redundancy means that some of the measured variables are correlated with one another because they measure the same construct. The main idea of PCA is to reduce the dimensionality of a data set consisting of a large number of interrelated variables while retaining as much as possible of the variation present in the data set.
These principal components are uncorrelated, orthogonal and ordered so that the first few components retain most of the variation present in all of the original variables. PCA is performed by eigenvalue decomposition of the data covariance matrix, usually after mean-centering the data for each attribute.
If the variables in a data set are already uncorrelated, PCA is of no value. In addition to being uncorrelated, the principal components are orthogonal and are ordered in terms of the variability they represent. That is, the first principal component represents, for a single dimension (i.e., variable), the greatest amount of variability in the original data set. Each succeeding orthogonal component accounts for as much of the remaining variability as possible.
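The ordering of components by the variability they represent can be illustrated with a minimal NumPy sketch. The synthetic two-channel data and the helper name explained_variance_ratio are assumptions made for illustration; they are not taken from the study.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance carried by each principal component.

    X has one row per sample and one column per variable.
    The fractions are returned in descending order.
    """
    C = np.cov(X, rowvar=False)              # covariance between variables
    eigvals = np.linalg.eigvalsh(C)[::-1]    # eigenvalues, largest first
    return eigvals / eigvals.sum()

# Hypothetical data: two strongly correlated channels.
rng = np.random.default_rng(0)
s = rng.standard_normal(500)
correlated = np.column_stack([s, s + 0.1 * rng.standard_normal(500)])
ratios = explained_variance_ratio(correlated)
```

For strongly correlated channels such as these, nearly all of the variance falls on the first component. For uncorrelated channels of equal variance the fractions would be roughly equal, which is the case in which PCA offers no reduction.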
In other words, PCA is a multivariate data analysis procedure that transforms a set of $n$ correlated variables, $X = (x_1, x_2, \ldots, x_n)$, into a set of uncorrelated variables called principal components $(p_1, p_2, \ldots, p_n)$. The first principal component accounts for most of the variability in the data, while each succeeding component in turn accounts for the highest amount of the remaining variability. Each principal component is a linear combination of the original variables, $p_i = a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{in} x_n$, and the variance of the $i$th principal component is given by $\mathrm{Var}(p_i) = \lambda_i$, where $\lambda_i$ is the $i$th largest eigenvalue of the covariance matrix of $X$. PCA makes one stringent but powerful assumption, linearity. This assumption simplifies the problem by restricting the set of candidate bases. Hence PCA re-expresses the data as a linear combination of the original basis, which is explained here in terms of linear algebra.
Fig. 1(b). Original and artifact-removed EEG signal of data set 2 using first-scale PCA.
Consider an original data set $X$, an $m \times n$ matrix, where $m$ is the number of measurement types and $n$ is the number of samples. The goal is to find an orthonormal matrix $P$ in $Y = PX$ such that $C_Y = \frac{1}{n} Y Y^T$ is a diagonal matrix. The matrix $P$ transforms $X$ into $Y$, and the rows of $P$ are the principal components of $X$.
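The construction of $P$ can be sketched in NumPy as follows. This is a minimal illustration, assuming the rows of $X$ are mean-centered first and taking $P$ from the eigenvectors of the covariance of $X$. The function name pca_rows and the synthetic data are hypothetical.

```python
import numpy as np

def pca_rows(X):
    """PCA for an m x n matrix X (rows = measurement types, columns = samples).

    Returns P (rows are principal components), Y = P @ Xc, and the
    component variances in descending order.
    """
    n = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)    # mean-center each measurement type
    C_X = (Xc @ Xc.T) / n                     # m x m covariance, 1/n convention
    eigvals, eigvecs = np.linalg.eigh(C_X)    # symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # reorder to descending variance
    P = eigvecs[:, order].T                   # rows of P are the eigenvectors
    Y = P @ Xc
    return P, Y, eigvals[order]

# Hypothetical two-channel data sharing a common source s.
rng = np.random.default_rng(0)
s = rng.standard_normal(1000)
X = np.vstack([s + 0.1 * rng.standard_normal(1000),
               2.0 * s + 0.1 * rng.standard_normal(1000)])
P, Y, variances = pca_rows(X)
C_Y = (Y @ Y.T) / X.shape[1]                  # diagonal up to rounding error
```

With this choice of $P$, the diagonal of $C_Y$ holds the eigenvalues of the covariance of $X$, which are the variances of the principal components.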

First level PCA
Two data sets, EEG data set 1 ($X_1$) and EEG data set 2 ($X_2$), each of size 1x1000, were collected from the physionet.org website [16]. Both data sets were downsampled by a factor of 2, reducing the size of each to 1x500. Each data set is normalized using the formula $X = (X - \mathrm{mean}(X)) / \mathrm{std}(X)$, where $\mathrm{mean}(X)$ is the mean of $X$ and $\mathrm{std}(X)$ is its standard deviation. The mean of each data set is calculated as $\mathrm{mean}(X) = \frac{1}{n} \sum_{i=1}^{n} x_i$, where $n$ is the number of samples in the data set $X$.
Fig. 2(a). Original and artifact-removed EEG signal of data set 1 with MSPCA.
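The downsampling and normalization steps can be sketched as below. Two assumptions are made: downsampling keeps every second sample without any anti-aliasing filter, and the standard deviation uses the 1/n convention (NumPy's default). The synthetic input stands in for a PhysioNet record.

```python
import numpy as np

def downsample(x, factor=2):
    """Keep every `factor`-th sample (no anti-aliasing filter applied)."""
    return x[::factor]

def normalize(x):
    """Return (x - mean(x)) / std(x), using the 1/n standard deviation."""
    return (x - x.mean()) / x.std()

# Hypothetical 1x1000 record standing in for an EEG data set.
rng = np.random.default_rng(0)
raw = 5.0 + 3.0 * rng.standard_normal(1000)
x_norm = normalize(downsample(raw, 2))       # length 500, zero mean, unit variance
```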
The variance of each data set is calculated as $\mathrm{var}(X) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mathrm{mean}(X))^2$. After the data sets are normalized, a noise signal of length 500 with a variance of 0.4 (an EOG signal collected from the physionet.org website) is added to each of the two data sets. This yields two noisy data sets, each of size 1x500, which are converted into column vectors.
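The contamination step can be sketched as follows. A random-walk signal stands in for the EOG record, and it is rescaled to a variance of 0.4 before being added to both channels. The placeholder signals and the helper scale_to_variance are assumptions for illustration, not the recordings used in the study.

```python
import numpy as np

def scale_to_variance(noise, target_var):
    """Remove the mean of `noise` and rescale it to the requested variance."""
    noise = noise - noise.mean()
    return noise * np.sqrt(target_var / noise.var())

rng = np.random.default_rng(1)
n = 500
x1 = rng.standard_normal(n)                  # placeholder for normalized EEG data set 1
x2 = rng.standard_normal(n)                  # placeholder for normalized EEG data set 2
eog = np.cumsum(rng.standard_normal(n))      # hypothetical slow drift standing in for EOG
noise = scale_to_variance(eog, 0.4)

# The same noise is added to both data sets, which become the matrix columns.
X_noisy = np.column_stack([x1 + noise, x2 + noise])
```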
Stacked side by side, the two column vectors form a data matrix of size 500x2, which is treated as the noisy EEG signal. For each column of the data matrix, wavelet decomposition is done to a level of 6 using the sym8 wavelet, the same wavelet used at the second level. In Table 1, the principal components (1.12188 and 0.21859) corresponding to level 8 are those retained for the final PCA after wavelet reconstruction.
Fig. 2(b). Original and artifact-removed EEG signal of data set 2 with MSPCA.
At level 8, the original data in two-dimensional space are reduced to one dimension, as shown below for ready reference.

LEVEL 8
First PC score    Second PC score
-0.32884          0
-0.94439          0

Using these principal components, one can reconstruct the denoised version of the input matrix X. The denoised versions of EEG data set 1 and EEG data set 2 are shown in Fig. 1(a) and Fig. 1(b), respectively.
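Reconstruction from a single retained component can be sketched as a projection onto one direction. The level-8 values are read here as the loadings of the first principal component. This is an interpretation, since the source labels them only as PC scores. The sketch also assumes zero-mean columns, and the input matrix is synthetic.

```python
import numpy as np

# Level-8 values, interpreted (as an assumption) as first-PC loadings.
v = np.array([-0.32884, -0.94439])
v = v / np.linalg.norm(v)                    # guard against rounding in the printed digits

def project_rank1(X, v):
    """Project each row of X (samples x 2) onto the unit direction v."""
    scores = X @ v                           # one score per sample
    return np.outer(scores, v)               # back to two columns, rank one

rng = np.random.default_rng(3)
X = rng.standard_normal((500, 2))            # hypothetical zero-mean noisy matrix
X_hat = project_rank1(X, v)
```

Everything orthogonal to v is discarded, so the two reconstructed columns are scaled copies of the same score series.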
The quality of the column reconstructions, estimated by the relative mean square error, is 22.1904933458124% and 73.9497428008459% for the two columns, which is not close to 100%.
Since the quality of the column reconstructions after first-level PCA is not close to 100%, the scaling of the principal components is taken to the next level [18]. The retention of principal components is thus decided by the quality of the column reconstructions, measured by the relative mean square error.
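The source does not state the formula for this quality measure. A minimal sketch is given below under the assumption that quality is 100 times one minus the ratio of the squared reconstruction error to the squared column energy, so that 100% means exact reconstruction. The function name reconstruction_quality is hypothetical.

```python
import numpy as np

def reconstruction_quality(X, X_rec):
    """Per-column reconstruction quality in percent (assumed definition).

    100 means the column is reproduced exactly; 0 means the error
    carries as much energy as the column itself.
    """
    err = np.sum((X - X_rec) ** 2, axis=0)   # squared error per column
    energy = np.sum(X ** 2, axis=0)          # squared energy per column
    return 100.0 * (1.0 - err / energy)

X = np.array([[1.0, 2.0], [3.0, 4.0]])
q_exact = reconstruction_quality(X, X)                  # exact copy
q_zero = reconstruction_quality(X, np.zeros_like(X))    # all-zero reconstruction
```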

Second level PCA
The simplified input matrix X obtained from the first scale of PCA is again decomposed to a level of 6 using the sym8 wavelet. The wavelet coefficients obtained from this decomposition are thresholded using heursure thresholding. PCA is then performed on these wavelet coefficients and the significant principal components are selected. Using these principal components, one can reconstruct a further denoised input matrix X. The quality of the column reconstructions after this second pass over X is close to 100%: 99.9981% and 99.9991%.
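The two-pass procedure can be sketched as follows, with several loudly stated simplifications. A Haar wavelet replaces sym8 so that the example needs only NumPy, heursure thresholding is omitted, approximation coefficients are left untouched, and the signal length is 512 rather than 500 so that six dyadic levels divide evenly. The names haar_dwt, haar_idwt, pca_keep and mspca, and the parameters k_detail and k_final, are hypothetical.

```python
import numpy as np

def haar_dwt(x, level):
    """Haar decomposition: returns (approximation, [details, finest first])."""
    a, details = x.astype(float), []
    for _ in range(level):
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2))
        a = (even + odd) / np.sqrt(2)
    return a, details

def haar_idwt(a, details):
    """Inverse of haar_dwt."""
    for d in reversed(details):
        out = np.empty(2 * a.size)
        out[0::2] = (a + d) / np.sqrt(2)
        out[1::2] = (a - d) / np.sqrt(2)
        a = out
    return a

def pca_keep(M, k):
    """Project the rows of M onto its k leading principal directions."""
    mu = M.mean(axis=0)
    Mc = M - mu
    _, vecs = np.linalg.eigh(Mc.T @ Mc / M.shape[0])   # ascending eigenvalues
    V = vecs[:, ::-1][:, :k]                           # k leading eigenvectors
    return Mc @ V @ V.T + mu

def mspca(X, level, k_detail, k_final):
    """One pass of a simplified multiscale PCA over the columns of X."""
    p = X.shape[1]
    decs = [haar_dwt(X[:, j], level) for j in range(p)]
    for lev in range(level):                           # PCA across channels per scale
        D = np.column_stack([decs[j][1][lev] for j in range(p)])
        D = pca_keep(D, k_detail)
        for j in range(p):
            decs[j][1][lev] = D[:, j]
    X_rec = np.column_stack([haar_idwt(*decs[j]) for j in range(p)])
    return pca_keep(X_rec, k_final)                    # final PCA after reconstruction

rng = np.random.default_rng(2)
t = np.arange(512)
clean = np.column_stack([np.sin(2 * np.pi * t / 64), np.cos(2 * np.pi * t / 64)])
noisy = clean + 0.3 * rng.standard_normal(clean.shape)
first = mspca(noisy, level=6, k_detail=1, k_final=2)   # first level
second = mspca(first, level=6, k_detail=1, k_final=2)  # second level on the first output
```

Feeding the output of the first pass into a second pass mirrors the repeated scaling described above. How many components to keep at each scale is left as a free parameter in this sketch.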
The principal components and Principal Component variances vectors of the two data sets obtained after second scale PCA and wavelet denoising are shown in the

RESULTS
The results obtained after performing MSPCA on different data sets are tabulated in Table 3 and compared with previous results [18].

CONCLUSION
MSPCA provides a better SNR as the relative mean square error (RMSE) of the columns is closer to 100%. Hence, it is important to check the RMSE of the columns before reconstructing the denoised EEG signal and estimating the SNR.
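The SNR formula is not given in the text. One common definition, assumed here, compares the energy of the clean signal with the energy of the residual left after denoising, expressed in decibels. The function name snr_db is hypothetical.

```python
import numpy as np

def snr_db(clean, denoised):
    """SNR in dB: clean-signal energy over residual-error energy (assumed definition)."""
    signal_power = np.sum(clean ** 2)
    noise_power = np.sum((clean - denoised) ** 2)
    return 10.0 * np.log10(signal_power / noise_power)

clean = np.ones(4)
denoised = clean + 0.1                       # constant residual of 0.1 per sample
value = snr_db(clean, denoised)              # energy ratio 4 / 0.04 = 100, i.e. about 20 dB
```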

Conflict of Interest
There is no conflict of interest.

Funding Source
Not applicable.

Ethics Approval
The data were downloaded from the PhysioNet database, an open-source database that is duly acknowledged in the text.