Task-induced frequency modulation features for brain-computer interfacing

Objective. Task-induced amplitude modulation of neural oscillations is routinely used in brain-computer interfaces (BCIs) for decoding subjects’ intents, and underlies some of the most robust and common methods in the field, such as common spatial patterns and Riemannian geometry. While there has been some interest in phase-related features for classification, both techniques usually presuppose that the frequencies of neural oscillations remain stable across various tasks. We investigate here whether features based on task-induced modulation of the frequency of neural oscillations enable decoding of subjects’ intents with an accuracy comparable to task-induced amplitude modulation. Approach. We compare cross-validated classification accuracies using the amplitude and frequency modulated features, as well as a joint feature space, across subjects in various paradigms and pre-processing conditions. We show results with a motor imagery task, a cognitive task, and also preliminary results in patients with amyotrophic lateral sclerosis (ALS), as well as using common spatial patterns and Laplacian filtering. Main results. The frequency features alone do not significantly out-perform traditional amplitude modulation features, and in some cases perform significantly worse. However, across both tasks and pre-processing in healthy subjects the joint space significantly out-performs either the frequency or amplitude features alone. This result only does not hold for ALS patients, for whom the dataset is of insufficient size to draw any statistically significant conclusions. Significance. Task-induced frequency modulation is robust and straight forward to compute, and increases performance when added to standard amplitude modulation features across paradigms. This allows more information to be extracted from the EEG signal cheaply and can be used throughout the field of BCIs.

these neural oscillations, modern stimulus-independent BCIs have achieved their current performance [6][7][8][9]. However, we show in this paper that previously unconsidered properties of the EEG can easily and reliably be used to improve decoding performance.
The two major properties of an oscillation are amplitude and phase, both of which have long histories in the realm of BCIs. The easiest and most-used property of neural oscillations in the context of BCIs is the signal amplitude or power [10]. The average response of cortical ensembles to tasks, be they cognitive or motor, is an increase or decrease in synchrony of neighboring circuits [11], giving rise to a change in the measured power at those circuits' frequency of oscillation-which is a mathematically convenient property to calculate given an EEG signal. To borrow a term from signal processing, we designate this amplitude modulation (AM) as the time-domain amplitude is closely related to the power. After being computed, these powers can be used as features for standard machine learning algorithms. Over time, many fundamental methods in BCIs have been developed using task-related differences in spectral power such as CSP [12] and the more recent Riemannian approaches [13], both of which have led to substantial gains in performance.
Phase-related features, though more recently exploited, have also enjoyed great success. Mostly, this has happened through measures of phase synchrony, in which two channels are compared to see how often the phases are identical in both [14]. This value for a given time period is used as a feature for a classifier, and has been shown to be useful for BCIs both offline [15][16][17] and online [18]. Outside of identical phases, consistent phase differences [19] and the average instantaneous phase difference [20] have also been used and shown to add new information, allowing for more accurate classification of intentions.
As has been known for many years, however, BCIs based on these feature spaces simply do not work for some subset of even healthy individuals (for a comprehensive review, see [21]). Much work has been done on attempting to determine a priori who will not be able to use a rhythm-based BCI [22], but the existence of such people is simply a given. This is not even to consider various patient groups for whom a BCI is sometimes the only hope for communication with the world, such as amyotrophic lateral sclerosis (ALS) patients. Within just this group, traditional features have proved ineffective by the end stage of the disease despite over two decades of research [23].
One thing that ties phase and amplitude-based features together is that they are both restricted by one thing: the frequency. For all the methods above, a frequency range needs to be assumed to compute the AM or phase features. There is, unfortunately, no good range that works for all people. Differences in the spectral location of neural frequency bands have been attributed to many factors. In the particular case of the μ band, i.e. oscillations in sensorimotor areas that range from 8 to 13 Hz, it has been shown to vary with age [24], genetics [25,26], and psychological factors [27]. For the parietal α, Haegens et al [28] determined that even within an individual and a single recording session the peak frequency of the α band can vary by up to one Hertz [28], and is related to such factors as cognitive load [28,29]. These more transient changes in the location of neural peaks are nearly always dealt with as reasons to shift the spectral window for further processing, indeed the general recommendation for motor imagery is to always use a subject-specific μ frequency window [30,31].
Neurophysiological research has given us ample evidence to believe that neural frequency peaks shift both location and size in response to tasks as well as innate factors. As BCI researchers, our natural question is then to ask whether this shift can be quantified and used as a feature for classification. Outside the field of BCI, this average frequency feature has already been attempted in LFP recordings [32,33] as well as with EEG for non-BCI usage [34,35]. Within the field, this has been dealt with by two works. Wu et al [36] showed that the frequency location of the trialwise empirical modes can be a robust feature for classification, but they did not look at task-induced frequency shifts. Rather, they used an SSVEP paradigm to induce the frequency shifts externally. A paper by Medl et al [37] did look at intrinsic, task-related frequency shifts over 25 years ago, in a visionary work that considered the instantaneous frequency and envelope as possible features for a BCI. However this work was never expanded past an exploratory analysis, an expansion which we attempt here. As we believe that the location of the neural oscillations in the frequency domain may be a feature for classification, we rechristen this approach frequency modulation (FM), as it measures the change in location of spectral peak.
In this paper we review a basic method for extracting the average peak frequency for a given BCI trial and show that this feature can be used across BCI paradigms as well as frequency bands to increase accuracy when added to traditional bandpower-based feature spaces.

Related work
The average frequency of a trial is equivalent to the average slope of the unwrapped phases at every point. Phase features are well-described in the BCI literature; however, they are exclusively related to phase synchronicity, and have never been considered in this way. The closest in BCIs is [20] that looks at the instantaneous phase difference, but between two different channels. The approach of analytic CSP [38] takes advantage of the phase values at every point in time, but uses them in a similar way to the synchrony measures used above. While the Hilbert transform does allow them to look at instantaneous phase, they compute the imaginary covariance matrix, which explicitly ignores the time dependencies in the phase which make up the average frequency.
Outside of BCIs but still in the realm of neurophysiology, the instantaneous frequency has in the last decade begun to come into use. In invasive recordings the instantaneous frequency has been estimated by AR modelling and used in local field potentials [32,33]. In EEG, AR-based frequency estimation has been done in auditory ERP for clinical purposes [35] and also to detect sleep spindles [34]. Within the field, this has been dealt with by three works. Wu et al [36] showed that the frequency location of the trialwise empirical modes can be a robust feature for classification, but they did not look at task-induced frequency shifts. Rather, they used an SSVEP paradigm to induce the frequency shifts externally. Park et al [39] also use a variant of empirical mode decomposition and the Hilbert transform to show that a version of instantaneous frequency can be used in motor imagery as well. However, they only attempt the Hilbert transform on the intrinsic modes themselves; we here show that it is unnecessary to use the decomposition first. Last, a paper by Medl et al [37] did look at intrinsic, task-related frequency shifts over 25 years ago, in a visionary work that considered the instantaneous frequency and envelope as possible features for a BCI. However this work was never expanded past an exploratory analysis, an expansion which we attempt here.

Datasets
To test whether FM features are useful across experimental paradigms, we use two datasets from different paradigms. The first is a motor imagery paradigm: subjects were placed in front of a screen with a centrally displayed fixation cross. Each trial started with a pause of three seconds. A centrally displayed arrow then instructed subjects to initiate haptic motor imagery of either the left or right hand, as indicated by the arrow's direction. After a further seven seconds the arrow was removed from the screen, marking the end of the trial and informing subjects to cease motor imagery. Ten healthy subjects participated in the study (eight males, two females, 25.6 ± 2.5 years old). One subject had already participated twice in other motor imagery experiments while all others were naïve to motor imagery and BCIs. EEG data was recorded from 128 channels, placed according to the extended 10-20 system with electrode Cz as reference, and sampled at 500 Hz. BrainAmp amplifiers (BrainProducts, Gilching, Germany) with a temporal analog high-pass filter at 0.1 Hz were used for recording. A total of 150 trials per class (left/right hand motor imagery) per subject were recorded in pseudo randomized order, with no feedback provided to the subjects during the experiment.
The second paradigm is a cognitive paradigm introduced by Hohmann et al [40]: Eleven healthy subjects (6 female, mean age 28 ± 7.5 years) as well as five ALS patients (demographics and ALS-FRS-R scores given in table 1) were instructed to either activate self-referential memories by thinking of a positive memory, or to focus on a mental subtraction task. Trials were 35 s long with a 5.5 ± 0.5 pause in between. Each healthy subject completed 60 trials without feedback, while the number of trials varied for ALS patients based on their ability to participate. Patients managed between 30 and 40 trials. EEG was recorded from 124 channels placed according to the extended 10-20 system with identical sampling and amplification to the previous task.

Pre-processing
To investigate whether the usefulness of FM features is robust with respect to various pre-processing steps, we tested three different types of spatial filtering: No spatial filtering, Laplacian spatial filtering [42], and common spatial patterns (CSP) [43]. In order to limit the variance when estimating classification accuracy in section 2.4, we designed each pre-processing step to give us a small, two-dimensional feature space. For no spatial filtering, we chose two channels above the left and right motor cortices (C3 and C4) and over frontal and parietal areas (Fz and Pz) for the motor imagery and the cognitive paradigm, respectively. For spatial filtering, we chose to limit our analysis to the motor imagery dataset, as the literature on BCI performance and neurophysiological interpretations of spatial filtering are more extensive for this paradigm. We chose one data-dependent and one data-independent filtering method: Laplace filtering and CSP. For the Laplace filtering we took the four closest channels in the extended 10-20 setup to compute the Laplace filter for channels C3 and C4 over left and right motor cortices, respectively. For CSP we used 10-fold cross-validation to ensure the filters were never applied to the same data used to generate them. No regularization aside from a small epsilon to ensure positive-definiteness were used for the computation. Two spatial filters were used for each subject corresponding to the two most extreme eigenvalues.

Feature extraction
The neural bands used in this analysis are the standard accepted ranges: 4-8 Hz for θ, 8-13 Hz for α (and μ), and 13-30 Hz for β. In our experience and within the data, the α range of the ALS patients varied (especially for late-stage patients) drastically and so, exclusively for these subjects, all frequency information was calculated using subject-specific bands. In order to ensure the features were compared as fairly as possible, we employed identical preprocessing to generate both AM and FM features within each subject. In all cases the signal was initially bandpassed offline with a 3rd order Butterworth filter, between the low and high values of the various bands of interest. To preserve phase, this was done once forwards and once backwards on the full data from each continuous recording.

Amplitude modulation.
It is well known in the field of BCIs that the bandpower within certain bands of interest is linked to activity within the brain. From Parseval's theorem, we can compute this power by restricting frequencies in the Revised amyotrophic lateral sclerosis functional rating scale [41]. The rating scale was filled out after the recording session by the experimenter.
signal to those of interest and taking the variance of the result. As a result we can compute the bandpower, given a signal x(t), by using a bandpass filter and then performing the following calculation: (1)

Frequency modulation.
To calculate the average frequency location for each trial, we used the analytic signal. The analytic signal is a complex-valued signal which has zero negative frequency components, constructed from a real-valued signal x(t) and its Hilbert transform: This can equivalently be written in its polar form, denotes the amplitude of the signal at time t and φ(t) denotes the phase. In previous BCI applications, both of these have been used as features for a classifier [20,44,45]. However, as with [37], we are not interested in the phase, but rather the instantaneous frequency. This can be computed in general as the derivative of the phase, though for discrete signals we can use the difference operator as follows: This leads, for a time series with n samples, to n − 1 values.
As these values are individually quite noisy, we opt to take the median of them as our FM feature for each trial to best correct for the influence of outliers.

Joint feature space.
For both paradigms and all types of pre-processing, we wanted to compare purely AM and FM features as well as the combination of them. To do this we simply concatenated the two-dimensional feature spaces of each feature type to create a joint four-dimensional feature space. One could also think of this as simply computing two features per channel per trial. As the number of trials is so much higher than the dimensionality of this joint feature space it is very unlikely that the concatenation described here affected results negatively.

Classification
In order to test the usefulness of the FM feature in comparison to the AM feature, we used a simple binary classification tool widely employed in BCIs: linear discriminant analysis (LDA). While more complicated methods could possibly get far better performance, LDA allows us to test how separable the data is in multiple dimensions under the most basic assumptions of homogeneous Gaussianity. In the single-feature cases each trial was represented by a two-dimensional vector and for the joint feature space each trial was represented by a fourdimensional vector. Accuracies were calculated by 10-fold cross-validation within each subject over paradigms and spatial filtering choices. However, for ALS patients it was slightly more complicated as the number of trials varied among subjects. To ensure all subjects could be included, we downsampled to the minimum number of 30. A stratified random split was implemented: 30 trials were chosen randomly with equal classes and a cross-validated accuracy was computed for this subset. This random split was iterated 1000 times and the average of these accuracies was used for each subject.

Statistical analyses
To test statistical significance of differences between conditions (AM, FM, or joint feature spaces) within a paradigm, a permutation test was used. We wanted to test two hypotheses: (1) that subject accuracies using only the FM features are significantly different on average from AM features, and (2) that the joint feature accuracies are significantly higher, on average, than the AM only accuracies. This corresponds to a two-tailed test for hypothesis (1) and a one-tailed test for hypothesis (2), as it should not be possible for the accuracy to decrease with a larger feature space. For both of these we took a pair of values from each subject (AM accuracy/FM accuracy or AM accuracy/joint accuracy) and computed the pairwise t-statistic over the subjects within a paradigm. We then shuffled the pairings 1000 times and computed the t-statistics for these to generate the empirical t-statistic null distribution. The computed p-value is the proportion of t-statistics from the null distribution higher than the t-statistic generated with the true feature set-accuracy pairings. In the case of the two-tailed test, instead of looking for the percentage of the null distribution that was larger we looked for the percentage that was more extreme.

Experiments
In all the different pre-processing and paradigm cases explained above, we conducted the same experiment: within each subject, we computed the accuracy using only AM features, only FM features, and with the joint feature space. We then compared these different methods across subjects within each experiment. In motor imagery the μ and β bands are both well-known for carrying discriminative information, and so we conducted one experiment in each band for the data without pre-processing. Next, we investigated the effects of spatial filtering on these feature spaces. We limited our analysis for these experiments to only the μ band, and processed the data after spatial filtering identically to the analysis without spatial filtering.
For the cognitive paradigm data in the healthy subjects, literature shows the α and θ bands as being predictive for these tasks [27,40,46], and so we did the experiment within the α and θ bands. For the ALS patients, it was impossible to find a θ rhythm for some subjects and the α rhythms varied dramatically, and so we show only results in the subject-specific α. This was identified by recording two 5-minute resting state recordings, one with eyes open and one with eyes closed, and then using the established observation that α power is increased in the eyes closed state [27] to determine the extent of the α band. Figure 1 shows the results of the experiments on the motor imagery data without pre-processing in both frequency bands. While the FM only features do worse than the AM only features in the β band (average 58.0% versus 63.7%, p = 0.02), there was no significant difference in the μ band (64.3% versus 64.0%, p = 0.91). The joint space was significantly better than the AM only features in the μ band (68.7% versus 64.0%, p = 0.01) but not the β band (65.8% versus 63.7%, p = 0.15). Figure 2 shows the results in the μ band for the two spatially pre-processed experiments. The FM only features were less predictive on average than the AM only features, though not significantly so (Laplacian: 65.4% versus 71.3%, p = 0.18, CSP: 68.5% versus 70.2%, p = 0.42). The joint space continued to improve on the AM only results with both a Laplacian filter (74% versus 71.3%, p = 0.04) and CSP filters (73% versus 70%, p < 0.001). Curiously, in half of the subjects Laplacian filtering actually decreased cross-validated accuracies using only FM features (subjects 2,5,6,8,9) at the same time as it increased accuracies using AM alone (80% of subjects). Conversely, using CSP the FM and AM accuracies both increased as compared to without spatial filtering. Figures 3 and 4 both concern the results on the cognitive task. Figure 3 shows results on experiments in two frequency bands for the cognitive paradigm in healthy subjects. In the α band the FM only features are significantly worse than the AM only features (72.7% versus 78.9%, p = 0.01) and the joint space does not out-perform the AM only features (79.4% versus 78.9%, p = 0.39). In the θ band the FM only features and AM only features do not differ significantly (77.1% versus 78.8%, p = 0.47), but the joint space significantly out-performs the AM only features (83.3% versus 78.8%, p = 0.01), showing that this effect is not isolated to the α band. Unfortunately, this result does not carry over to the ALS patients in figure 4, for whom the mean accuracy of the joint space is actually worse (67.4% versus 71.6%). While the joint feature space does not significantly out-perform AM in some cases, notably the motor β and the cognitive α, it still has a higher average accuracy than the AM feature space even in those cases, suggesting that while results are not significantly better there is at the very least no detriment. The only case in which it fails to out-perform is the case of the ALS patients-but given the low number of trials and the heterogeneity of the disease stages shown here, it is still too early to be sure that this feature is not of use.

Results
These results show that the FM feature is surprisingly robust. In order to better understand the magnitude and variance of the feature across subjects, we plotted the two-dimensional feature space to see the separation within subjects for both paradigms. FM features were computed for all trials and de-meaned within subjects to control for different peak locations, then overlayed in order to see how much conditions differed on average. The one-dimensional projections are plotted on the side and top. In motor imagery (figure 5) we find that the frequency features for C3 and C4 are not very correlated across subjects (ρ = 0.28) and that the peak shift is relatively slight compared to the variance within channels. To check the spatial correlation we plot the correlation of the feature in channels C3 and C4 with the other 127 channels, averaged over classes and subjects ( figure 5). The result shows a clear drop-off of correlation with distance on the scalp, which is consistent with the projections of the motor cortex.
In the cognitive paradigm, the frequency scatter plot (figure 6) shows a strong correlation between Fz and Pz (ρ = 0.59), which is interesting given how far apart they are on the head. When the correlations are plotted (figure 6) it can be seen that these two channels are, in general, more spatially correlated than C3 and C4. However, this correlation is more anteroposterior than it is lateral, which is consistent with the projection of fronto-parietal networks [47,48].
In the ALS patients, however, the correlation structure is quite different (figure 7). In comparison with healthy subjects the separation between condition-specific median frequencies is quite pronounced in both channels, and further the channels appear comparatively less correlated (ρ = 0.45). This suggests that despite the lack of significant improvement in decoding accuracy, this feature is still quite robust in ALS patients. When we plot the correlations of the FM feature Laplacian filter for electrodes C3 and C4 was computed using the four closest electrodes to each. The joint space is significantly better than AM (74% versus 71.3%, p = 0.04). For CSP (B), two spatial filters were kept per subject. The joint space is significantly better than AM (73% versus 70%, p < 0.01). In both cases, the FM only accuracies do not differ significantly from the AM only accuracies.   with the other channels, the clear pattern we see in the healthy subjects on average is repeated, although the anteroposterior trend appears to be lessened.

Discussion
We extend the results of [37] with these experiments and show that the location of a neural frequency peak on the frequency axis can be used as a reliable feature for BCI decoding across paradigms and neural frequency bands, and further that it is stable to standard spatial filtering approaches. We found that across both paradigms, and independent of standard spatial filtering approaches, the joint feature space substantially increases the cross-validated accuracy over healthy subjects. Further, we find this effect is not limited to the obvious α peak but generalizes to less visually separable peaks such as the θ.
Especially for studies with few electrodes, this has the potential to markedly improve performance by allowing another useful feature to be extracted without increasing the hardware; furthermore, this opens a new avenue for theoretical study, as processing techniques (whether in terms of temporal or spatial filtering) to find task-related frequency modulation in mixed signals do not, to our knowledge, exist.
Algorithmically, the method we use to compute the average frequency also leads to many questions. Most previous approaches to using instantaneous frequency, or even average frequency, have relied on autoregressive models and computed instantaneous frequency and amplitude from there [32,33]. These require a separate optimization and must be given model orders in advance; we have shown that using a simple transform like the Hilbert is sufficient to get features for classification. However, that is not to say this is the only simple option. The Hilbert transform on a bandpassed signal, and the median we use beyond it, were chosen for their stability to outliers of the instantaneous frequency within a trial. It is very likely that there are more stable approaches to determining the peak frequency yet to be discovered. It is further intriguing that the phase of a mixed signal is not a well-defined concept. What exactly does it mean that taking the Hilbert transform of a linearly mixed signal gives us a property of one of the component signals? How is this related to the frequency range or the number of signals being mixed? Aside from such concerns, it is unclear how higher-level choices, like the location of the frequency band or the length of the mental task periods, affect the results. Standards for both of these were only set using AM features. As a preliminary a posteriori test of the influence of the range of the bandpass filter, we tested many different bandpasses for both features to determine which range performs best on average on the motor imagery data without spatial pre-processing. For both AM and FM features, we varied the frequency band between 6 and 17 Hz in 1 Hz steps for each subject. We then computed the cross-validated accuracy using the same procedure as explained in section 2 and averaged across subjects. The results can be seen in figure 8. As can be seen, the average best frequency band location is quite different for AM and FM features. AM features tend to prefer high bandpasses, while the FM features have their best average performance closer to the standard μ range. This could be due to many factors, but one to consider is that the β range is also predictive in motor imagery and begins, traditionally, at 13 Hz; perhaps, given the imperfections inherent in any discrete time bandpassing procedure, the AM features benefit from this bleeding over of the β range. Next, we tested how different trial lengths affected performance. Using the standard α band and varying the trial length between 1 and 7 s, we compared cross-validated performance on the same dataset, for which the results are shown in figure 9. Both feature spaces do better with longer trials, suggesting that the longer period of brain activity is helpful in finding the true median frequency as well as the true ERD.
It is also important to ask ourselves whether these results may be influenced by artifacts, and not represent features specifically coming from the brain. Looking at figures 5, 6 and 7 shows that the features from the chosen electrodes are almost uncorrelated with the FM features from frontal and peripheral channels that would be dominated by ocular or muscular artifacts, respectively. While such a group level analysis cannot guarantee that certain subjects were influenced by artifacts, it can make us very confident that at the group level, this analysis shows FM features to indeed be related to the brain, and slightly if at all to some sort of artifact.
Next, in a neuroscientific light, these findings are quite curious. The unique information that the FM feature appears to carry suggests that either it comes from the same neural source but has a different noise profile, or that it is generated from a different circuit than the one that generates task-related bandpower differences. Based on our preliminary CSP-based analysis, it appears to be the case that the same circuit generates both, at least some of the time, by optimizing spatial filters for power differences between conditions, we also see an increase in the FM feature predictability. Moreover, we find that optimizing for power differences actually causes the FM to become even more predictive than the AM in some cases. However, the fact that Laplacian filtering can sometimes change the ratio of the predictivity of FM versus AM features suggests that this is not the full story. Perhaps the robustness of FM features is due to the fact that they are generated by the same neural ensembles that generate AM features but also by other ensembles, as while the Laplace filter is well-known for narrowing the sensitivity of the electrode to the locations directly perpendicular to it, CSP filters have more freedom in how they optimize.
A major question for FM features is how robust they are to neurological disease. Patients represent the group most in need of the opportunities BCI offers, and so determining whether this feature is more or less robust for non-neurotypical individuals is a pressing question. The data we have presented does not show that FM is more robust than AM for ALS patients, but it also suggests that more research is required. All accuracies for ALS patients were computed using only 30 trials, half that of the healthy subjects, which makes the accuracy estimation particularly noisy. What we can see, however, is that in some patients the FM feature is quite predictive. Looking at the ALS-FRS-R scores of those patients, this includes one who is completely locked-in and one who is still in very early stages, suggesting that this feature at least needs to be looked into more. Given the amount of variance across healthy participants, attempting to average across both people and disease stages with five participants can give us a preliminary look at best.  Finally, and somewhat crucially, one might ask what the benefit of attempting to find a single feature type that is predictive across many tasks is, as most tasks in BCIs use very unique features to achieve optimum performance. However, this sort of task-specific feature engineering is limiting, in that it only allows for new paradigms to be discovered simultaneous with new feature development. The results shown here, as well as previous results in bandpower-related BCIs, are strong evidence to support the idea that there are common feature spaces across tasks. While it is probable that current optimal performance for a single subject on a single task requires specialized feature computation, the lack of scalability means the search for a generic method of computation is crucial to the development of BCIs in the future, leaving the task of optimizing for individual performance to more intelligent machine learning on these common feature spaces.

Conclusion
Our goal was to test whether FM features are usable for classification in a very simple way, and not to over-optimize. Given that we have shown this to be the case, the space for optimization is enormous. The projection of muscle and eye artifacts into the spectral power of measured signals is well known, but whether they have a similar task-related change in peak location is not yet studied. Perhaps FM features are more robust to artifacts than AM features, which would be a crucial step forward in the quest for more stable ways of reading the brain. Further, the cross-session and online reliability of these features must be shown.