Space-by-time decomposition for single-trial decoding of M/EEG activity

We develop a novel methodology for the single-trial analysis of multichannel time-varying neuroimaging signals. We introduce the space-by-time M/EEG decomposition, based on Non-negative Matrix Factorization (NMF), which describes single-trial M/EEG signals using a set of non-negative spatial and temporal components that are linearly combined with signed scalar activation coefficients. We illustrate the effectiveness of the proposed approach on an EEG dataset recorded during the performance of a visual categorization task. Our method extracts three temporal and two spatial functional components achieving a compact yet full representation of the underlying structure, which validates and summarizes succinctly results from previous studies. Furthermore, we introduce a decoding analysis that allows determining the distinct functional role of each component and relating them to experimental conditions and task parameters. In particular, we demonstrate that the presented stimulus and the task difficulty of each trial can be reliably decoded using specific combinations of components from the identified space-by-time representation. When comparing with a sliding-window linear discriminant algorithm, we show that our approach yields more robust decoding performance across participants. Overall, our findings suggest that the proposed space-by-time decomposition is a meaningful low-dimensional representation that carries the relevant information of single-trial M/EEG signals.


Comparison with non-negative Tucker-2 decomposition
Here we offer a formal comparison between our method (scNM3F) and the Tucker-2 decomposition. To implement the Tucker-2 decomposition, we used the N-way Matlab toolbox (Andersson and Bro, 2000), imposing non-negativity constraints on the spatial and temporal components and no constraints on the core tensor.
First, we applied the Tucker-2 decomposition to the data of our example subject and compared the extracted components with the ones identified by our approach. We illustrate the results in Supp. Figure 2A-B. The temporal and spatial components extracted by the two methods differ considerably. In particular, Tucker-2 merges the first two temporal components into a single one, and the other two temporal components overlap highly. A high overlap is also observed for the two spatial components. In contrast, scNM3F yields succinct, non-overlapping temporal and spatial components, which, as we showed in the paper, encode different cognitive functions.
ssNM3F algorithm. These differences could be due either to the clustering feature of our method, which is not included in Tucker decompositions, or to the difference in the optimization algorithm (multiplicative update rules for our method versus alternating least squares for Tucker decompositions). To distinguish between these two alternatives, we also compared our results with another NMF-based algorithm that does not impose clustering constraints. We built this algorithm by extending semi-NMF (Ding et al., 2010) to a 3-factor decomposition and named it sample-based semi-nonnegative matrix tri-factorization (ssNM3F). The update rules of ssNM3F for the temporal and spatial components, W_tem and W_spa, are given in Supp. Eq. 1-2. Importantly, ssNM3F is designed to be applied to signed data but does not have the clustering feature of scNM3F.
Importance of clustering feature. By applying ssNM3F to the EEG data of the example subject, we found that, similarly to the Tucker-2 decomposition, it identifies highly overlapping temporal and spatial components (Supp. Figure 2C). This observation suggests that the differences in the extracted components are mainly due to the clustering feature of the scNM3F algorithm. We also compared the discrimination performance of the three methods on the same data. We found that, for both face versus car classification and phase coherence level classification, scNM3F performed better than Tucker-2 and ssNM3F (Supp. Figure 3).
Importance of optimization algorithm. We then examined whether the use of different optimization algorithms may also affect the decomposition outputs. An important difference in this respect is that the update rules used by the two NMF-based algorithms (Eq. 4-5 for scNM3F and Supp. Eq. 1-2 for ssNM3F) make use of both the positive and the negative entries of the input data matrix to identify components, whereas the alternating least squares algorithm used in the Tucker decomposition relies on a half-wave rectification of the input data, i.e. it ignores the negative entries.
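The distinction between the two ways of handling signed data can be illustrated with a small numerical sketch. It is not the implementation used in this work: the function name split_signed and the toy matrix are hypothetical, and the positive/negative split shown is the standard one used by semi-NMF-style update rules (Ding et al., 2010), assumed here only for illustration.

```python
import numpy as np

def split_signed(x):
    """Split a signed array into non-negative parts so that x = pos - neg.

    Hypothetical helper: semi-NMF-style multiplicative updates typically
    work on such a split, so the negative entries still inform the fit.
    """
    pos = (np.abs(x) + x) / 2  # positive part, zero where x < 0
    neg = (np.abs(x) - x) / 2  # magnitude of the negative part, zero where x > 0
    return pos, neg

# Toy signed "data" matrix (illustrative values only).
x = np.array([[1.5, -2.0],
              [-0.5, 3.0]])

pos, neg = split_signed(x)

# Half-wave rectification keeps only the positive part and discards neg.
rectified = np.maximum(x, 0)

# Fraction of the squared signal magnitude lost by rectification.
lost_fraction = np.sum(neg ** 2) / np.sum(x ** 2)
```

In this toy case the rectified matrix equals the positive part alone, so everything carried by the negative entries is unavailable to an algorithm that only sees the rectified input.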
To investigate how this difference in the handling of negative entries affects the decomposition outputs, we applied ssNM3F and Tucker-2 to simulated data with known ground-truth components. We generated three temporal components as sums of three Gaussian bursts and two gamma-distributed spatial components, and combined them using normally distributed random coefficients (Supp. Figure 4A). We applied the Tucker-2 decomposition and the ssNM3F algorithm to the simulated data and extracted the spatial and temporal components shown in Supp. Figure 4B-C. The temporal and spatial components extracted by ssNM3F were more similar to the original ones than those extracted by Tucker-2. To quantify this, we computed the mean squared error between the original and the extracted modules of the two methods over 100 repetitions of data generation and module extraction. We found that the temporal modules identified by ssNM3F were significantly more similar to the original ones than those identified by Tucker-2 (p<0.001, t-test), and that the spatial modules were slightly but not significantly more similar (Supp. Figure 5). This result suggests that, besides the clustering feature, the use of multiplicative update rules that take into account the negative entries of the data also gives a data reconstruction advantage to the space-by-time decomposition when compared to Tucker-2.
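A minimal sketch of such a simulation, assuming NumPy, is given below. All sizes, burst widths, gamma parameters and function names (simulate_trials, module_mse) are hypothetical choices that are not specified in the text, and the error measure (unit-norm modules, best matching over permutations) is one plausible way to compare modules rather than the exact procedure used here.

```python
import numpy as np
from itertools import permutations

def simulate_trials(n_trials=100, n_time=50, n_chan=30,
                    n_tem=3, n_spa=2, seed=0):
    """Simulate signed single-trial data from non-negative components.

    Assumed settings (not from the text): burst centres and widths,
    gamma shape/scale, and all array sizes.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n_time)

    # Temporal components: each is a sum of three Gaussian bursts.
    w_tem = np.zeros((n_time, n_tem))
    for k in range(n_tem):
        centers = rng.uniform(0, n_time, size=3)
        widths = rng.uniform(1.5, 4.0, size=3)
        for c, s in zip(centers, widths):
            w_tem[:, k] += np.exp(-0.5 * ((t - c) / s) ** 2)

    # Spatial components: gamma-distributed (hence non-negative) weights.
    w_spa = rng.gamma(shape=2.0, scale=1.0, size=(n_spa, n_chan))

    # Signed activation coefficients: one (n_tem x n_spa) matrix per trial.
    h = rng.standard_normal((n_trials, n_tem, n_spa))

    # Each trial is W_tem @ H_n @ W_spa, giving trials x time x channels.
    data = np.einsum('tk,nkl,lc->ntc', w_tem, h, w_spa)
    return data, w_tem, w_spa, h

def module_mse(true, est):
    """MSE between unit-norm columns of `true` and `est`.

    Extracted modules come in arbitrary order, so the error is minimised
    over all column permutations of `est`. For spatial components stored
    as rows, pass the transposes.
    """
    def unit(m):
        return m / np.linalg.norm(m, axis=0, keepdims=True)

    true_u, est_u = unit(true), unit(est)
    k = true.shape[1]
    return min(np.mean((true_u - est_u[:, list(p)]) ** 2)
               for p in permutations(range(k)))

data, w_tem, w_spa, h = simulate_trials()
```

Given components extracted by any decomposition, module_mse(w_tem, est_tem) and module_mse(w_spa.T, est_spa.T) would yield the kind of error values that can be collected over repeated simulations and compared between methods.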
Decoding performance comparison. Finally, we compared the decoding performance of scNM3F and non-negative Tucker-2 on the real data of all 10 subjects (Supp. Figure 6). We found that our method performed significantly better than the Tucker-2 decomposition at the population level for both face versus car classification and phase coherence level classification (p<0.01, t-tests).

Supplementary Figures
Supp. Figure 1: Dependence of face versus car classification performance (for the highest phase coherence level) on the number of spatial and temporal components. Classification performance peaks at 2 spatial and 3 temporal components (indicated by a star) and shows no further increase for larger numbers of components. Hence, we selected this set of components for all further analyses.