Block term decomposition for modelling epileptic seizures

Recordings of neural activity, such as EEG, are an inherent mixture of different ongoing brain processes as well as artefacts and are typically characterised by low signal-to-noise ratio. Moreover, EEG datasets are often inherently multidimensional, comprising information in time, along different channels, subjects, trials, etc. Additional information may be conveyed by expanding the signal into even more dimensions, e.g. incorporating spectral features applying wavelet transform. The underlying sources might show differences in each of these modes. Therefore, tensor-based blind source separation techniques which can extract the sources of interest from such multiway arrays, simultaneously exploiting the signal characteristics in all dimensions, have gained increasing interest. Canonical polyadic decomposition (CPD) has been successfully used to extract epileptic seizure activity from wavelet-transformed EEG data (Bioinformatics 23(13):i10–i18, 2007; NeuroImage 37:844–854, 2007), where each source is described by a rank-1 tensor, i.e. by the combination of one particular temporal, spectral and spatial signature. However, in certain scenarios, where the seizure pattern is nonstationary, such a trilinear signal model is insufficient. Here, we present the application of a recently introduced technique, called block term decomposition (BTD) to separate EEG tensors into rank- (Lr,Lr,1) terms, allowing to model more variability in the data than what would be possible with CPD. In a simulation study, we investigate the robustness of BTD against noise and different choices of model parameters. Furthermore, we show various real EEG recordings where BTD outperforms CPD in capturing complex seizure characteristics.


Introduction
Epilepsy is one of the most common neurological disorders, affecting 0.5% to 1% of the global population [1]. The clinical manifestation of this disease is the epileptic seizure, arising from the abnormal, synchronous electrical activity of a large network of neurons. As such, seizure activity can be recorded using electroencephalogram (EEG), which is currently one of the most important modalities for epilepsy diagnosis and monitoring [2]. However, visual analysis of EEG is often challenging and time consuming, due to several types of artefacts which may be superimposed on the pattern of interest (i.e. ictal activity) and due to the large amount of data resulting from long-term EEG monitoring. Therefore, automatic techniques which are capable of extracting relevant http://asp.eurasipjournals.com/content/2014/1/139 performance and numerical complexity, CCA and CoM2 [8] were shown to be the best choice of BSS method for removing muscle artefacts from epileptic EEG [9]. It was also shown that EEG source localisation is rendered more reliable if eye and muscle artefacts are removed using spatially constrained ICA and BSS-CCA, respectively [10]. An artefact removal scheme from channel × time × frequency EEG tensors using a multiway analysis technique, namely, canonical polyadic decomposition (CPD), was presented in [11].
Removal of artefacts can help the visual interpretation of EEG signals; moreover, a well-estimated ictal source can also provide useful information about the epileptic seizure automatically. The topographic maps corresponding to the ictal component can indicate the lateralisation of the seizure [12]. CPD of channel × time × frequency EEG tensors can provide even more accurate spatial information: it was shown that the topographic map of the ictal source coincides with the clinically determined seizure onset region [11,13]. Subsequently, a dipole can be fitted to the spatial signature of the ictal source to obtain an accurate estimate of the localisation [14]. A recent study has applied CPD on space-time-wave-vector (STWV) tensors [15]. The main advantage of this approach is the fact that it allows distributed source modelling. As such, it allows to estimate the spatial extent of the epileptic source as well, which is crucial in case epilepsy surgery is needed.
Furthermore, subsequent seizures can be automatically detected based on the topographic maps extracted by ICA or the spatial-spectral profile extracted by CPD from a reference ictal pattern. New seizure segments can be recognised either by subspace correlation [16] or inferring from the temporal signature while fixing the spatial and spectral modes in CPD [17]. PCA and ICA have also been applied as a feature extraction method for support vector machine classification of ictal versus normal EEG segments [18,19]. The success of these approaches strongly depends on the reliability of the blind source separation.
CPD decomposes the channel × time × frequency EEG tensor into a sum of rank-1 tensors. As such, each extracted component is defined by the combination of exactly one spatial, temporal and spectral signature. CPD is a trilinear model, i.e. the vectors along each mode are proportional to each other. For example, the spectral signature is linearly scaled over the time and channel modes, where the weights of the scaling are given by the values of the temporal and spatial signatures. Similarly, the temporal and spatial signatures are linearly scaled over the other two modes. Hence, the CPD model assumes that the source maintains the same spectral structure and topography within the observed window. However, focal epileptic seizures are typically characterised by evolving repetitive sharp waves. The evolution can occur in frequency, amplitude, morphology and topography [20]. Decomposition methods which allow more variability and more interaction between the factors are needed in order to capture such nonstationarities.
Here, we describe the first biomedical application of block term decomposition (BTD) [21,22], a generalisation of CPD allowing decomposition in terms which are of higher multilinear rank. As such, depending on the mode−n rank of a certain component, BTD facilitates modelling two or more distinct underlying patterns present along mode−n. We decompose wavelettransformed EEG tensors into rank-(L r , L r , 1) terms to extract the epileptic source from ictal EEG recordings. Such decomposition facilitates the extraction of sources with a fixed spectral structure which spatially spread over time or sources which evolve in frequency but retain a fixed localisation.
Alternatively, EEG signals can be modelled as a sum of exponentially damped sinusoids [23][24][25]. Mapping the signal observations to Hankel matrices allows the retrieval of the poles generating the system by singular value decomposition [26]. Furthermore, such representation leads to a new, deterministic blind source separation technique. More specifically, a mixture of R signals, each generated by L r poles, can be uniquely decomposed into rank-(L r , L r , 1) terms [27]. Therefore, we will also apply BTD-(L r , L r , 1) on EEG tensors, where the slices along the spatial mode are Hankel matrices corresponding to the observations from each EEG channel.
In a simulation study, we investigate the robustness of the tensor decomposition techniques against physiological noise including background EEG activity and muscle artefacts, the impact of the chosen model parameters, and the advantages and differences of each approach. Finally, we compare the performance of BTD and CPD on various real ictal EEG signals recorded from different patients.

Notation and definitions
Vectors are denoted by boldface lower case letters, e.g. a.
Matrices are denoted by boldface capital letters, e.g. A, while tensors are denoted by calligraphic letters, e.g. A. An entry of a vector a, a matrix A, or a tensor A is denoted by a i , a i,j , a i,j,k , etc., depending on the number of modes. Mode−n vectors are the generalisation of matrix rows and columns to tensors. A mode−n vector is a vector in which all but one of the indices are fixed. The Kronecker product of two matrices A and B is denoted by A B. Definition 1. The mode-n product of a tensor A ∈ K I 1 ×I 2 ×···×I N with a matrix U ∈ K J×I n is denoted as A× n U http://asp.eurasipjournals.com/content/2014/1/139 and is of size I 1 × · · · × I n−1 × J × I n+1 × · · · × I N . The entries of the mode-n product are defined as (1) Definition 2. The outer product A • B of a tensor A ∈ K I 1 ×···×I M and a tensor B ∈ K J 1 ×···×J N is the tensor defined by for all different values of the indices.

Definition 3.
The Khatri-Rao product of two matrices A ∈ K I×K and B ∈ K J×K is defined as

Definition 4.
The k-rank of a matrix A, denoted as k A , is defined as the maximum value k such that any k columns of A are linearly independent. Definition 5. The mode−n matricisation A (n) of an Nth-order tensor A ∈ K I 1 ×I 2 ...I N maps the tensor element with indices (i 1 , . . . , i N ) to a matrix element (i n , j) such that

Tensor decompositions 2.2.1 Canonical polyadic decomposition
CPD approximates a third-order tensor T ∈ K I 1 ×I 2 ×I 3 with a sum of R rank-1 tensors: CPD is visualised in Figure 1. Note that the definition is formulated for third-order tensors; however, the model can be extended to higher-order tensors in a straightforward manner. The rank of the tensor is defined as the smallest R for which (5) be the factor matrices corresponding to each mode. Then, CPD can be alternatively written as The advantage of the CPD model is its uniqueness up to permutation and scaling under mild conditions [28]: A more general framework for uniqueness has been recently presented in [29,30].

Block term decomposition
The rank-(L r , L r , 1) block term decomposition [21,22] of a third-order tensor T ∈ K I 1 ×I 2 ×I 3 into a sum of rank-(L r , L r , 1) terms (1 ≤ r ≤ R) is given as in which the matrix D r = A r · B T r ∈ K I 1 ×I 2 has rank L r and the vector c r is nonzero. In addition to permutation and scaling, inherited from the CPD, the factors A r may be postmultiplied by any nonsingular matrix F r ∈ K L r ×L r , provided that B T r is premultiplied by the inverse of F r . When the matrices [A 1 . . . A R ] and [B 1 . . . B R ] are full column rank and the matrix [c 1 . . . c R ] does not contain collinear columns, the decomposition is guaranteed to be unique up to the above indeterminacies. Figure 2 visualises the decomposition of a tensor in rank-(L r , L r , 1) terms. Note that BTD-(L r , L r , 1) is a generalisation of CPD for third-order tensors.

Algorithms
Different types of algorithms have been derived and discussed in the literature for tensor decompositions. The Alternating Least Squares (ALS) algorithm [31] was proposed for calculating CPD by updating the factor matrices in an alternating manner. Other computational schemes, such as Nonlinear Least Squares (NLS) [32], offer better robustness for difficult decompositions (notably, when the terms in the decomposition are somewhat collinear) and can improve the linear convergence rate of ALS to a quadratic rate. Each NLS step can be interpreted as starting from an ALS update that updates all factor matrices simultaneously, which is then iteratively refined with a preconditioned conjugate gradient algorithm so that it approximates the Newton step. Here, we used the NLS implementation of CPD and BTD-(L r , L r , 1) available in Tensorlab [33].

Tensor construction
Multichannel EEG data naturally take the form of a matrix A ∈ R S×Ch , where S and Ch correspond to the number of samples and channels, respectively. Below, we present two different approaches to extend this to a tensorial representation by expanding the time course into an extra dimension, with the aim of conveying additional information about the signal. http://asp.eurasipjournals.com/content/2014/1/139 Figure 1 CPD of a tensor T in R rank-1 terms.

Wavelet expansion
As the frequency content of EEG signals carries crucial information, wavelet transformation is often used to expand the EEG matrix into a tensor A ∈ R S×Ch×F , where F is the number of wavelet scales or frequencies [11,13,34,35]. Before wavelet transformation, the EEG data is normalised by subtracting the mean and dividing each channel signal by its standard deviation. Note that after decomposition, the scalp potentials are multiplied again with this standard deviation in order to preserve topographic information. Continuous wavelet transform (CWT) was performed using the Mexican hat wavelet of 30 scales, corresponding to a linear range of frequencies between 1 to 30 Hz. After tensor decomposition, the different modes describe the spatial, spectral and temporal signatures of the components. The source signals can be reconstructed by an inverse CWT (ICWT) of the retrieved time-frequency planes. We will refer to a BTD decomposition performed on tensors obtained by wavelet expansion as CWT-BTD.

Hankel expansion
EEG signals can be modelled as the sum of exponentially damped sinusoids [23][24][25]. Such signal model allows unique blind source separation in rank-(L r , L r , 1) terms. A detailed proof of this concept is presented in [27]. Below, we give a brief overview of the main considerations. We assume that the underlying EEG sources can be expressed as the sum of exponentials: This model also subsumes that the sources might be exponentially damped sinusoids: To exploit the desired structure, each EEG channel sig- Since this mapping is linear and assuming that the channel signals are linear combinations of the underlying sources, the above matrix is the linear combination of the Hankel matrices associated with the sources. If the source s r (n) can be written as (9), its associated Hankel matrix H r admits the Vandermonde decomposition: where V r ∈ K I×L r andV r ∈ K J×L r are, respectively, Assuming that I, J ≥ max(L 1 , L 2 , . . . , L R ), and considering the fact that a Vandermonde matrix generated by distinct poles is full rank, H r is rank-L r . Therefore, (8) solves the blind source separation problem if the underlying sources follow the structure described in (9).
For example, the Hankel matrix of a pure exponential is rank 1, while the one of a pure sinusoids or an exponentially damped sinusoid is rank−2. Noisy or nonstationary signals such as chirps give rise to Hankel matrices of higher rank. Before creating the Hankel matrices, the EEG channel signals are divided by their standard deviation. Note that the mean is not subtracted here as this could introduce an additional pole. There are two ways to interpret the sources retrieved by this decomposition. First, one can reconstruct the source time course by taking the mean along the anti-diagonals of the retrieved matrix. Alternatively, one can retrieve the poles generating each source using the reconstructed Hankel matrices. The consecutive algorithmic steps of retrieving the signal poles from the Hankel matrices are given, e.g. in [36]. However, in this paper, we restricted ourselves to the first method. We will refer to a BTD decomposition performed on tensors obtained by Hankel expansion as H-BTD.

Model selection
Certain model parameters have to be determined prior to performing blind source separation. The number of extracted components or terms R have to be chosen for both CPD and BTD. Additionally, the rank of each mode needs to be set for BTD. In case of BTD-(L r , L r , 1), this means to determine which mode should be rank-1 and choose the rank L r for the two other modes. If not stated otherwise, we set L 1 = L 2 = . . . = L R .
Several procedures have been proposed for automatic model selection in tensor decompositions. For CPD type models, the core consistency diagnostic [37] seems to be the most powerful approach [38] and has been successfully used to guide the blind source separation of epilepsy tensors [11,13].
The core consistency diagnostic is based on the following principle. The CPD model can be formulated as a restricted Tucker model where the core tensor has nonzero values only on its superdiagonal. Considering the Tucker model as a regression of a tensor onto subspaces defined by the factor matrices, it is clear that a CPD model is appropriate, if the least squares fitted core tensor on the CPD factors has off-diagonal elements close to zero. The optimal number of CPD components is the last one in a series of models with increasing number of components, where the least squares fitted core tensor is still similar with the ideal Tucker core tensor.
The parameter selection for more flexible tensor models such as BTD-(L r , L r , 1) is the topic of still ongoing research (see Section 4 for an overview) and is out of the scope of this paper. Our aim is rather to give an insight to the sensitivity of CPD and BTD to the different parameters and to illustrate what can be achieved with well-chosen model parameters.
Therefore, we simulated various ictal activity patterns superimposed on artefacts and background activity. The signals were subsequently decomposed with CPD and BTD using a wide range of values for each model parameter in order to investigate the impact of the chosen model parameters.

Simulation study
EEG activity of 2-s length was simulated in different scenarios following [14]. The forward problem was solved for each scenario in a three-shell spherical head model consisting of a brain, a skull and a scalp compartment [39]. The ratio between the conductivities of the brain, skull and scalp compartment was equal to 1 : 1/16 : 1, respectively [40], where the conductivity of the brain and scalp was 3.3 · 10 −4 /mm [41]. The radii of the outer boundary of the brain, skull and scalp compartments were set to 8, 8.5 and 9.2 cm, respectively. The forward solution was computed for 21 electrodes placed according to the 10/20 system with two http://asp.eurasipjournals.com/content/2014/1/139 additional electrodes over the temporal region. The time course of the scalp potentials was stored in a 500 × 21 dimensional matrix A, representing 2 s of EEG with sample frequency of 250 Hz. Awake background EEG activity was recorded with the same electrode configuration from a healthy subject. Muscle artefacts were separated from a contaminated segment of background activity using BSS-CCA [7]. Subsequently, the muscle artefacts were superimposed on a clean background EEG segment, and the data was stored in a noise matrix B. In the simulation study, the noise matrix B was superimposed on the signal matrix A containing the ictal activity: X(λ) = A + λ · B with λ ∈ R. We varied the parameter λ resulting in various signal-to-noise ratio (SNR) levels, quantified as where the root mean square value (RMS) of a signal matrix M ∈ K Ch×S consisting of Ch channels and S samples, is defined as The noisy ictal EEG segments were expanded with the wavelet or Hankel method and were subsequently decomposed with CPD and BTD in order to extract the ictal component. Note that CPD was not applied on tensors obtained with Hankel expansion, as the Hankel matrix of a sinusoidal or chirp signal is always different from rank-1. The component corresponding to the ictal source was selected automatically as the one showing the lowest root mean square error (RMSE) in spatial distribution with the simulated ictal source. Subsequently, one dipole was fitted on the extracted ictal source signal to compute the localisation error. The goal of the simulation study was to assess the robustness of each method against noise. Furthermore, as explained above, it also serves to investigate the impact of different choices of model parameters and ultimately to determine the optimal model parameters.

Clinical examples
Ictal EEG recordings were selected from the database used in [13,42]. The original database consisted of 37 refractory partial epilepsy patients who underwent full presurgical evaluation including seizure semiology, structural MRI, interictal EEG, subtraction of ictal SPECT coregistered with MRI (SISCOM) and neuropsychological assessment. A patient was included in the database if all measurements were concordant and reliably defined the epileptogenic zone. In a majority of cases, the seizure onsets were correctly localised using CPD of wavelet-transformed EEG tensors [13]. In these cases, the trilinear signal model assumed by CPD is sufficient; therefore, we do not expect an improvement using BTD. However, in cases where no perfect separation was obtained by CPD due to severe artefacts, BTD might provide improved results. Although [13] focussed on localising the seizure onset zone, one might be interested in modelling other aspects of the seizures, such as its evolution in morphology or topography. As opposed to CPD, BTD can model such nonstationary sources. Here, we will discuss the following patients, each representing a particular case (severe artefacts or presence of nonstationarities), where we expect that BTD can provide more appropriate signal models than CPD.

Patient 1
Patient 1 suffers from right temporal lobe epilepsy. The seizure consists of 5-to 6-Hz activity lateralised to the right, most prominently present over the right anterior and midtemporal region (F8, T4 and right sphenoidal channels). Severe eye blinks and muscle artefacts are superimposed on the low-voltage ictal activity at onset (Figure 3a). Our aim here is to separate the seizure activity from the artefacts and background using a 2-s EEG segment at onset and thereby localise the seizure onset zone as in [13]. The window length of 2 s was chosen considering that the number of samples provide sufficient amount of information about the signal, but it is short enough to assume that the seizure does not spread yet from the onset region. As we are interested in the exact onset localisation of the seizure, the spatial mode of BTD is chosen to be rank-1, while the frequency and temporal modes are higher rank.

Patient 2
Patient 2 suffers from left temporal lobe epilepsy. The seizure starts with a 4-Hz delta rhythm which is most prominent over the left anterior and midtemporal region (F7, T3 and left sphenoidal channel). Eleven seconds after onset, the seizure pattern evolves in amplitude and frequency into a sharp, up to 8-Hz theta activity. Our aim here is to correctly model the frequency evolution of the seizure. Therefore, the frequency and temporal modes of the BTD is chosen to be higher rank while we assume a stationary localisation, i.e. rank-1 spatial mode. As the transition takes place over a longer period of time, here, we use a 10-s long EEG segment, shown in Figure 4a.

Patient 3
Patient 3 suffers from right temporal lobe epilepsy. The seizure starts with a high amplitude 4-Hz delta activity over the right anterior, mid-and posterior temporal regions (F8, T4, T6 and right sphenoidal channels). After 14 s, the seizure activity spreads to the bi-fronto-central region. Our aim here is to correctly model the spatial spread of the seizure using a 10-s EEG segment shown in Figure 5a. Therefore, the spatial and temporal mode of http://asp.eurasipjournals.com/content/2014/1/139 the BTD is chosen to be higher rank, and we assume a stationary frequency, i.e. rank-1 frequency mode.

Evaluation criteria
The goodness of the model fit is evaluated in terms of several measures. For scenarios i and ii, where the ictal pattern had fixed topography, the RMSE between the spatial distribution of the simulated ictal pattern and the spatial signature of the extracted ictal source was computed. Moreover, the RMSE between the time×frequency matrices, computed as the product of the mode-2 and mode-3 factors, was also taken into account. Similarly, RMSE between the Hankel matrices of the simulated and extracted ictal sources was assessed as well. Finally, the RMSE between the simulated and reconstructed EEG time courses was investigated. The source time courses were reconstructed using inverse wavelet transform from the time × frequency matrices in case of CWT-BTW and by averaging along the antidiagonals of the Hankel matrices in case of H-BTD. http://asp.eurasipjournals.com/content/2014/1/139 and a rhythmic oscillatory temporal pattern with increasing frequency. However, these peculiar frequency characteristics can not be directly seen on the frequency signature, which shows a single peak at 6 Hz. (c) CWT-BTD of the seizure. BTD captures the seizure source in the first block term; the second block term is now shown. Note the close resemblance between S1 of BTD and S1 of CPD. Moreover, T1a captures the late fast, while T1b captures the early slow oscillatory pattern of the seizure. The frequency characteristics can be directly seen from the frequency signatures, namely, the 8-Hz peak in F1a and the 4-Hz peak in F1b. CWT-BTD of the seizure. The first block term captures the same seizure source (compare S1a and T1a with S1 and T1), however, also captures a source with the same frequency characteristics located frontally. While T1b increases in amplitude after 900 samples, T1a decreases in amplitude after 700 samples. This can be interpreted as the seizure spreading from the temporal to the frontal region, in accordance with the visual assessment of the ictal EEG pattern.
For scenario iii, where the ictal pattern has varying topography, the spatial and temporal signatures cannot be interpreted independently. Therefore, the EEG was reconstructed from the ictal sources and dipoles were fitted on the reconstructed data. The goodness of the decomposition was evaluated in terms of the dipole localisation error.
In the clinical examples, the true underlying ictal source is quantitatively not known. The clinical description of the ictal patterns contains information on the channels where the seizure onset is observed, with additional information on the frequency and the morphology of the seizure pattern. Therefore, the extracted ictal sources are visually inspected and compared to the written qualitative clinical description.

Scenario i and ii
CPD successfully extracted the single epileptic source from a channel × time × frequency tensor in case of a stationary ictal source or an ictal source with evolving frequency. Figure 6 shows the RMSE in time course and in spatial distribution between the simulated and reconstructed ictal source. The ictal source is captured already in the first CPD component for an SNR > 0.4, and the reconstruction does not benefit from extracting additional components. However, for SNR < 0.4 the ictal signal is covered in noise; therefore, two or more components are required. In such cases, one CPD component captures the ictal source and the others serve to http://asp.eurasipjournals.com/content/2014/1/139 remove artefacts and model background activity. Note that if the number of components is set too high (R > 4) in scenario ii, the nonstationary ictal source is split into two components, compromising the reconstruction of the time course. These observations are in accordance with results obtained with the core consistency diagnostic, suggesting two, three or perhaps four stable components. Figure 7 shows the performance of BTD on tensors obtained with wavelet expansion. The results are very sensitive to the chosen number of block terms R both in case of a sinusoidal or a chirp-like ictal source. The best reconstruction can be achieved with R = 2 in both cases. Note that a stationary ictal source has rank-1 structure; therefore, BTD is an inherently suboptimal model. Still, one term will resemble the ictal source, where the various signatures constituting the rank-L r term are the superposition of the true ictal pattern and noise, as depicted in Figure 8. However, the exact choice of L r > 1 does not seem to have a large influence on the RMSE between the reconstructed and true ictal source.
In case of an ictal source with evolving frequency, L r = 2 gives the best reconstruction, although the performance is compromised for very low SNRs. On the one hand, the ictal pattern can be captured for very low SNRs if L r is set higher. On the other hand, setting L r too high has similar effects as the BTD model of a sinusoid source: artefacts are superimposed on the ictal signal even for high SNR values, hindering interpretation. Figure 9 shows the performance of BTD on tensors obtained with Hankel expansion. Regardless of the number of extracted block terms, H-BTD can robustly reconstruct the spatial map corresponding to the ictal source both in case of a sinusoidal or a chirp-like time course. Similarly, a chirp-like ictal source is well localised given an arbitrary choice for the rank of the factor matrices. However, this is not the case for the sinusoidal ictal source. Moreover, the choice of the rank has a strong influence on the reconstruction of the ictal time course. While the sinusoidal time course is best reconstructed with L r =2 in accordance with theory, the reconstruction of the chirplike source requires L r = 6. Note that the rank of a chirp signal depends on how nonstationary it is. With the above choices for L r , the time course of the ictal source is best reconstructed with R = 3 according to our simulation results.
For a direct comparison between the BSS methods, the model parameters which gave the most robust result were http://asp.eurasipjournals.com/content/2014/1/139  The performance of all BSS methods are compared in Figure 10.
H-BTD outperformed CPD and CWT-BTD in reconstructing the time course of both the stationary and the evolving ictal sources. Regarding the retrieval of the spatial maps, all three BSS approaches performed equally well, reaching an RMSE in spatial distribution below 0.6 with the simulated ictal source, which corresponds to a dipole localisation error of less than 5 mm. However, CWT-BTD was not robust against very low SNRs. As already stated, BTD is an inherently suboptimal model for a sinusoidal source, which is also reflected by its lower performance in reconstructing the time×frequency matrices and the ictal time course. In case of an ictal pattern with evolving frequency, CWT-BTD achieves a lower RMSE with the true time × frequency representation compared to CPD. While the frequency signature of CPD shows a single peak at 6 Hz, i.e. at the average of the start and end frequency, the frequency signature vectors obtained with BTD-(2,2,1) represent a spectrum peaking at 7 Hz and another peaking at 4 Hz. From the corresponding temporal BTD signatures, one can deduce that the ictal pattern is slowing down; the latter gains amplitude towards the end. Although they provide a sufficiently clear interpretation, note that due to the indeterminacy of the factors (see Section 2.2.2), the signatures T1a and T1b as well as F1a and F1b can be any linear combinations of the true temporal and frequency characteristics of the underlying source. However, the time × frequency matrix is unique and can also be used to observe the spectral-temporal properties of the source. An example where SNR = 0.9 was chosen is shown in Figure 11. Interestingly, after the inverse wavelet transform of the time × frequency matrices, the reconstructed time course of the BTD ictal term shows higher RMSE with the true ictal pattern than the CPD component does.
So far, the wavelet-transformed EEG tensors were modelled with L 1 = L 2 = . . . = L R = 1 using CPD and L 1 = L 2 = . . . = L R = 2 using BTD. However, BTD allows different choices for each L r . Considering that R = 2 gave a robust solution against noise in each case, we also tested an intermediate solution, namely, using L 1 = 2 and L 2 = 1. For low noise levels, the ictal source was captured in the rank-2 term. In contrast, if SNR < 0.6, the high power noise requires a higher complexity representation and occupies the rank-2 term, while the seizure pattern is modelled in a rank-1 term, providing a similar ictal component as CPD.

Scenario iii
The performance of CPD and CWT-BTD was evaluated for this scenario. Our goal here is to capture a moving ictal source; therefore, we are looking for a single source with a spatial and temporal signature of higher rank and with a frequency signature of rank 1. H-BTD is not tested in this scenario, considering that using Hankel representation mode-2 and mode-3 are both different from rank-1; therefore, a source which also has higher rank spatial signature cannot be modelled in rank-(L r , L r , 1) terms. In a similar assessment as above, varying the SNR and the model parameters, we observed that the best reconstruction of the ictal source in terms of spatial distribution was achieved with R(CPD) = 3 (confirmed by the core consistency diagnostic as well), R(CWT − BTD) = 2 and L r (CWT − BTD) = 2. The decomposition was performed using these parameters. Then, the multichannel EEG corresponding to each component was reconstructed. Subsequently, dipoles were fitted to the reconstructed components showing the lowest RMSE with the simulated ictal source. In case of BTD, one component corresponded to the ictal source, while the second component was an artefact. As the reconstructed EEG of a BTD component is rank−2, two dipoles were fitted on the reconstructed signal. In case of CPD, both the topography and the time course of one component resembled the simulated ictal source. The topography of the second component was also similar to that of the simulated time course. The third component corresponded to an artefact. The reconstructed EEG of a CPD component is rank-1; therefore, one dipole was fitted to the first component and one to the second component. Alternatively, one can take the sum of these two reconstructed EEGs and two dipoles can be fitted on the resulting signal.
The localisation error of the extracted sources with respect to the corresponding simulated source is shown on Figure 12. Figure 12a compares the results of BTD with CPD, when one dipole was fitted to each CPD component separately. However, results obtained from the second CPD component are omitted in this figure, as the corresponding localisation error exceeded 10 cm. The dipole estimated based on a single CPD component is located in between the simulated dipole sources. However, the two dipoles estimated based on the BTD component are located close (less then 1 and 2 cm) to the simulated dipole sources. The positions of the simulated and the extracted ictal sources are shown in Figure 12c for SNR = 1. Figure 12b compares the results of BTD with CPD, when two dipoles were fitted to the sum of the reconstructed EEGs of the first two CPD components. In general, the dipoles fitted on the BTD component are closer to the simulated sources than the ones estimated based on CPD. In cases where one CPD-based dipole is slightly better localised than the BTD-based one, the localisation error of the second dipole is much worse. http://asp.eurasipjournals.com/content/2014/1/139 Figure 11 Scenario ii: simulated ictal source with evolving frequency at SNR = 0.9. (a) CPD decomposition. The frequency signature (F1) of the first component, corresponding to the ictal source, shows a single peak at 6 Hz, i.e. at the average of the start and end frequency. (b) BTD decomposition. The spatial mode of the BTD components were set to be rank-1, while the frequency and temporal modes were set to rank-2. Therefore, this block component comprises the spatial signature S1, the frequency signatures F1a and F1b and the temporal signatures T1a and T1b. The frequency signature F1a and F1b, corresponding to the ictal source, represent a spectrum peaking at 4 and 7 Hz, respectively. From the corresponding temporal signatures, one can deduce that the ictal pattern is slowing down, as T1a gains amplitude towards the end. In summary, the dipole localisation based on BTD was more reliable than that based on CPD.

Clinical examples
The optimal number of CPD components was estimated with the core consistency diagnostic. Additionally, the results of the simulation study were also considered in the model selection for both CPD and BTD. In all the examples below, the following parameter settings were chosen: Figure 3 shows the results of the CPD and BTD decompositions of the 2-s EEG segment at the onset of the seizure of patient 1. CPD failed to extract an epileptic source where the spatial signature matches the seizure onset zone. The spatial signature of both components shows a distribution typical for eye movement-related artefacts. Interestingly, BTD comprises both these components in one block term, term 2. Note the similarity between the spatial signatures S1 of CPD and S2 of BTD, and the correspondence of F1 and T1 with F2b and T2b, as well as of F2 and T2 with F2a and T2a. The seizure activity is successfully modelled in the first block term. The spatial signature corresponds well with the seizure onset zone as assessed during the presurgical evaluation. Moreover, the frequency signature F1b indicates the dominant frequency of the seizure pattern (5 Hz), and the temporal signature T1b reflects the semi-rhythmic time course of the ictal pattern. The ictal pattern was successfully extracted by H-BTD as well. The spatial signature of the retrieved CWT-BTD and H-BTD ictal term resembles each other closely. http://asp.eurasipjournals.com/content/2014/1/139

Patient 2
The CPD and BTD decompositions of the seizure of patient 2 are depicted in Figure 4. The first CPD component corresponds to the ictal source, with clear left temporal localisation and a rhythmic oscillatory temporal pattern with increasing frequency. However, these peculiar frequency characteristics cannot be directly seen on the frequency signature, which shows a single peak at 6 Hz. BTD captured the seizure source in the first block term. Note the close resemblance between S1 of BTD and S1 of CPD. Moreover, the temporal signatures T1a captures the late fast one, while T1b captures the early slow oscillatory pattern of the seizure. The frequency characteristics can be directly seen from the frequency signatures, namely, the 8-Hz peak in F1a and the 4-Hz peak in F1b. H-BTD also extracted the ictal source, successfully capturing both the localisation and the temporal pattern of the seizure.

Patient 3
The CPD and BTD decompositions of the seizure of patient 3 are depicted in Figure 5b,c, respectively. The first CPD component corresponds to seizure activity, showing a clear right temporal localisation and a 4-Hz oscillatory pattern. The second CPD component resembles an eye blink artefact, as indicated by the frontal maximum of the http://asp.eurasipjournals.com/content/2014/1/139 spatial pattern S2 and the two eye blink patterns between 500 and 1,000 samples on the temporal signature T2. The first block term captures a similar ictal source as CPD (compare S1 with S1a and T1 with T1a), however, also captures a source with the same frequency characteristics located frontally. While T1b increases in amplitude after 900 samples, T1a decreases in amplitude after 700 samples. This can be interpreted as the seizure spreading from the temporal to the frontal region, in accordance with the visual assessment of the ictal EEG pattern. Moreover, the eye blink artefact is captured in the second BTD component (see T2a and S2a). Therefore, one can be confident that the frontal maximum observed on S1b is due to seizure propagation and not due to an eye artefact. In contrast, the changing localisation of the seizure source was not captured with CPD.

Discussion
Block term decomposition is a recently introduced tensor decomposition technique which has also been proposed as a blind source separation technique for exponential polynomials. In this paper, we present its first biomedical application a novel way of modelling epileptic seizure activity. We partly rely on the signal model presented in [27] and assume that the sources are the linear combinations of exponentially damped sinusoids. This signal model is conveyed by constructing a Hankel matrix from each channel time course. In addition, we present an alternative approach where the multichannel signal is expanded by a wavelet transform. The method can be seen as an extension of the canonical decomposition of EEG tensors, a method which has been successfully used to localise the seizure onset [11,13]. In the majority of cases, a short EEG segment will be stationary in its spatial, spectral and temporal characteristics; therefore, CPD will be successful in extracting the source of interest. However, the extension of CPD to BTD is necessary when this assumption is violated. We showed three related examples, a seizure severely contaminated with eye artefacts, one with evolving frequency and finally another one which spreads to distant brain regions. We demonstrated that while CPD failed to model these seizures correctly, BTD could extract an ictal source which corresponded well with the clinical assessment.
However, the success of this method largely depends on the appropriate selection of model parameters, namely, the number of extracted components and the rank of the factor matrices. In a simulation study, we investigated the impact of the model parameters given different underlying ictal patterns and different noise levels. We found that H-BTD was very robust against noise and against the number of extracted components. It outperformed CPD and CWT-BTD in reconstructing the time course of the ictal pattern. However, depending on the waveform of the ictal source, different rank settings were necessary. In contrast, the best model parameters for CWT-BTD were identical regardless of the underlying ictal pattern. Although CWT-BTD was less robust than CPD or H-BTD against a SNR below 0.4, one can expect lower noise levels in real EEG recordings.
The optimal choice for L r and R found in this particular simulation study can not be generalised; in fact, it strongly depends on the characteristics of both the ictal source and the artefacts. Nevertheless, considering that low rank models are realistic in this application and the chosen tensor representations, a rapid visual evaluation for setting the model parameters is feasible. Several procedures have been proposed for automatic model selection of tensor decompositions. Some are specific to CPD, while others were proposed for a general Tucker3 model. Note that the latter are also applicable to CPD and BTD-(L r , L r , 1) type decompositions, i.e. restricted Tucker3 models. A commonly followed approach consists in estimating the optimal model parameters based on the decrement of the Tucker3 model error while increasing the complexity of the model, as performed in DIFFIT [43,44]. However, certain artefacts, such as muscle-related activity, might account for a large amount of variability in the data but can not be modelled with a low-rank component. Consequently, such methods overestimate the rank or the number of components in the current application. In fact, we are not interested in modelling artefacts or noise, but in correctly modelling the source of interest. In view of this, an interesting tool was implemented in Tensorlab [33], which chooses the number of rank-1 terms for a CPD decomposition given a desired model fit as the corner of the L-curve [45] representing the trade-off between fit and the order of the model. Note that this approach requires a reliable estimate of the noise level. Subsequently, model selection for a BTD-(L r , L r , 1) type decomposition can be performed by clustering the rank-1 terms to form R rank-(L r , L r , 1) terms, where R is estimated as the number of significant singular values of the rank-1 factor matrix, or based on the gap statistic [46]. Alternatively, a Bayesian framework for model selection based on automatic relevance determination was proposed and proven to outperform DIFFIT [38]. Nevertheless, in the same study, the core consistency diagnostic [37] was shown to be the most robust technique for estimating CPD models [38]. The core consistency diagnostic has been successfully applied for CPD model estimation for decomposing epilepsy tensors [11,13]. Although the core consistency diagnostic has recently been extended for testing the validity of hypothesised restricted Tucker3 models [47], it has not been applied or evaluated yet as a systematic tool for model parameter estimation. Future work will focus on experimental validation and comparison of the http://asp.eurasipjournals.com/content/2014/1/139 different model selection approaches for BTD-(L r , L r , 1) decompositions in a realistic simulation study and on real EEG data.
Moreover, in order to apply this technique in a real clinical environment, the seizure source has to be automatically identified amongst all extracted components. A possible approach, proposed in [13], orders the components extracted by CPD based on the variance they explain. This way, the ictal source was ranked first provided that no eye artefacts were present in the data. Nevertheless, components corresponding to eye artefacts can be easily recognised and rejected. A follow-up study should be conducted to validate this approach on a larger clinical dataset using the CWT-BTD and H-BTD approaches as well.
In this paper, we showed various examples where the BTD provides a better model of an epileptic seizure than CPD. These more accurate models can lead to better source localisation as well as to better feature extraction and, consequently, more successful seizure detection. Furthermore, a wide range of neuroscience applications could benefit from BTD. CPD, as an exploratory tool for wavelet-transformed event-related potential (ERP) data successfully revealed condition-specific differences in a visual task paradigm [34]. However, the evoked response of some subjects deviated too much from the average activity; therefore, they did not contribute to the CPD source representing the condition-specific effect. In other words, the assumption behind CPD analysis that the same stimuli elicit the same response in all subjects was violated, leading to a model which did account for the subject specific differences. A BTD-type decomposition can offer a more flexible model, and we anticipate that it might not only yield more accurate results but allow more insight into subject-specific brain dynamics. CPD has also been shown to be a promising tool in brain computer interfaces. Wavelet-transformed single trial ERPs in response to different stimuli were distinguished by CPD in a motor imagery task [48] and in case of visual evoked potentials [49]. Recently, the trial mode in the CPD decomposition of channel × time × trials tensor was used to accurately classify different stimulus types in a visual detection task [50]. However, single-trial ERPs in response to the same stimulus show large variability compared to grand average ERPs due to physiological modulation and to low signal-to-noise ratio. Considering that BTD is able to capture more variability in one term, we believe that it could improve single-trial ERP classification as well.

Conclusions
The morphology and spatial distribution of the seizure pattern provide important information about the localisation, configuration and the size of the epileptogenic zone. As the ictal EEG pattern is superimposed on background EEG activity and is often obscured by artefacts, BSS techniques are very useful to extract the clean activity pattern. In this paper, ictal EEG patterns were decomposed using a recently introduced tensor decomposition technique, called block term decomposition. We applied wavelet transformation or Hankel expansion to organise the EEG data in a tensor. The former approach was capable of modelling nonstationary seizures which evolve either in frequency or spatial distribution, while the latter was useful for modelling the ictal pattern as a sum of exponentially damped sinusoids. Both approaches successfully extracted the seizure source in the presence of severe artefacts. Nevertheless, the practical use of this technique depends on blind selection of appropriate model parameters.