A Sparse EEG-Informed fMRI Model for Hybrid EEG-fMRI Neurofeedback Prediction

Measures of brain activity through functional magnetic resonance imaging (fMRI) or electroencephalography (EEG), two complementary modalities, are ground solutions in the context of neurofeedback (NF) mechanisms for brain rehabilitation protocols. While NF-EEG (in which real-time neurofeedback scores are computed from EEG signals) has been explored for a very long time, NF-fMRI (in which real-time neurofeedback scores are computed from fMRI signals) appeared more recently and provides more robust results and more specific brain training. Using fMRI and EEG simultaneously for bi-modal neurofeedback sessions (NF-EEG-fMRI, in which real-time neurofeedback scores are computed from fMRI and EEG) is very promising for the design of brain rehabilitation protocols. However, fMRI is cumbersome and more exhausting for patients. The original contribution of this paper concerns the prediction of bi-modal NF scores from EEG recordings only, using a training phase where EEG signals as well as the NF-EEG and NF-fMRI scores are available. We propose a sparse regression model able to exploit EEG only to predict NF-fMRI or NF-EEG-fMRI in motor imagery tasks. We compared different NF-predictors stemming from the proposed model. We showed that predicting NF-fMRI scores from EEG signals adds information to NF-EEG scores and significantly improves the correlation with bi-modal NF sessions compared to classical NF-EEG scores.


INTRODUCTION
Neurofeedback approaches (NF) provide real-time feedback to a subject about his or her brain activity and help him or her perform a given task (Hammond, 2011;Sulzer et al., 2013). The estimation of neurofeedback information is done through online brain functional feature extraction to provide this real-time feedback to the subject. NF appears to be an interesting approach for clinical purposes, for example, in the context of rehabilitation and psychiatric disorders (Birbaumer et al., 2009;Sulzer et al., 2013;Wang et al., 2017). Functional magnetic resonance imaging (fMRI) and electroencephalography (EEG) are the most used non-invasive functional brain imaging modalities in neurofeedback. EEG measures the electrical activity of the brain through electrodes located on the scalp. EEG has an excellent temporal resolution (milliseconds) but a limited spatial resolution (centimeters), implying a lack of specificity. Furthermore, source localization in EEG is a well-known ill-posed inverse problem (Grech et al., 2008).
On the other hand, blood oxygenation level-dependent (BOLD) fMRI measures a delayed hemodynamic response to neural activity with a good spatial resolution and a temporal resolution of 1 or 2 s depending on the sequence used. Therefore, fMRI is more specific than EEG, making fMRI a suitable modality for neurofeedback (NF-fMRI) (Thibault et al., 2018). However, the use of the MRI scanner is costly, exhausting for patients (since staying perfectly still when suffering is challenging), and timeconsuming, and hence NF-fMRI sessions cannot be repeated an excessive number of times for the same subject or patient.
During the past few years, simultaneous EEG-fMRI recording has been used to understand the links between EEG and fMRI in different states of brain activity and has received recognition as a promising multi-modal measurement of brain activity (Abreu et al., 2018;Perronnet et al., 2018). However, this bi-modal acquisition is cumbersome for subjects or patients due to the use of the fMRI scanner. The methodology to extract information from fMRI with EEG has been also intensively investigated (some methods involved in the process are reviewed in Abreu et al., 2018). Indeed, both modalities are sensitive to different aspects of brain activity with different speeds. EEG provides a direct measure of the changes in electrical potential occurring in the brain in real time, while fMRI indirectly estimates brain activity by measuring changes in the BOLD signal, reflecting neurovascular activity, which occurs, in general, a few seconds after a neural event (Friston et al., 1994;Logothetis et al., 2001). Several studies have investigated correlations between EEG signal and BOLD activity in specific and simple tasks (de Munck et al., 2007;Goncalves et al., 2008;Scheeringa et al., 2011;Engell et al., 2012;Magri et al., 2012) and have found different relationships of certain frequency bands of the EEG signal. All those studies reveal the existence of a link between EEG and fMRI, but this relationship varies considerably with the task, location in the brain, and frequency bands considered.
In the literature, the term EEG-informed fMRI refers to methods extracting features from EEG signals in order to derive a predictor of the associated BOLD signal in the region of interest under study. A recent review (Abreu et al., 2018) gives a good overview of the principal EEG-informed fMRI methods and their limitations. Different strategies have been investigated, depending on the type of activity under study (epilepsy, resting state, open/closed eyes, relaxation): either selecting one channel of interest or using multiple channels before extracting features of interest. For example, in Formaggio et al. (2011) and Leite et al. (2013), the authors used a temporal independent component analysis to select the channel that best reflected epileptic seizures. In Schwab et al. (2015), the authors used a spatial, spectral, and temporal decomposition of the EEG signals to map EEG onto BOLD signal changes in the thalamus. Using a more symmetrical approach, Noorzadeh et al. (2017) proposed a method for the estimation of brain source activation, improving the spatio-temporal resolution compared to EEG or BOLD fMRI only. However, in the context of neurofeedback, using simultaneous recording of EEG-fMRI to estimate neurofeedback scores computed in real time from features of both modalities (NF-EEG-fMRI) is a recent application that was first introduced and its feasibility demonstrated by Zotev et al. (2014. The recent methodology synchronizing both signals for real-time neurofeedback  allows the construction of a new kind of data named NF-EEG-fMRI data, such as the dataset presented by Perronnet et al. (2017), which we used for the present study. Furthermore, it has been shown in Perronnet et al. (2017) that the quality of a neurofeedback session is improved when using both modalities simultaneously in NF-EEG-fMRI sessions. Thus, being able to reproduce an NF-EEG-fMRI session in real time when using EEG only would reduce the use of fMRI in neurofeedback while increasing the quality of NF-EEG sessions. To export fMRI information outside the scanner, most of the methods intend to predict the fMRI BOLD signal activity in a specific region of interest by learning from an EEG signal recorded simultaneously inside the fMRI scanner. Indeed, the method proposed in Meir-Hasson et al. (2014) uses a ridge regression model with a ℓ 2 regularization, based on a time/frequency/delay representation of the EEG signal from a single channel. Results show a good estimation of the BOLD signal in the region of interest, but the use of the fMRI neurofeedback in this study is only to serve the paradigm. The method aims at better targeting the amygdala in NF-EEG sessions.
Our challenge here is to learn EEG activation patterns (see section 2.2) from hybrid (or bi-modal) NF-EEG-fMRI sessions  and improve the correlation with NF-EEG-fMRI of NF scores using the EEG signal only. The motivations of this are multiple. Since we are considering a new kind of data, we want to provide a simple method of characterizing NF-EEG-fMRI in EEG, leading to an understandable model to confirm existing relations between EEG and fMRI in neurofeedback scores or to discover new relationships. Neurofeedback features in fMRI come from BOLD activation in one or more regions of interest. We propose an original alternative to source reconstruction in the context of neurofeedback by taking advantage of a recent and unique dataset of NF-EEG-fMRI that we have access to. Indeed, we intend to predict NF scores directly without dealing with source reconstruction or spatial filtering to estimate the BOLD-fMRI signal first in a specific region of interest, as proposed by a previous study . To our knowledge, this prediction of hybrid neurofeedback scores (without source reconstruction) is new and has not yet been explored in the literature. Also, we want the activation pattern to be applicable in real time when using new EEG data. The main objective of this paper (Figure 1) is to design a method able to exploit EEG only and predict an NF score of a quality comparable to the NF score that could be achieved with a simultaneous NF-EEG-fMRI session. The approach is based on a machine learning mechanism. During a training phase, both EEG and fMRI are simultaneously acquired and used to compute and synchronize, in real time, NF-EEG and NF-fMRI scores, both being combined into a hybrid NF-EEG-fMRI score . EEG signals and NF scores are used to learn activation patterns. During the testing phase, the learned NF-predictor (also called the activation pattern) is applied to unseen EEG data, providing simulated NF-EEG-fMRI scores in real time. Sparse regularization is exploited to build a model called the FIGURE 1 | Objective: We propose a method of learning a NF-predictor from bi-modal neurofeedback sessions (NF-EEG-fMRI) (see Perronnet et al., 2017 or section 3 for more details). The final goal of this method is to propose NF sessions using EEG only that have the quality of bi-modal NF sessions, therefore reducing the use of fMRI.
NF-predictor. The model used for the NF-predictor uses an adapted prior for brain activation patterns, using a mixed norm giving a structured sparsity, to spatially select electrodes and then select the most relevant frequency bands.
In section 2, we present the proposed model and the methods used to solve it. We then detail experiments with our learning model in neurofeedback sessions with a motor imagery task, from which unique data are presented in section 3. Section 4 presents results for a dataset of 17 healthy subjects with three NF sessions of motor imagery each; one is used to learn the model, and the other two are used to test the model. Section 5 provides a discussion of the proposed framework.

PROBLEM AND METHOD
Considering that, during a learning phase, we have access to reference scores y(t) and a temporal representation (potentially non-linear) of EEG signals X (called a design matrix, presented in section 2.1), the approach consists of choosing a vector of parameters α such that y(t) ≈ q(X(t), α) for all t, where q is some parametric function. α is a matrix matching the size of X(t). Here we consider To determine this parameter α, a regularization is used to select an optimal parameter α that fits the training data (see Figure 2 and section 2.3) while avoiding over-fitting, as detailed in section 2.2. Only a few brain regions are expected to be activated by a given cognitive task, so the electrode configuration is said to be spatially sparse. However, the frequency bands of each electrode are not necessarily sparse and might even be smooth, depending on the frequency band sampling.
From here, we will use the following notations: • y e (t) ∈ R, ∀t ∈ {1, ..., T} are the T neurofeedback scores estimated from EEG signals (noted S EEG ∈ R E×T EEG ) measured from E electrodes during T EEG samples of time. • y f (t) ∈ R, ∀t ∈ {1, ..., T} are the T neurofeedback scores extracted from the Blood Oxygen Level Dependent imaging (BOLD) signal of functional-MRI acquisitions S fMRI ∈ R V×T fMRI , with V the number of voxels and T fMRI the number of acquired volumes. • y c (t) = y e (t) + y f (t) ∈ R, ∀t ∈ {1, ..., T}, a combination of both NF scores (more details are provided in section 3). • y(t) ∈ R, ∀t ∈ {1, ..., T} is a set of neurofeedback scores that can be y e , y f , or y c .
First, to build our predictor, relevant information needs to be extracted from EEG data and organized to form what we call a design matrix.

Structured Design Matrices From the EEG Signal
The design matrix X 0 ∈ R T×E×B , where E is the number of electrodes and B the number of frequency bands, contains relevant information extracted from the EEG signal. Each temporal matrix of X 0 , X 0 (t) ∈ R E×B ∀t ∈ {1; . . . ; T} is a frequency decomposition corresponding to the past 2 s of S EEG . We used a Hamming time window of 2 s to estimate the average power of each frequency band b ∈ {1; . . . ; B} (defined below) on each channel ∈ {1; . . . ; E}. Each time window of the EEG signal overlapped by 1.75 s (0.25-s shift) to match with the 4 Hz sample of the y values. The B frequency bands have an overlap of 1 Hz with the next band and are defined between a minimum frequency b min Hz and a maximum frequency b max Hz (see section 3). We chose to use several relatively narrow frequency bands to let the model select the relevant bands for each electrode. Furthermore, it has been suggested (de Munck et al., 2009;Rosa et al., 2010) that different frequency bands be used when working with coupled EEG-fMRI data.
FIGURE 2 | Machine learning scheme. For each subject, a bimodal neurofeedback session (NF-EEG-fMRI session 1 here) is used for the learning step, and then the learned activation patternα is applied to the other sessions (2 and 3) for the testing step. The learning data are split K times into a training set (90% of the learning set) and a cross-validation (CV) set (10% of the learning set). The optimalλ parameter is the one minimizing the variance and the bias in the learning step.
The model also has to be able to predict y f scores derived from the BOLD signal (see section 3). There is no linear relationship between the BOLD signal and the average power on frequency bands from the EEG signal. Therefore, to better match y f scores, we decided to apply a non-linear function to X 0 used in fMRI to model BOLD signals (Lindquist et al., 2009;Pedregosa et al., 2013), the canonical Hemodynamic Response function (HRF). We convolved X 0 on its temporal dimension with the HRF, formed by two gamma functions, for a given delay of the first gamma function to compensate the response time of the BOLD signal, as suggested in Moosmann et al. (2008) and Meir-Hasson et al. (2014). The HRF will temporally smooth and give a BOLD-like shape to the design matrix and increase a potential linear relationship between y f and the design matrix. Since HRF is known to vary considerably across brain regions and subjects (Handwerker et al., 2004), it is recommended to consider different delays but also to chose a range of values corresponding to the task asked. For the type of task addressed in the experimental part, the observed delay is around 4 s, so we convolved X0 with three different HRFs, leading to three new design matrices, X 3 , X 4 , and X 5 , with respectively peak locations of 3, 4, and 5 s. We started from the canonical HRF and changed the peak parameter (respectively using 3, 4, and 5) to induce the three different HRFs used to induce a time delay in the initial design matrix X 0 . By doing so, we kept the total of design matrices to a reasonable size.
These design matrices are concatenated in their 2nd dimension to form the X c ∈ R T×M×B matrix, with M = 4 * E.

Optimization
The EEG data are now represented by a structured design matrix X c , and we can search for a weight matrixα ∈ R M×B , such that m,hα (m, h)X c (t, m, h) estimates the NF score y(t), ∀t ∈ {1; . . . ; T} as well as possible. Note: the methodology is presented for the design matrix X c but can be used for X 0 or X d .
To identifyα, called the activation pattern, we propose the following strategy, which consists of learning, for a given subject and NF session, the optimalα by solving the following problem: with φ λ a regularization term, and λ a weighting parameter for the regularization term. Thisα is then applied to a design matrix X test c from a new session to predict its NF scores.
Equation 2 is of the form argmin(g 1 (α)+g 2 (α)), and its resolution can be performed using the Fast Iterative Shrinkage Thresholding Algorithm (FISTA) (Beck and Teboulle, 2009), which is a twostep approach of the Forward-Backward algorithm (Combettes and Wajs, 2005), making it faster. FISTA requires the same conditions as the Forward-Backward algorithm, meaning a convex differentiable with Lipschitz gradient term g 1 and a convex term g 2 that is not necessarily differentiable but smooth enough to make its proximal map computable.
Here, g 1 (α) = T t 1 2 (y(t) − q(X c (t), α)) 2 is a sum of convex and differentiable functions with . By representing X c (t) and α as vectors of size M * B, we can easily note that ∂g 1 ∂α is a sum of Lipschitz functions. Therefore, the Lipschitz constant of The NF-predictor uses the structured design matrix to achieve better control over the interpretation of results and to better optimize the weightsα. Therefore, we have to adopt an optimization strategy that is coherent with this structure. The activation pattern of the NF-predictor: 1. has to be spatially sparse to regulate the model, as EEG signals are noisy, and to select the most relevant electrodes on each frequency bands, 2. has to be smooth across different overlapped frequency bands, 3. has to allow non-relevant frequency bands to be null.
with ρ ∈ R + and λ ∈ R + . We chose not to estimate the parameter ρ to keep computation time reasonable. Indeed, ρ weights the induced spatial sparsity over EEG channels, and we chose to fix this parameter for all subjects, as we hypothesize that there is no reason for the number of electrodes involved in the activation pattern to significantly change between subjects. However, the estimation of the λ parameter is needed (since we do not have a hypothesis on its behavior) and is presented in the next section. The ℓ 21 mixed norm that writes α 21 = m b α 2 m,b satisfies conditions (1) and (2). The ℓ 1 norm defined as α 1 = m,b |α m,b | satisfies condition (3) since ℓ p norms with p ≤ 1 are known to promote sparsity. The last key point of FISTA algorithm is the proximal map associated to the with operator (.) + = max(., 0). One can note that by canceling either the λ parameter or the ρ parameter, we retrieve the proximal map associated with the ℓ 21 (when ρ = 0) and the ℓ 1 (when λ = 0) norms, demonstrations of which can be found in the appendix of Gramfort et al. (2012). For the stopping criterion of FISTA, a large enough number of iterations has been used to allow the model to converge before reaching the last iteration. All elements and conditions are gathered to run the FISTA algorithm.

λ Parameter Selection
The parameter λ is important in the optimization problem, and we decided to estimate it automatically. The following process chooses the best λ among a list of = {λ 1 ; ...; λ l } sorted in increasing order. First of all, the data must be split into two sets. In our case, subjects have three NF-EEG-fMRI sessions: one session is used as the learning set, and the other two NF-EEG-fMRI sessions are used as the testing sets (see Figure 2). For each value λ i of , the learning set, formed by T neurofeedback scores with their associated design matrices, is divided K = 50 times into a training set of indices R k , representing 90% of the T data, and a cross-validation set CV k composed of the remaining 10% of the learning set. A modelα (k,i) is estimated on the training dataset k composed by R k neurofeedback scores y(j) and the associated design matrices X c (j) with λ i , i.e.: with j ∈ R k and X c (j) ∈ R M×B . For the current λ i evaluation, we stop the process when k |α (k,i) | 0 /K < 2. There is no need to investigate the next λ i ; the current one is sparse enough, and the next one might lead to null models. We then apply the modelα (k,i) to the corresponding crossvalidation set of CV k NF scores to obtain estimated values of y(s),ỹ(s) = q(X c (s),α (k,i) ) with s ∈ CV k . For each one of the 50 partitioned into training and cross-validation sets, we computed the normalized mean squared error NMSE for a given set of data {y, X c } for the training sets and the cross-validation sets. NMSE({y, X c },α (k,i) ) = s (y(s) − q(X c (s),α (k,i) )) 2 s (y(s) −ȳ) 2 (8) withȳ = 1/n n s y(s). The optimalλ parameter is defined as the one minimizing the error during training and crossvalidation. Considering that only the errors from the training set NMSE({y(R k ), X c (R k )},α (k,i) ) would introduce bias, and considering only the error of the cross-validation set NMSE({y(CV k ), X c (CV k )},α (k,i) ) would introduce variance. Then, the optimalλ is the λ i that minimizes Theλ parameter is the optimal parameter used for the model estimation. If there are several candidates, to favor sparsity, the larger of these candidates is chosen.

DATA ACQUISITION AND PRE-PROCESSING
We used an existing dataset presented in Perronnet et al. (2018) composed of 17 healthy subjects who were scanned using the hybrid Neurofeedback platform from Neurinfo (Rennes, France) coupling EEG and fMRI signals .
Data are now available online in BIDs format on OpenNeuro: https://openneuro.org/datasets/ds002338 (Lioi et al., 2019). A 64-channel MR-compatible EEG solution from Brain Products (Brain Products GmbH, Gilching, Germany) was used, the signal was sampled at 5 kHz, and FCz was the reference electrode and AFz the ground electrode. For the fMRI scanner, we used a 3T Verio Siemens scanner with a 12-channel head coil (repetition time (TR)/echo time (TE) = 2,000/23 ms, FOV = 210 × 210 mm 2 , voxel size = 2 × 2 × 4 mm 3 , matrix size =105 × 105 with 16 slices, flip angle = 90 • ). All subjects were healthy volunteers, were right-handed, and had never taken part in any neurofeedback experiments before. They all gave written informed consent in accordance with the Declaration of Helsinki, as specified in the study presenting the data . They all underwent three NF motor imagery sessions of 320 s each after a session dedicated to the calibration. One session consists of eight blocks alternating between 20 s of rest, eyes open, and 20 s of motor imagery of the right hand. The neurofeedback display was uni-dimensional (1D) for nine subjects (Figure 3 left) and bidimensional (2D) for eight subjects (Figure 3 middle). For both, the goal was to bring the ball into the dark blue area (Perronnet et al., 2018). NF scores y e and y f , being from different modalities, were standardized before summing to form y c . In this study, NF scores refer to standardized scores except when they are predicted. For this study, y e have been computed from the Laplacian operator (commonly used in neurofeedback) centered around the region of interest, channel C3 here. For each time interval I t , the spatial filtering is noted as Lap(C3, I t ). The temporal segments I t are spaced at 250 ms apart and have a length of 2 s (therefore, there is an overlap of 1.75 s), as in the design matrix construction. The power of the frequency band [8-30 Hz] is then extracted via the Power Spectral Density function PSD: One may note the presence of the minus operator, which is used here for the sake of coherence with y f (Figure 3 right).
The neurofeedback scores y f have been computed from the maximal intensity of BOLD signal covering the right-hand motor area and the supplementary motor area. One score is computed per volume acquired (i.e., 1 per second). Scores y f are then resampled and smoothed (using a Savitzky-Golay filter, known to avoid signal distortion) to fit the 4 Hz y e scores (T = 1, 280).
An active set has been selected on the design matrices to avoid potentially correlated noise due to head movement during resting blocks obstructing the signal from channels of interest. Indeed in coupled EEG-fMRI acquisitions, subjects are lying in the MRI scanner, so the outer electrodes may be in contact with the bed or holds. We excluded the outer electrodes and kept 28 electrodes: the three central lines have seven electrodes (FCz is the reference), three frontal electrodes, and three posterior electrodes.
Potential outliers in the design matrices (i.e., observations > mean ± 3std) were thresholded in the NF-EEG-fMRI session used as the learning set, and bad observations from annotations on the EEG signal were removed as their corresponding NF scores. For the frequency dimension of the design matrix X 0 construction (cf. section 2) and therefore for the other design matrices, we chose b min = 8, b max = 30 to cover the alpha and beta frequency bands involved in motor tasks. We considered B = 10 frequency bands, leading to bands of 3 Hz wide with an overlap of 1Hz.
As mentioned in the previous section, for the regularization of the NF-predictor, we used 15 values of λ i from 100 to 3,000 and fixed the parameter ρ = 1, 500.

EXPERIMENTS AND RESULTS
As stated in the previous section, there were three neurofeedback sessions per subject. For each subject, we will consider one session as a learning set, and the two others as testing sets. Thus, there are three different learning sets and six different testing sets per subject.
The advantage of the last NF-predictor is the possibility of using the 2D score visualization (Figure 3) to display NF scores. Also in this case, only NF-fMRI scores have to be learned. We run the following experiments to test our different NF-predictors: -Model validation: we used the learning set to assess whether the different NF-predictors could accurately model the NF scores. For each subject and each NF-predictor, we estimated correlations with the reference NF score to quantify the quality of prediction. As a reference, we compared to the correlation of y e to y c , which is part of the y c score, to assess whether, in the validation process, the model can better predict y c scores than y e . -Model prediction: we applied the learned activation patterns to a new NF-EEG-fMRI recorded session (testing set, Figure 2). For each subject and each pair of sessions (six learning/testing pairs per subject), we compared correlations between NFpredictorsỹ α c and y e +ỹ α f to reference score y c and between the prediction ofỹ α f to y f . -To observe the captured structure in the activation patternsα c andα f , respectively learned to predict y c and y f , we re-shaped the average activation patterns of a subject over sessions into FIGURE 3 | Bi-modal neurofeedback metaphors (1D on the left, 2D on the middle) displayed during sessions (Perronnet et al., 2018). 1D: the ball's position represents the sum of y e and y f . 2D: the left axis represents the y e , and the right axis represents the y f scores. The two plots on the right show NF scores from EEG and from fMRI; green areas are task and white areas are rest. The goal is to bring the ball into the dark blue area.

FIGURE 4 | Model validation and prediction. Boxplots (median and quartiles)
of Pearson's correlation coefficients, over all subjects and sessions, between NF-predictors and y c or y f . The right part of the plot indicates the correlation of the reference NF scores y e and y f vs. the corresponding reference bi-modal NF scores y c (= y e + y f ).
matrices corresponding to the design matrices defining X c , respectively X d (see section 2.1) and displayed the results of the first dimension (electrodes) and of the second dimension (frequency bands).

Model and Prediction
The results of the model validation of the NF-predictors (i.e., the learned activation patternsα are applied to the learning set) are shown on the left side of Figure 4. Pearson's correlation coefficients between the prediction and the ground truth are computed for each of the three learning sessions and for all subjects. Correlations are very high (> 0.8) for the two NF-predictors of y c scores, indicating that the model

The table shows p-values of a t-test in which the hypothesis is: models in columns correlate better with y c than models in rows.
is adapted to the problem of NF prediction. Indeed, the model could fit NF-fMRI scores using only EEG signal information with a median correlation of r = 0.81 with y f . For the evaluation of the model prediction, the learned activation patterns are applied to the testing sets, i.e., the two other unseen NF sessions for each one of the three NF sessions. The significance of the differences between the correlations to the reference NF scores is assessed by applying a Fisher transformation to the correlation coefficients. The p-values obtained via Student's t-test after this transformation indicate the significance of the differences between the correlation coefficient distributions. Table 1 shows that y e +ỹ α f , whose median correlation is r = 0.74, better predicts y c than y e only, despite the fact that y e is part of y c . The paired t-test gives a p-value p = 6.6e-4 (t = 3.52), meaning that the prediction of NF-fMRI scores byỹ α f significantly adds information to y e . Predicting y c scores with y e +ỹ α f is also better thanỹ α c (p = 2e-42, t = 23.17), which is expected since the models predicts NF-fMRI scores and directly uses NF-EEG scores (with all the potential noise coming from the EEG measures), which are part of the reference score, as y e is in y e +ỹ α f , and therefore correlates well with y c . Furthermore, the proposed model is able to predict y f scores with a fair correlation of 0.36 in the median and 0.35 in the average. Thus, predicting NF-fMRI scores instead of predicting bi-modal NF scores seems to be the best way of predicting those bi-modal NF scores. Figure 5 gives examples of predictions of y c with y e +ỹ α f andỹ α c . It also illustrates that, even if the correlations ofỹ α c are lower than y e +ỹ α f ( Table 2),ỹ α c can correctly predict the reference score y c .
During the learning session (i.e., the NF session used to learn the predictor), subjects might unevenly focus on their fMRI or EEG feedbacks, and when subjects focus more on NF-fMRI than on NF-EEG, the EEG-signals might lose coherence with respect to the NF-fMRI scores. Despite this, the EEG signals could predict NF-fMRI scores with a correlation of 0.36 in median and 0.35 in mean (cf Figure 4, which is a fair correlation between such different modalities. Examples of NF prediction are given at the bottom of Figure 5, the plot shows the predictionỹ α f of NF-fMRI scores on a testing set.

Activation Patterns
We now focus on the learned models to evaluate the sparsity over sessions and subjects and observe the dispersion of the learned patterns over the sessions and frequency bands of a subject. For each subject and each learning session, we computed the proportion of zero coefficients of the activation patterns ( Table 2). By construction of the design matrix X c ,α c , andα f can be split into different activation patterns, as shown in the columns of Table 2. The models could select relevant coefficients of the design matrices to predict reference NF scores. In average, there are 87 non-zero coefficients onα c to predict y c and 57 non-zero coefficients onα f to predict y f . To display the activation patterns of a subject over their three sessions, we denoteζ c = 3 s=1 |α (s) c | ∈ R M×B the absolute activation pattern ofα c (respectively,ζ . We can display heat maps for each of the four absolute activation patterns at each electrode b∈B ζ 0 , b ζ 3 , b ζ 4 , and b ζ 5 (Figure 6, top line of each panel) and color maps of the four average patterns b γ 0 , b γ 3 , b γ 4 , and b γ 5 (Figure 6, bottom line of each panel) showing the sign of the strongest and most stable coefficients across all subjects and sessions. The activation patterns displayed at Figure 6 represent the dispersion of the learned parameters for one example subject over their three bimodal NF sessions for which they received a bi-dimensional display. The subject used had a median correlation with the corresponding reference score of 0.44 forỹ α f , of 0.76 forỹ α c , and of 0.81 for y e +ỹ α f . All maps present different distributions of the non-zero coefficients. As expected when learning y c scores, the most intense heat map b ζ 0 with a maximum value of 45 concentrates its non-zero values on the C3 channel (above the right-hand motor area) and corresponds to the design matrix X 0 , directly extracted from the EEG signal without non-linear temporal delay. The next heat map in intensity order is b ζ 5 with two peaks with a parietal positioning above the Pz and P4 channels, corresponding to Brodmann area 7, which is involved in visuo-motor coordination. As presented in an interesting study (Sitaram et al., 2017), this brain area is active during generalized neurofeedback when feedback is presented visually. In Sitaram et al. (2017), it is also mentioned that this part of the cortex is part of the executive control network connected to the thalamus. The activation of the executive control network during motor imagery tasks is relevant and indicates that the subject is trying to do the task. It is also interesting to observe that the main activation peaks of b ζ 5 (Pz and P4) have opposite signs to the activation peak of b ζ 0 (above C3), suggesting a negative correlation between neural activation measured at C3 and the neural activation of the posterior parietal Pz and P4 channels, part of the control network in neurofeedback. Figure 6B presents comparable learned activation patterns to predictỹ α f , but more spread over sessions than for the model on panel A, even if the model has around 93% of null coefficients ( Table 2). This suggests that, even for a single subject, the relation between EEG and fMRI changes over sessions, probably depending on the strategy used by the subject.
At last, it is also possible to display the frequency profile of each average activation patterns. The 4 frequency profiles are e∈{1,...,E} γ 0 , e γ 3 , e γ 4 and e γ 5 , e ∈ {1, ..., E}, summing weights over electrodes. Figure 7 shows that the most used frequency bands over all sessions and electrodes are the alpha band ([8-12] Hz) and lower beta band ([13-17] Hz); the last four frequency bands are not displayed as they only have null coefficients. We can observe the effect of ℓ 2 regularization, which allows continuity in the frequency bands, and the effect of the subsequent ℓ 1 regularization, which removed the smaller coefficients located in the high frequencies. When considering all activation patterns (Figure 7 right side), there is a change of sign between alpha and lower beta. Each activation pattern shows a different frequency profile, which, together with Figure 6, tends to demonstrate that patterns have complementary information.

Ablation Study
We run an ablation study to understand the impact of the non-linearity part of the model, obtained by the convolution of different HRFs inducing different delays for the prediction of NF-fMRI scores, by analyzing the results of the model without HRF convolutions. Therefore, in this ablation study, we used the exact same processes as described in section 4 (with a learning step and a testing step), except that we used the design matrix X 0 alone to learn the model parameters to predict y c scores and y f scores.
The results of the ablation study are displayed at Figure 8 and are to be compared with the results of the proposed model presented at Figure 4. Figure 8 shows that during the learning step, the prediction ofỹ α f using X 0 only correlates with y f with a correlation of only 0.48 in mean and median. This prediction is significantly improved by the use of the different HRF functions (paired t-test, t = −14, 64, p = 1e-26), as, in the proposed model, y α f correlates with y f with a correlation of 0.81 in median and mean. Therefore, it is natural to observe that the prediction of the bi-modal NF score by y e +ỹ α f is also significantly improved by the use of non-linearity (paired t-test, t = −11, 87, p = 4e-21). Figure 8 shows that, for the testing step, the prediction of y α f using X 0 only correlates with y f with a correlation of 0.14 in mean and 0.15 in median, which is significantly lower than with the proposed model (paired t-test, t = −9.56, p = 2e-18). The prediction of the bi-modal NF scores by y e +ỹ α f is also significantly lower than with the proposed model (paired t-test, t = −2.74, p = 3e-3). Also, when using X 0 only, the prediction of the bi-modal NF scores using y e +ỹ α f does not significantly improve over y e alone (paired t-test, t = −0.69, p = 0.5).

DISCUSSION AND CONCLUSION
The model validation supports that the optimization strategy we chose for our problem is adapted to the model, as is the choice of  the different design matrices. The ablation study supports the use of different non-linear delays to improve the prediction of NF-fMRI scores using the EEG signal. The evaluation of the model prediction strongly suggests that predicting only NF-fMRI scores from an EEG signal while applying a Laplacian on the EEG signal appears to be the best solution. Indeed, forỹ α c , when predicting the bi-modal NF scores, the variability between EEG signals induces a decreasing correlation with the reference NF score y c . Also, NF-EEG scores can always be computed from available EEG signals, which leaves only NF-fMRI scores to be estimated from EEG signals. However, one might want to improve the selection of features for the computation of NF-EEG scores, but this raises different questions about the validation and the reference score. Here, we assumed that the given NF scores are relevant to the task and good enough to be considered as reference scores. As expected, for the prediction of the bi-modal NF scores byỹ α c , when decomposing the activation patternα c (fromỹ α c ) into the four matrices corresponding to the different design matrices, the weights corresponding to X 0 (0 s of delay) are mainly located above C3, which is the center of the Laplacian used for the computation of the NF-EEG scores. This seems to support the fact that with a delay of 0 s, only the part of the NF coming from the EEG provides information.
A possibility to improve the prediction of NF-fMRI scores would be to use more NF sessions as learning sessions, since, as observed, EEG signals bring variability into the prediction. Each new bi-modal neurofeedback session could be added to the subject-specific model to better adapt the model to the subject or patient. This will be investigated in the next study. Given the improved correlation of the proposed NF predictor with bimodal NF scores, it would be interesting in future work to validate its improved performance in actual NF sessions compared to classical NF-EEG scores. In particular, to assess the response of subjects to the predicted bi-modal NF scores and particularly the predicted NF-fMRI scores learned by the proposed model over a standard NF-EEG neurofeedback session, a new and large enough study is needed, as subjects can learn at different paces to regulate their own brain activity.
Presently, the proposed model learns an individual or specific model for each subject, which allows a personalized model for adapted neurofeedback sessions. Also, a change in strategy for the task (here, no specific strategy for imagining moving their right hand was given to the subjects) might impact the learned model, as the relation between the EEG and fMRI signals may change. However, in a forthcoming work, we are investigating an adaptation of the methodology for the extraction of a common model, taking into account the differences between sessions and subjects, allowing the prediction of NF-EEG-fMRI scores on new subjects who did not participate in the model construction. The model might be less specific, but this would give access to neurofeedback sessions of bi-modal quality using EEG only for subjects with MRI contraindications and/or drive a subjectspecific model estimation, respecting the strategy used by the subject to progress in their neurofeedback task. The learned NF predictor is, of course, specific to the particular task considered during the learning session. One can expect to generalize the approach to other tasks where spatial sparsity is relevant. Its extension beyond such tasks is more challenging.
Other ways to improve the method proposed here would be to investigate the use of dynamic functional connectivity, a relatively recent field in BOLD fMRI, the use of which along with EEG data requires further investigation (Tagliazucchi and Laufs, 2015). Dynamic functional connectivity would allow the different networks (reward, control, and learning) involved during neurofeedback sessions to be taken into account (Sitaram et al., 2017). Dynamic functional connectivity studies the temporal fluctuations of the BOLD signal across the brain and appears to be a promising approach in the EEG-fMRI research field (Allen et al., 2018). However, one should be careful with the potential unknown remaining noise coming from the MRI during simultaneous EEG-fMRI recording, which might unsettle the EEG signal coherence between electrodes.
The long-term objective of our project is to learn from EEG-fMRI NF sessions to provide, outside the MRI scanner, enhanced NF-EEG sessions (Figure 1). Future work will investigate the portability of the learned model (on EEG-fMRI neurofeedback data), outside the MRI scanner, introducing new challenges, such as dealing with the remaining noise in the MRI after artifact correction and the absence of ground truth once the EEG is measured outside the MRI scanner.
To conclude, the model proposed here is able to provide a good enough prediction of the NF-fMRI scores to overcome the absence of NF-fMRI and allows the estimation of NF-EEG-fMRI scores when using EEG only to be significantly improved.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: https://openneuro.org/datasets/ds002338.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Comité de Protection des Personnes Ouest V Rennes. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
CC was the guarantor of integrity of the entire study. All authors contributed to the study concepts and design, data analysis and interpretation, and manuscript drafting or manuscript revision for important intellectual content. All authors gave approval of the final version of the submitted manuscript.