Intracranial brain-computer interface spelling using localized visual motion response

Intracranial brain-computer interfaces (BCIs) can assist severely disabled persons in text communication and environmental control with high precision and speed. Nevertheless, sustainable BCI implants require minimal invasiveness. One implantation strategy is to use localized and robust cortical activity to drive BCI communication and to enable precise presurgical planning. The visual motion response is a good candidate for this strategy because of its focal activity over the middle temporal visual area (MT). Here, we developed an intracranial BCI for spelling that uses only three electrodes over the MT area. The best recording electrodes were selected by preoperative functional magnetic resonance imaging (fMRI) localization of the MT, and local neural activity was further enhanced by differential rereferencing of these electrodes. The BCI spelling system was validated both offline and online in five epilepsy patients, achieving a top speed of 62 bits/min, i.e., 12 characters/min. Moreover, the response patterns to dual-directional visual motion stimuli provided an additional dimension for BCI target encoding and pave the way for a higher information transfer rate in intracranial BCI spelling.


Introduction
In the age of social media and networking, text communication is becoming crucial to the quality of life of disabled persons. Brain-computer interfaces (BCIs) can facilitate text communication and environmental control in individuals suffering from severe neuromuscular diseases (Anumanchipalli et al., 2019; Benabid et al., 2019; Birbaumer et al., 1999; Moses et al., 2021; Vansteensel et al., 2016). Despite successful demonstrations and clinical trials of invasive BCIs, minimally invasive solutions are still lacking. Human electrical neural recording can be performed at three levels, which offer different tradeoffs between signal-to-noise ratio (SNR) and invasiveness. Electroencephalography (EEG) is completely noninvasive but has a poor SNR (Ball et al., 2009), whereas intracortical spike recording has the highest SNR but is the most invasive (Hochberg et al., 2006; Pandarinath et al., 2017; Velliste et al., 2008; Willett et al., 2021). Intracranial EEG (iEEG) offers a tradeoff between invasiveness and neural signal SNR (Anumanchipalli et al., 2019; Brunner et al., 2011; Nunez and Srinivasan, 2006; Parvizi and Kastner, 2018; Vansteensel et al., 2016). Recently, several BCI applications adopting high-density electrocorticography (ECoG) arrays have shown great capability in decoding spoken or imagined language […] Mackay and Rietveld, 1968). Although the location of the MT area differs from person to person (Huang et al., 2019), visual motion responses recorded with fMRI and with EEG are still spatially consistent (Bucher et al., 2006; Gaglianese et al., 2017). Thus, we propose that, by localizing the visual motion area with fMRI before the surgical implantation of iEEG electrodes, sufficient electrical neural signals can be obtained with only a limited number of electrodes for a minimally invasive BCI.
In this study, we developed an online minimally invasive BCI and validated it with five epilepsy patients whose iEEG electrodes, implanted to localize seizure foci, passed through the MT complex. A presurgical fMRI experiment was used to select the best electrode contacts for each patient (Benabid et al., 2019; Rowald et al., 2022). Differential rereferencing was adopted to enhance both the event-related potential (ERP) and high-gamma features used for BCI classification (Li et al., 2018; Verwoert et al., 2021). Combined with a dynamic stopping algorithm, this yielded an online visual motion response-based speller with a best online performance of 62 bits/min (12 characters/min). Although only three contacts per subject were used, the speller outperformed most classical visual motion paradigm-based BCI spellers (Jin et al., 2012; Liu et al., 2010; Zhang et al., 2013) and was comparable with most iEEG-based BCIs (Anumanchipalli et al., 2019; Brunner et al., 2011; Vansteensel et al., 2016). This interfacing paradigm and processing pipeline pave the way for a new family of minimally invasive BCI systems.

Materials and methods
Five patients with intractable epilepsy (see Table 1 for additional information) were recruited. SEEG electrodes were temporarily implanted in the brain to localize seizure foci prior to surgical resection (Fig. 1C). To localize each individual's visual motion-related brain regions, fMRI scanning was conducted using a block design of bars moving randomly in four directions (Fig. 1A, Fig. S1). CT volumes collected after SEEG implantation were then registered with the presurgical structural and functional MRI to select the three contacts with the most prominent blood oxygen level-dependent (BOLD) responses to visual motion stimuli (Fig. 1B). Electrode placement is demonstrated in Video 1. Differential rereferencing was then employed to mitigate spontaneous noise and enhance visual motion responses (Fig. 1B, D) (Li et al., 2018; Parvizi and Kastner, 2018). The attentional modulation of the visual motion response (Fig. 1G) was used to encode and decode user intent in the visual motion BCI paradigm (Fig. 1E): attended stimuli induced stronger ERPs (1-20 Hz) (Zhang et al., 2013) and high-gamma responses (60-140 Hz) (Gaglianese et al., 2017) (Fig. 1G). A dynamic stopping BCI algorithm then determined the row and column attended by the user, and hence the spelled character.
The BCI experiments comprised two types of sessions: offline training and online testing. In the offline training sessions, the subject attended to the assigned targets ("AHOV29", from "A" to "9" sequentially) and was cued to shift gaze to the next target after ten repetitions of the bar movements (Fig. 1E). Each subject completed the training session at least twice, once for model calibration and once for evaluation. Subject S2 completed multiple online sessions, in which the subject typed a simple sentence (e.g., "HELLO WORLD"; an example copy-typing video is included as Video 2) or the whole target list on the screen (36 targets, from "A" to "Z" and from "0" to "9"). The characters determined by the BCI algorithm were printed on the screen sequentially. The subjects were asked to continue typing without correcting mistyped characters.

Participants
The participants were five patients with intractable epilepsy. SEEG electrodes were temporarily implanted to localize seizure foci before surgical resection. Each subject had normal or corrected-to-normal vision. Each patient had an 8-, 10-, or 12-contact SEEG electrode implanted in the left or right temporal lobe near the MT complex based on clinical requirements (Fig. 1C).

Ethical statement
All subjects provided informed consent. This study was approved by the Institutional Review Boards of both Tsinghua University and the PLA General Hospital. Electrodes were placed solely based on clinical considerations.

fMRI-based localization of the individualized visual motion area
A paradigm consisting of moving bars (Fig. 1A, Fig. S1) was adopted to identify each individual's visual motion area. Subjects fixated on the center of the screen and passively viewed the moving bar stimuli, with no task. The bars (five bars, 1° apart) moved in four directions (left, right, up, or down) at a speed of 10°/s within a fixed 5° rectangle, changing direction every 3 s. After 21 s of motion, the bars remained stationary for 21 s. This moving/stationary cycle was repeated seven times for each subject. The stimuli were displayed on an MRI-compatible LCD monitor. Patients lay supine in the bore of the MR scanner and viewed the monitor via a mirror angled at 45°.
For comparison, a classical MT+ localization paradigm using flying dot stimuli (Huk et al., 2002) was also adopted. As shown in Fig. S2, 200 white dots with a diameter of 0.25° traveled toward or away from a central white fixation cross at a speed of 6°/s within a 15°-diameter circular aperture, changing direction once per second. The block design alternated between moving and stationary conditions with a period of 21 s, repeated seven times for each subject.

The visual motion BCI spelling paradigm
The stimuli were displayed on an LCD monitor (17-in., 1280 × 1024, DELL FP1708, USA) at a fixed viewing distance of 50 cm. The visual motion speller interface contained 6 × 6 virtual buttons (Fig. 1E), each 1.73° × 1.73° in size. The visual motion stimulus was a vertical bar that appeared (motion onset) at the right or left border of a virtual button (a fixed rectangle) and moved 1.15° toward the opposite side before disappearing (motion offset). The direction was chosen according to the hemisphere in which the subject's electrodes were implanted, because motion onset stimuli are projected first to the contralateral hemisphere via the optic chiasm. The paradigm was a row/column paradigm: the row and column stimuli that contained the attended item elicited attended visual motion responses. Each stimulus was assigned a random color: red, green, blue, purple, yellow, or brown. The colors were used to enhance the attention effect on the desired target, as the subject was required to perform a color recognition task during spelling (Hong et al., 2009).
In the unidirectional paradigm, all stimuli moved in the same direction. A row/column stimulus consisted of the six stimuli in a single row/column. The complete series of motion stimuli over all rows and columns, defined as a trial, consisted of 12 epochs (six row-epochs and six column-epochs). In the dual-directional paradigm, by contrast, a row/column stimulus comprised 12 stimuli in two rows/columns spaced three lines apart, moving simultaneously in opposite directions. A complete series of motion stimuli over all rows and columns therefore consisted of six epochs (three row-epochs and three column-epochs). An animated illustration is available at https://github.com/HongLabTHU/MI-BCI.
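The difference between the two paradigms can be made concrete by listing the epochs of one trial. The sketch below assumes a random epoch order within each trial and, for the dual-directional case, assumes that lines 0-2 move rightward while their partners 3-5 move leftward; both are illustrative assumptions, and all function names are hypothetical. Each epoch is represented as `(kind, {line index: direction})`, so one can check that the attended button moves in exactly one row-epoch and one column-epoch per trial, and that in the dual-directional paradigm the direction identifies which of the two paired lines contains the target.

```python
import random

N_LINES = 6  # rows (and columns) of the 6 x 6 virtual keyboard


def unidirectional_trial(rng, direction="right"):
    """One trial: each row and each column moves once (12 epochs), shuffled.
    Each epoch is (kind, {line index: motion direction})."""
    epochs = [(kind, {i: direction}) for kind in ("row", "col") for i in range(N_LINES)]
    rng.shuffle(epochs)
    return epochs


def dual_directional_trial(rng):
    """One trial: lines i and i + 3 move together in opposite directions (6 epochs).
    Assigning 'right' to lines 0-2 and 'left' to lines 3-5 is an assumption."""
    epochs = [(kind, {i: "right", i + 3: "left"}) for kind in ("row", "col") for i in range(3)]
    rng.shuffle(epochs)
    return epochs


def target_epochs(epochs, row, col):
    """(epoch index, direction) for every epoch in which the attended button moves."""
    hits = []
    for k, (kind, lines) in enumerate(epochs):
        line = row if kind == "row" else col
        if line in lines:
            hits.append((k, lines[line]))
    return hits


rng = random.Random(0)
uni = unidirectional_trial(rng)
dual = dual_directional_trial(rng)
print(len(uni), len(target_epochs(uni, 2, 4)))                     # 12 2
print(len(dual), sorted(d for _, d in target_epochs(dual, 2, 4)))  # 6 ['left', 'right']
```

Because each dual-directional epoch moves two lines at once, a trial needs half as many epochs, which is the source of the potential speed gain discussed later.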

BCI experimental procedure
Each subject participated in two offline sessions. Each session contained 60 trials in which the subject was instructed to focus on the virtual buttons "AHOV29", the diagonal items of the virtual keyboard, with ten trials per button. Since each trial consisted of 12 epochs (two target epochs and ten nontarget epochs), each training session yielded 120 target epochs and 600 nontarget epochs. In addition, subjects S1, S2, and S3 participated in several online copy-spelling tasks, in which they typed an English sentence or a character list. The methods and experimental protocol differed slightly between subject S2 and subjects S1 and S3 (Table 2). For S2, trials were repeated until the speller selected a button as visual feedback, so the number of repetitions was dynamic (see Section 2.7.6). For S1 and S3, trials were repeated a fixed number of times (three) before a character was selected. The subjects were told to type continuously without correcting errors. Subjects S1 and S3 each performed one session, typing the 26 English letters in alphabetical order. Subject S2 performed five online sessions over two days. In the first session, the subject typed the 26 letters and ten digits (A-Z, 0-9); in the next four sessions, the subject spelled the sentence "HELLOWORLD." The software for online stimulus display was developed with Python 3 (Python Software Foundation) and PsychoPy (Peirce, 2007).

SEEG/MRI data collection and registration
Intracranial recordings were made from depth electrodes implanted beneath the cortical surface at PLA General Hospital. The data were collected with the Blackrock NeuroPort system (Blackrock Microsystems, UT, USA). The implanted electrodes had 8, 10, or 12 contacts (Fig. 1C). The amplifier sampled data at 2000 Hz. Two contacts in the white matter were chosen as the ground and reference. Because epileptic activity can degrade BCI performance (Fig. S3), recordings during periods of active epileptic activity were avoided whenever possible. Recordings were checked manually to ensure that the blocks used for analysis contained no obvious epileptic activity.
The CT images were collected after surgical implantation and were registered to the presurgical MRI images with FreeSurfer (Fischl, 2012), invoking SPM's mutual information-based transform algorithm (Wells et al., 1996). The registration was then visually checked and adjusted manually if necessary. The locations of the electrodes relative to each individual's brain were obtained accordingly (Figs. 1B and 2A). The electrodes were projected onto each individual's pial surface obtained with FreeSurfer to visualize the electrode contacts buried in sulci (Fig. 2B).

fMRI response comparison
The human visual motion area consists of multiple functional subregions. Thus, BOLD responses were compared within subjects S2-S5 to demonstrate that different MT localizers engage different functional regions. As shown in Fig. S4A, the activation patterns of the moving bar and flying dot MT localizers differed for S2, implying that different functional subregions were involved. To compare the group-level dissimilarity between the activation patterns of the two MT localizers, the cosine similarity of the activation patterns within the visual area, defined using the Brainnetome atlas (Fan et al., 2016), was calculated (Fig. S4B). Additionally, the fraction of significant BOLD activation (sig > 1.5) located in the V5/MT+ complex was compared in each subject, showing that the activation was more concentrated in the MT area in the moving bar paradigm (Fig. S4C).
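Both comparisons reduce to simple operations on voxel maps. The sketch below is a minimal illustration, assuming the activation maps are already resampled to a common voxel grid, that the ROI and MT masks are boolean arrays on that grid, and that the "fraction of significant activation" is computed over voxel counts; these are assumptions, and the function names are hypothetical.

```python
import numpy as np


def cosine_similarity(map_a, map_b, roi_mask):
    """Cosine similarity of two activation maps restricted to an ROI (e.g. a visual-area mask)."""
    a = np.asarray(map_a, dtype=float)[roi_mask]
    b = np.asarray(map_b, dtype=float)[roi_mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def fraction_in_mt(sig_map, mt_mask, threshold=1.5):
    """Fraction of significant voxels (sig > threshold) that fall inside the MT mask."""
    significant = np.asarray(sig_map) > threshold
    n_sig = significant.sum()
    return float((significant & mt_mask).sum() / n_sig) if n_sig else 0.0


# Toy four-voxel example (synthetic values, not data from this study).
bar = np.array([3.0, 2.0, 0.2, 0.1])      # sig map for a moving bar localizer
dot = np.array([0.5, 2.5, 2.0, 1.8])      # sig map for a flying dot localizer
visual = np.ones(4, dtype=bool)           # visual-area ROI
mt = np.array([True, True, False, False])  # MT mask
print(fraction_in_mt(bar, mt), round(fraction_in_mt(dot, mt), 3))  # 1.0 0.333
print(round(cosine_similarity(bar, dot, visual), 3))
```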

Best SEEG electrode selection
To identify the contacts with the strongest visual motion responses for SEEG-based BCI typing, presurgical fMRI BOLD activations around each electrode were evaluated. Functional images were modeled in FreeSurfer using a generalized linear model (GLM) with a single regressor contrasting the moving and stationary conditions. The sig value of each voxel was defined as the negative base-10 logarithm of the p value obtained from the GLM:
$$\mathrm{sig} = -\log_{10} p \qquad (1)$$
where p denotes the t test probability comparing each voxel's BOLD activation between the moving and stationary conditions. The sig value of each electrode contact was taken from its nearest voxel. The three adjacent contacts with the highest sig values were selected as the "BEST" SEEG contacts for further BCI decoding. Contacts close to the pial surface (< 3.5 mm) were projected to their nearest vertices. The Brainnetome atlas was used to determine the anatomical MT area of each individual. Contacts situated in the anatomical MT region with secondary activations (i.e., not among the "BEST" contacts) were marked as "SEC," and the remaining contacts outside the anatomical MT were marked as "OUT" (Fig. 2).
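The selection rule can be sketched in a few lines. This is a minimal illustration under stated assumptions: contact and voxel coordinates are in the same space (e.g., after CT-MRI registration), "nearest" means Euclidean distance, and "the three adjacent contacts with the highest sig values" is read as the window of three consecutive contacts on the shaft with the largest summed sig. Treating broken contacts (NaN) as zero is also an assumption, and the function names are hypothetical.

```python
import numpy as np


def sig_from_p(p):
    """Eq. (1): sig = -log10(p)."""
    return -np.log10(p)


def contact_sig(contacts_xyz, voxels_xyz, voxel_sig):
    """Each contact takes the sig value of its nearest voxel (Euclidean distance)."""
    d = np.linalg.norm(contacts_xyz[:, None, :] - voxels_xyz[None, :, :], axis=-1)
    return voxel_sig[np.argmin(d, axis=1)]


def best_adjacent_triplet(sig_per_contact):
    """Indices of the three consecutive contacts with the largest summed sig.
    Broken contacts given as NaN are counted as 0 (assumption)."""
    s = np.nan_to_num(np.asarray(sig_per_contact, dtype=float), nan=0.0)
    window_sums = np.convolve(s, np.ones(3), mode="valid")  # sum of each 3-contact window
    start = int(np.argmax(window_sums))
    return [start, start + 1, start + 2]


# Toy shaft with five contacts (synthetic p values).
voxels = np.array([[0.0, 0, 0], [2, 0, 0], [4, 0, 0], [6, 0, 0], [8, 0, 0]])
voxel_sig = sig_from_p(np.array([0.5, 1e-2, 1e-4, 1e-3, 0.2]))
contacts = voxels + 0.4  # hypothetical contact positions, each near one voxel
shaft_sig = contact_sig(contacts, voxels, voxel_sig)
print(best_adjacent_triplet(shaft_sig))  # [1, 2, 3]
```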

Differential rereference
To better capture the local neural response and to suppress common artifacts, differential rereferencing (bipolar and Laplacian rereferencing) was used in this study (Fig. 1B, D). For bipolar rereferencing, each channel was rereferenced to its adjacent channel on the same electrode shaft. Laplacian rereferencing requires each channel to be rereferenced to the mean of its two adjacent contacts along the SEEG electrode. For three adjacent contacts A, B, and C, the rereferenced middle contact was calculated as
$$B' = B - \frac{A + C}{2} \qquad (2)$$
The contacts on the border were mirrored so that Laplacian rereferencing reduced to bipolar rereferencing. To simulate the minimally invasive scenario, only the three contacts in the "BEST" group were rereferenced.
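The mirrored-border Laplacian can be written as one vectorized operation. The sketch below is a minimal NumPy illustration, not the authors' code: it assumes the recording is an array of shape `(n_contacts, n_samples)` with contacts in shaft order, and the function name is hypothetical. Mirroring means a border contact uses its only neighbor on both sides, so the Laplacian reduces to the bipolar difference, matching the scheme in which A and C are referenced to B.

```python
import numpy as np


def laplacian_rereference(data):
    """Laplacian rereferencing along one SEEG shaft with mirrored borders.

    data: (n_contacts, n_samples), contacts in shaft order (at least two contacts).
    Interior contact i: x_i - (x_{i-1} + x_{i+1}) / 2.
    Border contacts reuse their single neighbor on both sides, which reduces
    the Laplacian to a bipolar difference (x_0 - x_1 and x_last - x_{last-1}).
    """
    data = np.asarray(data, dtype=float)
    padded = np.concatenate([data[1:2], data, data[-2:-1]], axis=0)  # mirror the borders
    return data - 0.5 * (padded[:-2] + padded[2:])


# Three "BEST" contacts A, B, C (rows), two samples each (synthetic values).
abc = np.array([[1.0, 2.0], [3.0, 5.0], [6.0, 7.0]])
print(laplacian_rereference(abc))  # rows: A - B, B - (A + C) / 2, C - B
```

Any signal common to all three contacts (e.g., a shared offset or widespread spontaneous activity) cancels in every output row, which is the property the method relies on.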

ERPs and high-gamma power
The rereferenced SEEG data were band-pass filtered at 1-20 Hz (200-order FIR filter, applied forward and backward to eliminate phase shift) to extract ERPs. The filtered data from 0 ms to 500 ms after each stimulus onset were then segmented into epochs. The high-gamma frequency band was chosen for each subject between 60-140 Hz and 60-90 Hz based on cross-validation. The high-gamma response envelope was then extracted using a Morlet wavelet dictionary filter (the wavelet length equaled five standard deviations of the implicit Gaussian kernel, and the width was scaled by the central frequency to obtain uniform frequency resolution across bands), as implemented in the MNE software (Gramfort et al., 2014). The base-10 logarithm of the extracted envelope was then taken and its mean subtracted. Similarly, the high-gamma envelope from 0 to 500 ms after stimulus onset was segmented into epochs and concatenated with the extracted ERP features for further classification.
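The feature pipeline above can be sketched with NumPy alone. This is an illustration under assumptions, not the MNE implementation: the FIR filter is a Hamming-windowed sinc design, the forward-backward pass mimics zero-phase filtering, the Morlet wavelet spans ±5 standard deviations of its Gaussian envelope, and `n_cycles = freqs / 10` is a hypothetical choice that keeps the time (and hence frequency) width constant across bands. Averaging wavelet amplitude across the band is also an assumption. Note that at 2 kHz a 201-tap filter spans only 0.1 s, so the 1 Hz edge is very soft; decimating first or using more taps would sharpen it.

```python
import numpy as np

FS = 2000  # Hz; amplifier sampling rate stated in the text


def fir_bandpass(low, high, fs=FS, order=200):
    """Hamming-windowed-sinc band-pass FIR with order + 1 taps."""
    n = np.arange(order + 1) - order / 2
    lp_high = (2 * high / fs) * np.sinc(2 * high / fs * n)
    lp_low = (2 * low / fs) * np.sinc(2 * low / fs * n)
    return (lp_high - lp_low) * np.hamming(order + 1)


def filtfilt_fir(x, h):
    """Causal FIR pass forward, then over the time-reversed output; the delays cancel."""
    forward = np.convolve(x, h)[: len(x)]
    return np.convolve(forward[::-1], h)[: len(x)][::-1]


def morlet(f, n_cycles, fs=FS):
    """Complex Morlet wavelet spanning +/-5 SD of its Gaussian envelope, unit norm."""
    sigma_t = n_cycles / (2 * np.pi * f)
    t = np.arange(0, 5 * sigma_t, 1 / fs)
    t = np.r_[-t[::-1], t[1:]]
    w = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2 * sigma_t**2))
    return w / np.linalg.norm(w)


def log_hg_envelope(x, freqs, n_cycles):
    """log10 of the wavelet amplitude averaged over freqs, with its mean subtracted."""
    amp = np.mean([np.abs(np.convolve(x, morlet(f, c), mode="same"))
                   for f, c in zip(freqs, n_cycles)], axis=0)
    env = np.log10(amp)
    return env - env.mean()


def epochs(x, onsets, fs=FS, tmax=0.5):
    """Segments [onset, onset + tmax) for each stimulus onset (sample indices)."""
    n = int(round(tmax * fs))
    return np.stack([x[o:o + n] for o in onsets])


# Usage on a synthetic 8-s single-channel recording.
rng = np.random.default_rng(0)
x = rng.standard_normal(8 * FS)
erp = filtfilt_fir(x, fir_bandpass(1.0, 20.0))
freqs = np.arange(60, 141, 10)
hg = log_hg_envelope(x, freqs, n_cycles=freqs / 10)  # hypothetical n_cycles
onsets = [2000, 6000]
features = np.hstack([epochs(erp, onsets), epochs(hg, onsets)])
print(features.shape)  # (2, 2000)
```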

BCI classification
The three SEEG contacts with the most prominent BOLD responses (the "BEST" group) were used for BCI classification. To discriminate between target and nontarget SEEG responses, an L2-regularized logistic regression classifier was trained for each subject using the low-frequency ERP features and the high-gamma features. The first offline session of each subject was used as training data, and tenfold cross-validation was performed for evaluation. Typing accuracy was chosen as the metric and was estimated using Eq. (3), under the assumption that target and nontarget responses were Gaussian and stable across trials (Liu et al., 2020; Zhou et al., 2014).
(3)
Fig. 2. Best SEEG electrode selection guided by BOLD activation. The SEEG contacts of each subject were divided into three groups: three adjacent contacts with the most prominent BOLD activations (BEST, in dark blue), contacts that lie in the region of the MT complex but with secondary BOLD activations (SEC, in light blue), and other contacts outside the MT complex (OUT). (A) 2D slice visualization of the implanted SEEG electrodes. The sig values (see Eq. (1)) of BOLD activations were superimposed on the structural MRI scan. Rotations were applied as needed to display all contacts on the same plane. (B) Electrodes projected onto the inflated cortical surface. The contacts on the pial surface are displayed. The yellow contour outlines the V5/MT+ complex determined by the Brainnetome atlas (Fan et al., 2016). The sig values of the BOLD activations were also overlaid onto the cortical surface. (C) Bar plots of the sig (SEEG HG) of each channel (contact) for each subject. The error bars indicate the standard deviation. The red dashed lines indicate the sig (BOLD) of the voxel where each contact was situated. Broken channels were left blank (ch1 for S1, ch9 for S3, and ch7-10 for S4). Abbreviations: HG, high-gamma.
Eq. (3) is based on the decision rule of the visual motion BCI paradigm: for a correct discrimination in a single trial, the classifier outputs for all nontarget epochs must be lower than that for the target epoch (Liu et al., 2020). In Eq. (3), Θ denotes the target response features, which are assumed to be Gaussian, and Φ(⋅) denotes the linear classifier adopted during evaluation. $f_1(\cdot)$ represents the probability density function of the target response's decision values, whereas $F_2(\cdot)$ represents the cumulative distribution function of the nontarget response's decision values. The target and nontarget decision values are also Gaussian because each is a linear combination of Gaussian variables. The parameters of these distributions were estimated by maximum likelihood estimation (MLE).
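The classification and accuracy-estimation steps can be sketched end to end. This is a minimal NumPy illustration, not the authors' implementation: the L2-regularized logistic regression is fitted by plain gradient descent (a swapped-in solver; `lam`, `lr`, and `n_iter` are hypothetical), tenfold cross-validation produces out-of-fold decision values, and the accuracy follows our reading of the decision rule above, i.e., the probability that the target line's output exceeds all five nontarget outputs, $\int f_1(x)\,F_2(x)^{5}\,dx$, with Gaussians fitted by MLE. Squaring the line accuracy to obtain a character accuracy, and shrinking both standard deviations by $\sqrt{N}$ for $N$ averaged repetitions, both assume independence; neither is taken from the source.

```python
import math
import numpy as np


def fit_logreg_l2(X, y, lam=1.0, lr=0.1, n_iter=2000):
    """Minimize mean log-loss + lam / (2 n) * ||w||^2 by batch gradient descent.
    X: (n_epochs, n_features); y: 1 = target (attended), 0 = nontarget."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        z = np.clip(X @ w + b, -30, 30)  # avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))
        w -= lr * (X.T @ (p - y) / len(y) + lam * w / len(y))
        b -= lr * np.mean(p - y)
    return w, b


def cross_val_decision_values(X, y, k=10, seed=0):
    """Out-of-fold classifier outputs from k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(len(y))
    out = np.empty(len(y))
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        w, b = fit_logreg_l2(X[train], y[train])
        out[fold] = X[fold] @ w + b
    return out


_erf = np.vectorize(math.erf)


def norm_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def norm_cdf(x, mu, sd):
    return 0.5 * (1.0 + _erf((x - mu) / (sd * math.sqrt(2.0))))


def line_accuracy(target_dv, nontarget_dv, n_lines=6, n_avg=1, n_grid=4001):
    """P(target output > all n_lines - 1 nontarget outputs) under fitted Gaussians.
    np.std (ddof=0) is the MLE of sigma; SDs shrink by sqrt(n_avg) (assumption)."""
    mu1, sd1 = np.mean(target_dv), np.std(target_dv) / math.sqrt(n_avg)
    mu2, sd2 = np.mean(nontarget_dv), np.std(nontarget_dv) / math.sqrt(n_avg)
    x = np.linspace(min(mu1 - 8 * sd1, mu2 - 8 * sd2), max(mu1 + 8 * sd1, mu2 + 8 * sd2), n_grid)
    f = norm_pdf(x, mu1, sd1) * norm_cdf(x, mu2, sd2) ** (n_lines - 1)
    return float(np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2)  # trapezoid rule


def char_accuracy(target_dv, nontarget_dv, n_avg=1):
    """Row and column decisions treated as independent: P_char = P_line ** 2 (assumption)."""
    return line_accuracy(target_dv, nontarget_dv, n_avg=n_avg) ** 2


# Synthetic features sized like one offline session (120 target, 600 nontarget epochs).
rng = np.random.default_rng(0)
X = rng.standard_normal((720, 20))
X[:120, :5] += 1.0  # synthetic target effect on five features
y = np.r_[np.ones(120), np.zeros(600)]
dv = cross_val_decision_values(X, y)
acc1 = char_accuracy(dv[y == 1], dv[y == 0], n_avg=1)
acc3 = char_accuracy(dv[y == 1], dv[y == 0], n_avg=3)
print(round(acc1, 2), round(acc3, 2))  # accuracy rises with averaging
```

When target and nontarget distributions coincide, the estimate falls to the chance level of 1/36 for the 36-target interface, which is a convenient sanity check.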

Probability-based dynamic stopping algorithm
We employed a probability-based dynamic stopping strategy to balance decision time against accuracy by dynamically optimizing the number of averaged trials (Liu et al., 2020). For each row/column, the algorithm computed a likelihood from the classifier outputs averaged over the previous trials using a softmax function:
$$P_a^{(N)} = \frac{\exp\left(\bar{y}_a^{(N)}\right)}{\sum_{i}\exp\left(\bar{y}_i^{(N)}\right)} \qquad (4)$$
where $P_a^{(N)}$ denotes the likelihood that row/column a is the attended target after averaging N trials, $\bar{y}_a^{(N)}$ denotes the average of the classifier's outputs for row/column a over N trials, and i runs over the rows (or columns) of the interface. The likelihood $P^{(N)}(a, b)$ that the target in row a and column b is the attended one is the product of the likelihoods that the target is in the a-th row and in the b-th column:
$$P^{(N)}(a, b) = P_a^{(N)} \, P_b^{(N)} \qquad (5)$$
If one of these probabilities crossed a threshold or the number of averaged trials reached the preset limit, the algorithm stopped iterating and chose the target row/column with the highest likelihood. The character at the intersection of the target row and column was selected as the final BCI typing output.
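Eqs. (4) and (5) translate directly into a stopping loop. The sketch below is illustrative only: it assumes the threshold is applied to the best character-level probability $P^{(N)}(a, b)$, and the `threshold=0.9` and `max_reps=10` defaults are hypothetical values, not parameters reported here.

```python
import numpy as np


def softmax(v):
    e = np.exp(v - np.max(v))  # shift for numerical stability
    return e / e.sum()


def dynamic_stopping(row_scores, col_scores, threshold=0.9, max_reps=10):
    """row_scores, col_scores: (n_reps, 6) classifier outputs per repetition.

    After each repetition N: Eq. (4) softmax over the averaged outputs of the rows
    (and, separately, of the columns), then Eq. (5) outer product P(a, b) = P_a * P_b.
    Stops when the best P(a, b) reaches `threshold` (hypothetical value) or at the limit.
    Returns (row, column, repetitions used)."""
    n_max = min(max_reps, len(row_scores))
    for n in range(1, n_max + 1):
        p_row = softmax(row_scores[:n].mean(axis=0))
        p_col = softmax(col_scores[:n].mean(axis=0))
        p_char = np.outer(p_row, p_col)
        if p_char.max() >= threshold or n == n_max:
            a, b = np.unravel_index(np.argmax(p_char), p_char.shape)
            return int(a), int(b), n


# Synthetic classifier outputs: row 2 and column 4 attended.
rng = np.random.default_rng(1)
rows = rng.normal(0, 1, (10, 6))
rows[:, 2] += 1.5
cols = rng.normal(0, 1, (10, 6))
cols[:, 4] += 1.5
print(dynamic_stopping(rows, cols))
```

Clear responses terminate after few repetitions, while ambiguous ones run to the preset limit; this is how the algorithm trades decision time against accuracy.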

Offline evaluation
After training a classifier, we used another block of offline data for evaluation, applying the same methods used in the real online scenario. The ten repetitions of each character ("AHOV29") were divided into three subgroups of 3, 3, and 4 repetitions. The information transfer rate (ITR) was chosen as the primary metric (Wolpaw et al., 2000):
$$\mathrm{ITR} = \frac{60}{T}\left[\log_2 N + P\log_2 P + (1-P)\log_2\frac{1-P}{N-1}\right] \qquad (6)$$
where N denotes the number of possible choices, P the probability of a correct choice, and T the average time for each decision in seconds, so that the ITR is expressed in bits/min. A larger ITR requires both faster and more accurate decisions and indicates a more efficient BCI system.
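Eq. (6) is straightforward to compute. The function names below are hypothetical, and the convention $0 \cdot \log_2 0 = 0$ is used at the boundaries. As a worked example, 36 targets selected without error once every 5 s (12 characters/min) give $12 \log_2 36 \approx 62.0$ bits/min, which is consistent with the pairing of 62 bits/min and 12 characters/min quoted for the best online result.

```python
import math


def bits_per_selection(n, p):
    """Wolpaw et al. (2000) bits per selection for n choices and accuracy p (0 * log 0 := 0)."""
    bits = math.log2(n)
    if p > 0:
        bits += p * math.log2(p)
    if p < 1:
        bits += (1 - p) * math.log2((1 - p) / (n - 1))
    return bits


def itr_bits_per_min(n, p, t_seconds):
    """Eq. (6): bits per selection scaled by 60 / T, with T the mean time per selection in seconds."""
    return bits_per_selection(n, p) * 60.0 / t_seconds


# 36 targets, every selection correct, one selection per 5 s (12 characters/min).
print(round(itr_bits_per_min(36, 1.0, 5.0), 1))  # 62.0
```

At chance accuracy (p = 1/36) the bits per selection drop to zero, so slow but random output earns no information transfer.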

fMRI activation predicted SEEG visual motion response
The location of the V5/MT+ complex varies greatly across individuals (Huang et al., 2019; Watson et al., 1993). An fMRI-based MT localizer can identify an individualized MT complex (Huang et al., 2019; Huk et al., 2002). However, the V5/MT+ complex is thought to have multiple subregions with different functions, e.g., retinotopic mapping (Huk et al., 2002) and object selectivity (Kourtzi et al., 2002). fMRI localizers usually use flying dots as stimuli, which elicit visual motion responses related to changes across the visual field (Huk et al., 2002). To obtain the best visual motion-related neural signal for BCI decoding, we used moving bar stimuli in our fMRI localizer, comparable to those used in BCI spelling. The MT regions localized by moving bars and by flying dots differed, and the moving bar stimuli activated more voxels in the MT region than the flying dot stimuli (Fig. S4).
All five subjects had electrode contacts close to the fMRI-defined MT regions. The SEEG electrodes of each subject were localized using postsurgical CT scans (Fig. 2A) and divided into three groups based on their BOLD activations and their positions relative to the anatomical MT complex. For S2 and S4, the functional and structural MT complexes did not fully overlap, which highlights the importance of individualized fMRI scanning to localize the best visual motion area for BCI spelling (Fig. 2B). The electrophysiological visual motion responses were compared with the BOLD activations in the presurgical fMRI scans (Fig. 2C). In all five subjects, the three contacts with the most prominent BOLD activations also had the most prominent high-gamma responses among all available contacts (top 25% of high-gamma features, chi-square test), supporting the use of presurgical fMRI data to select the best electrode candidates for BCI spelling.

Differential rereference enhances visual motion responses
Visual motion spellers often suffer from a low SNR, partly due to large-scale spontaneous activity and weak task-specific signals.
Previous research has shown that differential rereferencing, including bipolar and Laplacian schemes, minimizes interchannel correlation in SEEG (Li et al., 2018) and might thus help mitigate large-scale noise shared across channels. Therefore, Laplacian and bipolar rereferencing were adopted to address the complexity of the intracranial electric field and enhance visual motion responses. For subjects S1, S2, S4, and S5, Laplacian rereferencing had positive effects similar to those of bipolar rereferencing. The significance of high-gamma features on the three "BEST" channels was enhanced in each of these individuals (Fig. 3B). The decoding accuracies (36 targets, chance level 2.78%) increased by 9% on average with three averaged repetitions, with the largest improvement being 13% (S4) (Fig. 3C). The negative effect of Laplacian or bipolar rereferencing on S3 (Fig. 3B, C) was caused by an exceptionally high interchannel correlation in the high-gamma band (Fig. 3A). Laplacian rereferencing enhances the signal by mitigating large-scale spontaneous brain activity. In some situations, however, the signals are positively correlated between adjacent channels, either because of the complexity of the electric field in the sulci or because of interference between electrode contacts; subtracting neighboring channels then removes part of the task-related response together with the noise.
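The two outcomes can be illustrated with a toy simulation on synthetic signals (not recorded data). A sinusoid stands in for the task response, a large shared noise term stands in for widespread spontaneous activity, and the `snr` helper (a hypothetical name) measures signal power relative to the residual. When the response is local to one contact, the bipolar difference removes the shared noise; when the response is shared by both contacts, the difference cancels it along with the noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
signal = np.sin(2 * np.pi * np.arange(n) / 200)  # synthetic task response
common = 3.0 * rng.standard_normal(n)            # large-scale noise shared by both contacts


def snr(x, s):
    """Power of the component of x along s, relative to the residual power."""
    beta = (x @ s) / (s @ s)
    resid = x - beta * s
    return (beta**2 * (s @ s)) / (resid @ resid)


# Case 1: the response is local to contact B; neighbor A sees only the shared noise.
a1 = common + 0.5 * rng.standard_normal(n)
b1 = signal + common + 0.5 * rng.standard_normal(n)
# Case 2: the response is shared by both contacts (high positive correlation).
a2 = signal + common + 0.5 * rng.standard_normal(n)
b2 = signal + common + 0.5 * rng.standard_normal(n)

print(snr(b1, signal) < snr(b1 - a1, signal))  # True: shared noise removed
print(snr(b2, signal) > snr(b2 - a2, signal))  # True: shared response cancelled too
```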

BCI classification: offline simulations and online testing
The enhanced signals were then filtered to extract ERP waveforms (1-20 Hz) and high-gamma envelopes (60-90 Hz or 60-140 Hz, chosen for each subject by cross-validation). The extracted features were used to train a logistic regression classifier for each subject to discriminate attended from unattended responses. The attended target was selected as the row/column combination with the highest confidence. In online sessions, a dynamic stopping algorithm was adopted to trade off decision time against accuracy (Liu et al., 2020): after each trial, the algorithm predicted a target and calculated its confidence, stopping once the confidence reached a predefined threshold. Offline experiments were first conducted with all five subjects as preliminary experiments. The offline task, in which the subject attended sequentially to "AHOV29" on the diagonal of the interface, was performed twice by each subject. One session was used to train the model, and the other was used for evaluation (see details in Section 2.7.7). The results are presented in Table 3. S2-S5 had similar performances of 8-9 correct characters per minute. S1 was the best subject, with an information transfer rate (ITR) reaching 88.4 bits/min, which was superior to the results achieved by most EEG-based classical visual motion spellers (Hong et al., 2009; Jin et al., 2012; Liu et al., 2010), suggesting the potential of the current BCI system when high-quality signals are accessible.
Online experiments were performed with S1, S2, and S3. For S2, the experiments were performed on two different days, for a total of five sessions. For S1 and S3, one session was conducted per subject. The results are presented in Table 4. Subjects S1, S2, and S3 achieved analogous results in the alphabet typing sessions. In addition, S2 made steady progress across these tasks, from 7 to more than 10 correct characters per minute. On the second day (sessions 4 and 5), the subject achieved the highest result of 61.5 bits/min, equivalent to 12 correct characters per minute. Overall, the system achieved an average ITR of 38.7 bits/min, a satisfactory performance among intracranial EEG-based BCI typing systems (Brunner et al., 2011; Vansteensel et al., 2016). Moreover, the step-by-step progress over the two days suggests potential for further improvement with longer training or more intensive use of the system.
Fig. 3. Effects of Laplacian and bipolar rereferencing for each subject. The middle channel was rereferenced to its two adjacent channels (Laplacian), whereas the two other channels were rereferenced to the middle channel (Bipolar). (A) Correlation between channels in the high-gamma band for each subject. Colored rectangles mark the "BEST" channels used in the BCI application. (B) Sig (HG) of the "BEST" channels. (C) The estimated typing accuracies (chance level: 1/36, see Section 3.3) under different numbers of averages.

Discussion
The tradeoff between invasiveness and signal quality is a crucial factor for a sustainable BCI that can provide long-term benefits to users. The intracranial BCI system demonstrated here minimized invasiveness by optimizing both electrode selection and signal enhancement. The fMRI-based MT localizer was used to pinpoint each individual's visual motion area before electrode implantation, and differential rereferencing and dynamic stopping algorithms were used to overcome the limited signal quality. A BCI spelling performance of 12 correct characters per minute was achieved with only three SEEG contacts spanning a few millimeters, which demonstrates the system's potential as a minimally invasive BCI for sustainable use.
In our approach, the visual motion response from the MT complex was used to drive BCI typing, so precise localization of the MT complex is crucial for the system. The MT complex is thought to be located in the ascending limb of the inferior temporal sulcus (AL-ITS) (Dumoulin et al., 2000; Howard et al., 1996). However, individual differences are an obstacle for methods relying solely on anatomical localization. Previous studies have demonstrated that visual motion properties, e.g., the corresponding visual field, direction, and speed gradient, are encoded in different subregions of the MT+ complex, including MT and MST (Dukelow et al., 2001; Martinez-Trujillo et al., 2005). While MT encodes the basic elements of visual motion, MST supports higher-order motion processing (Duffy and Wurtz, 1991a, 1991b). In this study, an fMRI paradigm based on a moving bar instead of a flying dot stimulus was used to locate the MT complex. A flying dot stimulus is a wide-field optic flow stimulus that contains different types of motion and can elicit responses from the whole MT+ complex (Dukelow et al., 2001), whereas the moving bar paradigm contains only translational motion, which can elicit responses from the MT area. Compared with the classical MT+ localizer using a flying dot stimulus, the activation elicited by the moving bar stimuli was significantly more concentrated in the MT area (Fig. S4C), which means that the target MT area can be identified more precisely in future surgical planning. Moreover, because the two paradigms were based on different types of stimuli (the moving bar stimulus mainly reflects translational movement, whereas the flying dot stimulus mainly reflects radial movement), the activated MT functional subspaces also differed (Fig. S4B). The flying dot paradigm included both the movement of the dots and a radial movement of the whole visual field, resulting in a complex combination of movements. Therefore, an fMRI localizer should be designed for the specific BCI paradigm to optimize, and thus minimize, the target area for BCI electrode implantation. Note also that MT can respond to stimuli irrelevant to the BCI task, such as tactile motion (Gaglianese et al., 2020) and other perceived motion in the subject's visual field, and such stimuli can occur incidentally in real-life scenarios. However, the paradigm relies on the timing of each visual motion stimulus: responses to irrelevant stimuli are not time-locked to the stimulus onsets and should therefore be attenuated by averaging, so BCI performance is not expected to degrade substantially.
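The averaging argument can be checked with a toy simulation. The sketch below idealizes incidental, non-time-locked activity as unit-variance white noise added independently to each repetition; real incidental responses are structured, so this is an assumption, and the helper name is hypothetical. Under this idealization, the residual error of the averaged epoch shrinks roughly as $1/\sqrt{N}$ with the number of averaged repetitions $N$, while the time-locked response is preserved.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000                                     # one synthetic epoch
response = np.sin(np.linspace(0, np.pi, n_samples))  # time-locked visual motion response


def residual_after_averaging(n_reps):
    """RMS error of the averaged epoch when every repetition carries unit-variance
    interference that is not time-locked to stimulus onset (white-noise idealization)."""
    epochs = response + rng.standard_normal((n_reps, n_samples))
    return float(np.sqrt(np.mean((epochs.mean(axis=0) - response) ** 2)))


for n in (1, 4, 16):
    print(n, round(residual_after_averaging(n), 2))  # shrinks roughly as 1 / sqrt(n)
```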
In the visual motion speller, visual motion stimuli in different directions (leftward or rightward) can elicit responses with different topologies and latencies (Chen et al., 2019; Liu et al., 2020). With the visual motion direction as an extra dimension for encoding the speller interface, the ITR of an EEG-based visual motion speller could be doubled (Liu et al., 2020). Here, we investigated the possibility of extending this concept to the SEEG-based visual motion speller. The dual-directional visual motion paradigm was evaluated offline with S2, who had an SEEG electrode in the right MT complex (Fig. 2A). Rightward stimuli elicited high-gamma responses 33.5 ms faster and 1.07 dB stronger than those elicited by leftward stimuli (Fig. 4A). Because a rightward stimulus starts at the virtual button's left border, it is first projected to the right hemisphere and elicits a substantial response in the right MT complex. For a leftward stimulus, the response does not appear until the bar crosses the midline and is projected to the right MT complex. Although the leftward response was weaker in the right hemisphere, it was still highly distinguishable from both the nontarget responses and the rightward responses. This suggests that the dual-directional visual motion paradigm can be migrated to the minimally invasive BCI scenario to further boost communication speed. In theory, dual-directional visual motion stimuli can double the ITR, as demonstrated in our recent scalp-EEG study (Liu et al., 2020).
In real-world situations, SEEG is a relatively invasive approach. It has been used in long-term implantable medical devices such as responsive neurostimulation (RNS), but there have been case reports of associated adverse effects (Geller et al., 2017; Hartshorn and Jobst, 2018). A minimally invasive approach is to implant a micro-epidural electrode array above the functionally localized visual motion area. Micro-epidural electrodes do not penetrate the cortex; instead, they lie on the surface of the dura and record local field potentials. The BCIs with state-of-the-art ITRs use intracortical electrode arrays to record single-unit neural activity (Pandarinath et al., 2017; Willett et al., 2021), and epidural electrodes offer lower spatial resolution and SNR than such intracortical arrays. However, because epidural electrodes do not damage the dura mater, they can avoid severe immune reactions and provide better biocompatibility for long-term use. Benabid et al. proposed such an epidural BCI with a size of 5 × 5 cm and demonstrated its preliminary use for body movement control (Benabid et al., 2019; Romanelli et al., 2019; Sauter-Starace et al., 2019). The visual motion area is relatively small compared with the sensorimotor cortex, which allows a smaller electrode array and fewer lesions. In our case, the visual motion BCI requires only two- or three-dimensional discrete control, thus providing a good balance between invasiveness and control efficiency. In addition, noninvasive EEG remains a potential alternative, as most prospective BCI users would first consider noninvasive BCI applications (Collinger et al., 2013). Currently, EEG-based BCIs are limited by the low SNR and spatial resolution of EEG. However, EEG and iEEG are signals from the same source that have undergone different transformations (Steyrl et al., 2016). Unlike iEEG, which has direct access to the local field potential (LFP), EEG records LFPs that have undergone degradations such as transformation from spiking to oscillatory activity, attenuation, and spatial smoothing. With a better understanding of these processes, most of the information that has not yet been lost can be recovered (Steyrl et al., 2016), allowing EEG-based BCIs to achieve performance comparable to that of intracranial ones. However, no algorithm can recover information that has already been lost. Thus, each approach has its own pros and cons, reflecting a tradeoff between invasiveness and BCI efficiency.

Data and code availability statement
The code and data are available at https://github.com/HongLabTHU/MI-BCI .

Fig. 1. MT localization and the decoding pipeline. (A) fMRI-based MT localizer. (B) Electrode selection based on BOLD activation for S2. The three adjacent electrode contacts with the most prominent BOLD responses were selected and rereferenced. (C) Schematic diagram of the SEEG electrodes used in this study. (D) Example of differential rereferencing for S2. Vertical dashed lines mark the onsets of visual motion stimuli. The gray shading marks 100-250 ms after the onset of an attended visual motion stimulus. After rereferencing, the noise was eliminated and the signals were enhanced (right panel). (E) The visual motion speller interface. The direction and length of the arrow above the button "H" mark the direction and moving distance of the moving-bar stimulus. Dimensions between targets are marked by gray arrows. (F) The t-values over the spectrogram of the target and nontarget visual motion stimuli (unpaired t test). (G) The average high-gamma (HG) envelope (upper panel) and ERP waveform (lower panel) for attended and unattended visual motion stimuli for S2. The shading around each line denotes the standard error (S.E.).
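The differential rereferencing in panel (D) can be illustrated with a bipolar derivation, in which each contact is referenced to its neighbor along the shaft, so that activity common to adjacent contacts (for example, line noise or activity picked up from a distant reference) cancels while local differences remain. The caption does not specify the exact rereferencing scheme for the three selected contacts; the adjacent-pair subtraction, the 50 Hz common-mode noise, the 1000 Hz sampling rate, and the contact gains below are assumptions for illustration only.

```python
import math

def bipolar_rereference(contacts):
    """Differential (bipolar) rereferencing of adjacent contacts.

    contacts: equal-length sample lists, ordered along the electrode shaft.
    Returns len(contacts) - 1 channels: contacts[i] - contacts[i + 1].
    """
    return [[a - b for a, b in zip(contacts[i], contacts[i + 1])]
            for i in range(len(contacts) - 1)]

fs = 1000  # sampling rate in Hz (assumed)
n_samples = 200
# Common-mode interference, identical on every contact (assumed 50 Hz line noise).
line_noise = [math.sin(2 * math.pi * 50 * t / fs) for t in range(n_samples)]
# Hypothetical local response whose amplitude decays along the shaft.
local = [math.exp(-((t - 120) / 15.0) ** 2) for t in range(n_samples)]
gains = [1.0, 0.5, 0.0]
contacts = [[g * s + noise for s, noise in zip(local, line_noise)] for g in gains]

channels = bipolar_rereference(contacts)
# Each bipolar channel now equals (gains[i] - gains[i + 1]) * local, noise-free.
```

Subtracting neighbors removes anything shared across contacts, which is why the scheme suppresses common noise; it also removes shared neural activity, so the gain in local specificity depends on the response differing between adjacent contacts.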

Fig. 4. The response pattern of the dual-directional visual motion paradigm. (A) Upper: The high-gamma envelopes of the average responses to leftward and rightward stimuli minus the nontarget baseline. The rightward response had an average latency 33.5 ms shorter than that of the leftward response. The gray shading outlines the interval in which the two responses differed significantly (p < 0.05, unpaired t test). Lower: Negative logarithm of the p-values for the difference between leftward and rightward responses. (B) Spectrograms of the leftward (upper) and rightward (lower) responses. L: leftward, R: rightward, Non: nontarget, BC: Bonferroni correction.
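The significance shading and the negative log p-value trace in panel (A) follow a common pattern: an unpaired t test at every time point, compared against a Bonferroni-corrected threshold. The sketch below uses Student's pooled-variance t statistic and, because the Python standard library has no t distribution, approximates the two-sided p-value with the standard normal distribution, which is reasonable only with many trials per condition (it is slightly anti-conservative for small samples). The function names, the normal approximation, and the use of log10 are assumptions, not the authors' analysis code.

```python
import math
import statistics

def unpaired_t(x, y):
    """Student's two-sample t statistic with pooled variance.

    Raises ZeroDivisionError if both samples have zero variance.
    """
    nx, ny = len(x), len(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    pooled = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))

def approx_two_sided_p(t):
    """Normal approximation to the two-sided p-value: P(|Z| > |t|)."""
    return math.erfc(abs(t) / math.sqrt(2))

def bonferroni_neglog10_threshold(alpha, n_tests):
    """-log10 of the Bonferroni-corrected significance level alpha / n_tests."""
    return -math.log10(alpha / n_tests)

def significant_mask(left_trials, right_trials, alpha=0.05):
    """Per-time-point test of leftward vs. rightward responses.

    left_trials, right_trials: lists of trials, each a list of samples over time.
    Returns one boolean per time point: True where -log10(p) exceeds the
    Bonferroni-corrected threshold.
    """
    n_times = len(left_trials[0])
    threshold = bonferroni_neglog10_threshold(alpha, n_times)
    mask = []
    for t in range(n_times):
        x = [trial[t] for trial in left_trials]
        y = [trial[t] for trial in right_trials]
        p = max(approx_two_sided_p(unpaired_t(x, y)), 1e-300)  # avoid log10(0)
        mask.append(-math.log10(p) > threshold)
    return mask
```

For exact p-values with few trials, a t-distribution CDF (for example from SciPy) would replace the normal approximation; the Bonferroni step is unchanged.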

Table 1
Clinical profiles of the subjects.

Table 3
Offline evaluation of five subjects.
a The total number of characters to type.
b The average time used for a single selection.
c Equivalent correct characters input per minute.
d Averaged over a maximum of five trials. Other sessions for S2 were set to a maximum of three trials.