Evaluating the Feasibility of Visual Imagery for an EEG-Based Brain–Computer Interface

Visual imagery, or the mental simulation of visual information from memory, could serve as an effective control paradigm for a brain-computer interface (BCI) due to its ability to directly convey the user’s intention with many natural ways of envisioning an intended action. However, multiple initial investigations into using visual imagery as a BCI control strategy have been unable to fully evaluate the capabilities of true spontaneous visual mental imagery. One major limitation in these prior works is that the target image is typically displayed immediately preceding the imagery period. This paradigm does not capture spontaneous mental imagery, as would be necessary in an actual BCI application, but something more akin to short-term retention in visual working memory. Results from the present study show that short-term visual imagery following the presentation of a specific target image provides a stronger, more easily classifiable neural signature in EEG than spontaneous visual imagery from long-term memory following an auditory cue for the image. We also show that short-term visual imagery and visual perception share commonalities in the most predictive electrodes and spectral features. However, visual imagery received greater influence from frontal electrodes whereas perception was mostly confined to occipital electrodes. This suggests that visual perception is primarily driven by sensory information whereas visual imagery has greater contributions from areas associated with memory and attention. This work provides the first direct comparison of short-term and long-term visual imagery tasks and provides greater insight into the feasibility of using visual imagery as a BCI control strategy.



I. INTRODUCTION
THE concept of using brain signals recorded via electroencephalography (EEG) to control external devices has gained traction in recent years as a potential means for patients with severe neuromuscular disorders to communicate and interact with the world around them [1]. This technology, termed brain-computer interface (BCI), has since grown to cover applications such as robotic control [2], [3], [4], communication [5], [6], and even entertainment and gaming [7], [8], [9]. Various control strategies exist for BCI interaction, but each comes with its own limitations that prevent BCIs from obtaining widespread use outside the lab [10]. Imagined movement of large body parts is perhaps the most common control paradigm for EEG BCI applications; however, it often suffers from lengthy training times [10], inconsistent and unstable performance [11], and a restricted range of options for imagined movements [12]. Most attempts to address the limitations of this paradigm focus on advancing EEG signal processing and classification techniques [13]; however, an often overlooked solution is investigating other imagery-based control strategies [14], [15]. Perhaps the ideal approach would be to utilize the unlimited flexibility of visual imagery to provide a more ecological connection between mental imagery and the intended action. Therefore, the objective of this work is to investigate the efficacy of using visual imagery for EEG BCI control.

A. Limitations of Current BCI Control Paradigms
When designing a BCI system, one major consideration is the means of interaction used to perform the task. These control paradigms can be divided into two methodologies: exogenous paradigms based on the brain's response to an external stimulus, and endogenous paradigms where participants learn to modulate their brain activity using mental imagery [10]. The most popular exogenous paradigms in EEG rely on measuring the brain's response to visual stimuli such as a flickering target [5], [16]. Most participants can learn to use these paradigms with high accuracy and minimal training [10]. However, these procedures can be time intensive and require a high level of sustained attention and visual focus, which can cause fatigue [17] and would not be suitable for individuals with visual impairments or photosensitivity [18], [19]. Furthermore, there is often large variability in performance across individuals [20], [21], perhaps because the control strategy is not intuitive, and this confusion can take focus away from the desired application. For example, the user would have to remember which flickering target corresponds to the intended action instead of attending to the action directly.
Endogenous control paradigms with mental imagery can be used instead to overcome these challenges. Motor imagery of the movements of large body parts (e.g., right vs. left hand) is the most popular imagery-based paradigm [10]. Limitations of this approach include lengthy training times (weeks to months) [10], large inter- and intra-subject performance variations [11], non-intuitive control schemes for certain applications [22], and a limited variety of classes available for BCI control [12]. Furthermore, factors such as noise in the EEG signals, motivation, fatigue, and difficulty visualizing the intended action can greatly impact a user's ability to gain control of the BCI [23]. This leads to a challenge referred to as "BCI illiteracy" in which a substantial percentage of participants (approximately 15-30%) remain unable to achieve proper control of a BCI even after a standard training period [24]. Even for the participants who can attain some control, performance often falls short of the desired threshold rate for effective control (often set at 60% or 70% accuracy in classification of the intended action) [11], [13], [25]. These difficulties have been observed across all BCI paradigms regardless of the neural signal used [26]. Some studies have even seen that participants who are deemed "BCI illiterate" using one paradigm can reach proficiency with another that may be more matched to their specialized expertise [27], [28], [29]. For example, a recent study by Lee et al. [29] compared performance with a BCI when participants attempted to use motor imagery, event-related potential (ERP), and steady-state visual evoked potential (SSVEP) control paradigms. They found that 72.2% of the participants were deemed "BCI illiterate" on at least one of the paradigms, with the imagery-based paradigm showing the highest rate at 53.7%. However, all participants were able to control at least one of the systems. This indicates that for individuals who have difficulty with one type of BCI, the availability of an alternative, more intuitive mental imagery paradigm such as visual imagery may be beneficial to achieve proper BCI control.

B. Visual Imagery as an Alternative BCI Control Strategy
Visual imagery, or the spontaneous mental simulation of visual information from long-term memory, could be a useful BCI control strategy that has not yet been sufficiently tested [30]. Several studies have shown that various categories of images (e.g., faces, animals, and inanimate objects) can be reliably distinguished using EEG when participants are observing an image [30], [31], [32]. However, very few studies have attempted to measure visual imagery using EEG, and those that do have shown mixed success [12], [30], [33]. Bobrov et al. [33] provided the first investigation into the use of visual imagery as a BCI control paradigm. In this study, they were able to reliably distinguish between visual imagery of faces, visual imagery of houses, and resting state with an average of 56% classification accuracy (chance 33%). However, this study was limited by the number of recruited subjects (N=7), amount of data collected (four sessions each approximately 5 min long), and the quality of data collected (first three sessions used the 16 channel Emotiv Systems Inc. Epoc headset). Lee et al. [12] demonstrated an average classification accuracy of around 40% (chance 7.69%; N=22 participants) during an offline analysis of a single session of 13 visual imagery categories. This included words used for patient communication with concrete properties (e.g., ambulance, clock, or toilet) or abstract properties (e.g., hello, stop, or yes). In Kosmyna et al. [30], researchers performed offline classification between two classes of flower vs. hammer during visual observation and imagery. They were unable to achieve above-chance accuracy between the two classes during visual imagery (average classification accuracy 52%, chance 50%), but they were able to distinguish trials when participants performed visual imagery vs. rest (77% average classification accuracy; chance 50%) and between visual observation vs. imagery (71% classification accuracy; chance 50%).
One shortcoming in the aforementioned studies is that the two larger experiments by Kosmyna et al. [30] and Lee et al. [12] displayed the target category in each trial immediately before the imagery period. This could be considered more of a test of holding the object categories in working memory rather than spontaneous visual imagery [34]. This leaves open the question of whether spontaneous visual imagery can be decoded from EEG. To address this question, the current study provides participants with both visual and auditory cues of the intended mental imagery in separate experimental blocks.
Furthermore, the addition of the actual image during the cue period allows a direct comparison between the neural signals elicited during observation and imagery. A study by Xie et al. [35] followed a similar procedure while looking for similarities between the mental activity during visual observation and imagery and found a correlation in the alpha band (8-13 Hz) between the two conditions. This is supported by the sensory recruitment hypothesis [36], which posits that the neural representations activated during perception can also be activated during short-term retention. However, the study by Lee et al. [12] found that activity in the higher gamma band (30-100 Hz) contained the most informative activity for visual imagery. This study seeks to add to this ongoing investigation of the most informative features for visual imagery decoding and the similarities between the neural activity during perception and imagery.

C. Identifying Neural Mechanisms Contributing to BCI Performance
It is poorly understood why certain individuals are unable to control a BCI after a standard training protocol [11]. Previous literature from motor imagery has suggested many factors could play a role in performance variability, including the user's basic demographics [37] (e.g., lifestyle, gender, or age), psychological traits [25], [38], [39] (e.g., motivation, confidence, or frustration), physiological traits [40], [41] (e.g., recruitment of motor imagery related brain networks), and anatomical structure [42] (e.g., structural integrity and myelination quality). Previous work from our group has indicated that difficulty learning to modulate desired brain activity in an fMRI neurofeedback task could be due to greater similarity in the brain activity patterns for each category [43] or overly rigid activity patterns (i.e., insufficient variability) for each category [44]. For this reason, we followed a similar approach to Kaneshiro et al. [45] to quantify the representational similarity between image categories using confusion matrices generated from multi-class classifications. We hypothesize that the classification of spontaneous visual imagery from long-term memory will reveal greater neural representation similarity between the image categories compared to short-term visual imagery from working memory. This work provides the first direct comparison of short-term and long-term visual imagery tasks measured by EEG in healthy adults and provides greater insight into the feasibility of using visual imagery as a BCI control paradigm.
We also administered a Vividness of Visual Imagery Questionnaire [46] (VVIQ) before the start of the experiment along with questionnaires of perceived psychological traits such as motivation, alertness, and frustration after each session. Participants' attention and engagement throughout the experiment were also monitored using eye tracking and pupillometry. Previous literature has shown that changes in the diameter of the pupil can occur in response to psychophysical and psychological stimuli [47]. Together, this information was collected to allow a more thorough exploration into the conditions contributing to successful decoding of visual imagery.

II. METHODS

A. Participants
A total of N=30 healthy young adults between the ages of 18 and 40 years old were recruited from the Austin area for participation in this study. However, only N=26 subjects (18 female, average age 22 years, SD=4.17 years) were included in the analysis due to issues encountered during data collection. All methods were performed in accordance with the relevant guidelines and regulations of the University of Texas at Austin Institutional Review Board.

B. Inclusion Criteria
The experiment conducted in this study involved participants performing visual imagery of a cued stimulus presented on a computer monitor. Each participant's neural activity was measured by EEG throughout the experiment. As such, all participants were required to meet the following inclusion criteria: ability to provide informed consent, no current use of medication for psychiatric reasons, no current use of sedatives, no history of major psychotic disorders (including schizophrenia and bipolar disorder), no history of epilepsy or photosensitivity, no substance dependence, and good vision or minimal correction with contacts or eyeglasses. Participants were also asked to remove hair braids or any other tight hair styles and have clean hair (no oils, hair spray, or any other hair product) before participating in EEG recordings. This study also incorporated eye tracking, so individuals were excluded from participation if they had glasses with more than one power (such as bifocals, trifocals, or progressive lenses), eye surgery (such as corneal, cataract, or intraocular implants), or eye movement or alignment abnormalities (such as amblyopia, strabismus, or nystagmus).

C. Stimuli
Images for this experiment were obtained from an in-lab dataset of famous faces, animals, objects, and scenes that were chosen to be easily recognizable by the subject population (Fig. 1). The image categories were selected to be consistent with prior literature on representational similarity analysis and for their potential to provide distinct patterns of brain activity [45], [48]. Participants were instructed to select one image per category to use throughout the experiment that they were familiar with and could easily remember. All images were presented at a similar size (viewing angle ≈ 3°) with a neutral gray background.

D. Task Protocol
This study involved a single session of data collection. During the experiment, the participants were asked to perform mental visual imagery of images from four categories: human faces, animals, objects, and scenes. The task included 5 blocks of visual imagery following either a visual observation or an auditory cue (Fig. 2a and 2b, respectively). During the observation cue blocks, the target image for each category was displayed with a small fixation cross at the center of the screen for 2.5 sec and the participant was instructed to passively view it while attempting to keep their gaze on the center cross. The image was then removed from the screen for 3 sec, and the participant was instructed to picture the image that they had just seen in vivid detail while keeping their eyes open and fixated on the center cross. The auditory cues followed a similar procedure, except that a verbal cue of "face", "scene", "animal", or "object" was played over a speaker and the participant was instructed to recall the associated image and picture it as vividly as possible in their mind. The timings for image presentation and visual imagery were chosen based on prior literature on decoding visual perception and imagery via EEG [12], [35] and to facilitate the cross-task analysis described below in section H.
After each imagery period in both conditions, an image was flashed on the screen for 200 ms. The displayed images were randomly presented with a 70% chance of being the target image and a 30% chance of being a non-target image. The participant was then instructed to quickly respond with a left or right button press if the presented image was the target category or a non-target category for that trial, respectively. If the response was correct, the fixation cross turned green. If the response was incorrect, the fixation cross turned red. If the participant responded too slowly or did not provide a response for the trial, the fixation cross turned yellow. The primary purpose of this memory test was to ensure the participant was actively engaged throughout the experiment and to prevent mind wandering. This data could be used to remove trials where the participants may have missed the cue for that trial. Furthermore, this procedure was designed to mimic a typical BCI scenario where feedback of the predicted target is displayed at the end of each trial. The timing of the memory test was chosen based on feedback from initial pilot testing, which found that 200 ms made the task sufficiently challenging.
The full experiment included 5 blocks of the observation cues and 5 blocks of the auditory cues. Each of the observation cue blocks consisted of 40 trials and lasted approximately 6 minutes. The auditory cue blocks had a shorter cue duration, so 48 trials were included in each block, which also lasted approximately 6 minutes. After each block, the participants were required to take a break of at least 1 min before continuing with the experiment, but they were allowed to take additional time if needed. A 10 sec resting period preceded each block in which the participants were asked to fixate their gaze on the center cross and keep their mind blank. The entire session lasted approximately 1 hour and 15 minutes.

E. Data Collection and Preprocessing
EEG data was collected from 32 channels in accordance with the 10-20 standard via the Brain Products actiCAP system with a sampling rate of 500 Hz. Eye movements and blinks were captured by placing four bipolar electrodes around the eyes with a reference placed on the mastoid. Data processing was carried out using the MNE Python package. Data from channels labeled as "bad" were removed and interpolated from the surrounding electrodes. Then, the signal was re-referenced to the common average to remove any background noise that is common across all channels. Eye movement and blink artifacts were removed from the signal using an independent component analysis (ICA) [49]. Artifact removal via ICA was carried out using the following procedure: implementing a high-pass filter at 1 Hz to remove signal drift, implementing notch filters at 60 Hz and its harmonics to remove powerline interference, annotating the task and break periods, running the ICA using MNE's ICA algorithm with 32 components, manually selecting components that contained artifacts, removing these components, and applying the solution to the unfiltered data.
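As a minimal illustration of the common average re-referencing step only, the sketch below uses plain NumPy on synthetic data rather than the MNE pipeline used in the study. The function name, array shapes, and noise model are assumptions made for the example.

```python
import numpy as np

def common_average_reference(data):
    """Subtract the across-channel mean at every sample.

    data: array of shape (n_channels, n_samples).
    """
    return data - data.mean(axis=0, keepdims=True)

# Synthetic example (assumed values): 32 channels, 1 s at 500 Hz,
# with one noise source shared identically by all channels.
rng = np.random.default_rng(0)
n_channels, n_samples = 32, 500
channel_signals = rng.normal(size=(n_channels, n_samples))
shared_noise = 10.0 * rng.normal(size=(1, n_samples))
raw = channel_signals + shared_noise

car = common_average_reference(raw)
# The shared component cancels, so the result does not depend on it.
same_without_noise = common_average_reference(channel_signals)
```

Because the shared term is identical on every channel, it is removed exactly; noise that differs across channels is only attenuated.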

F. Visual Perception Classification
Various feature extraction methods and classifiers were evaluated for the prediction of the four visual perception categories. We tested features in the delta (1-4 Hz), theta (4-8 Hz), alpha (8-13 Hz), beta (13-30 Hz), and gamma (30-100 Hz) bands extracted via Morlet Wavelets, Fast Fourier Transform (FFT), and Common Spatial Patterns (CSP). The gamma band was also divided into low (30-60 Hz) and high (60-100 Hz) gamma for evaluation. For features extracted from the delta, theta, alpha, and beta bands, the data was first bandpass filtered between 1-40 Hz to remove low frequency signal drift and high frequency noise. For features including the gamma band, the data was bandpass filtered between 1-100 Hz. A notch filter at 60 Hz was also implemented to remove powerline interference. Only data from the 8 posterior EEG channels (O1, O2, Oz, P3, P4, P7, P8, and Pz) were used for feature selection.
Morlet Wavelets were employed in each desired frequency range equally spaced every 1 Hz for delta and theta bands, every 2 Hz for alpha and beta bands, and every 5 Hz for gamma bands. The mean power in each frequency bin was used as features for classification. For features extracted via the FFT, the mean power in each band was used for classification. CSP features were extracted using the CSP function from the MNE Python package with 8 components in the desired frequency range. After extraction, the features were then normalized using the MinMaxScaler from the scikit-learn Python package to scale the features between 0-1.
The classifiers tested in this analysis include Logistic Regression (LR) with a newton-cg solver, Linear Discriminant Analysis (LDA) with a shrinkage term of 0.1, and support vector machine (SVM) with a linear kernel. These classifiers are known to be robust in EEG classification including the decoding of visual imagery [12]. All classification approaches were cross-validated using a leave-one-block-out (5-fold) cross-validation approach.
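The mean-power feature described above can be sketched with a hand-rolled complex Morlet wavelet in NumPy. This is an illustrative stand-in, not the implementation used in the study: the function name `morlet_power`, the choice of 5 cycles per wavelet, the unit-energy normalization, and the synthetic 10 Hz test signal are all assumptions.

```python
import numpy as np

def morlet_power(signal, sfreq, freqs, n_cycles=5.0):
    """Mean power of a 1-D signal at each frequency in `freqs`.

    Each frequency gets a complex Morlet wavelet (complex sinusoid
    times a Gaussian window); power is the squared magnitude of the
    convolution, averaged over time.
    """
    powers = []
    for f in freqs:
        sigma_t = n_cycles / (2.0 * np.pi * f)  # Gaussian width in seconds
        t = np.arange(-4 * sigma_t, 4 * sigma_t, 1.0 / sfreq)
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2 * sigma_t**2))
        wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))  # unit energy
        analytic = np.convolve(signal, wavelet, mode="same")
        powers.append(np.mean(np.abs(analytic) ** 2))
    return np.array(powers)

# Synthetic single-channel trial: 2.5 s of a 10 Hz oscillation at 500 Hz.
sfreq = 500.0
times = np.arange(0, 2.5, 1.0 / sfreq)
trial = np.sin(2 * np.pi * 10 * times)

# Alpha-band bins spaced every 2 Hz, following the spacing in the text.
alpha_freqs = np.arange(8, 14, 2)  # 8, 10, 12 Hz
alpha_power = morlet_power(trial, sfreq, alpha_freqs)
```

In a full pipeline, one such vector per channel and band would be concatenated into the trial's feature vector before scaling.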
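A leave-one-block-out evaluation of this kind can be sketched with scikit-learn as follows. The features here are random placeholders, and several details are assumptions rather than the study's settings: the `lsqr` solver (chosen because scikit-learn requires `lsqr` or `eigen` when shrinkage is set), fitting the scaler inside each training fold, and the feature count.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)

# Placeholder data: 5 blocks of 40 trials, 64 features per trial.
n_blocks, trials_per_block, n_features = 5, 40, 64
n_trials = n_blocks * trials_per_block
X = rng.normal(size=(n_trials, n_features))
y = np.tile(np.arange(4), n_trials // 4)          # four image categories
blocks = np.repeat(np.arange(n_blocks), trials_per_block)

# Scale features to [0, 1], then LDA with a fixed shrinkage of 0.1.
clf = make_pipeline(
    MinMaxScaler(),
    LinearDiscriminantAnalysis(solver="lsqr", shrinkage=0.1),
)

# Each fold holds out every trial from one block.
scores = cross_val_score(clf, X, y, groups=blocks, cv=LeaveOneGroupOut())
mean_accuracy = scores.mean()
```

Holding out whole blocks keeps trials recorded close together in time from appearing in both the training and test sets. With random features, the mean accuracy should hover near the 25% chance level.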

G. Visual Imagery Classification
A similar method as described above was also applied to the two visual imagery conditions to identify the optimal features and classifiers for the prediction of the four imagery categories. Due to the inherently low signal-to-noise ratio involved with visual imagery, an additional preprocessing stage of removing trials with signal amplitude exceeding 100 mV was implemented. This stage effectively removes trials where the participant may have been moving. This stage removed less than 5% of trials for each subject. In addition to the 8 EEG channels used during the visual perception classification, some frontal channels were found to carry information relevant to imagery classification. A grid search analysis of the best channels for imagery led us to include channels O1, O2, Oz, P3, P4, P7, P8, Pz, TP10, F7, F8, and FC6.

H. Cross-Task Classification
Due to the similarities between the experimental conditions, we were interested in seeing if the inclusion of data from another condition can improve the classification accuracy. For example, can the inclusion of the perception periods improve the classification of the visual imagery periods? To test this approach, we concatenated the data from two conditions and performed a leave-one-block-out (10-fold) cross-validation. For this analysis, we used the mean power in the 1-15 Hz band obtained by Morlet Wavelets over the 8 posterior channels used during the perception classification.

I. Evaluation of Performance
The classifier's performance is evaluated based on the number of trials where the EEG classification output matches the target category for that trial beyond the level of chance. However, small sample sizes can lead to false positives, and Combrisson and Jerbi [50] have suggested addressing this issue by adjusting the chance level as a function of sample size (n) and number of classes (c) using a binomial cumulative distribution. Using this method, the probability of a classification model predicting the correct label at least z times by chance is given by (1).

P(z) = \sum_{i=z}^{n} \binom{n}{i} \left(\frac{1}{c}\right)^{i} \left(\frac{c-1}{c}\right)^{n-i}    (1)

In this study, the statistically significant threshold level was calculated using the MATLAB (Mathworks Inc., MA, USA) function St(α) = binoinv(1 − α, n, 1/c) × 100/n, where α is the significance level given by α = z/n. For this study, each session included a total sample size of n = 240 observations for the observation period and visual imagery period following the observation cue. There were n = 288 observations for the visual imagery period following the auditory cue. The experiment consisted of c = 4 classes, which provided a significance threshold of 29.58% at p = 0.05 for the observation and visual imagery periods following the observation cue. The imagery periods following the auditory cue had a significance threshold of 29.17%. In other words, the classification model must achieve a classification accuracy above this threshold to be considered statistically significant.
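The threshold calculation can be reproduced without MATLAB. The sketch below is a pure-Python stand-in for `binoinv` (the inverse binomial cumulative distribution); the function names are chosen for this example, and the inputs are the n, c, and α values stated above.

```python
from math import comb

def binoinv(q, n, p):
    """Smallest k whose Binomial(n, p) cumulative probability is >= q."""
    cdf = 0.0
    for k in range(n + 1):
        cdf += comb(n, k) * p**k * (1 - p) ** (n - k)
        if cdf >= q:
            return k
    return n

def chance_threshold(n, c, alpha=0.05):
    """Accuracy (in percent) that must be exceeded to beat chance at level alpha."""
    return binoinv(1 - alpha, n, 1 / c) * 100 / n

# n = 240 and n = 288 trials with c = 4 classes, as stated in the text.
threshold_observation = chance_threshold(240, 4)
threshold_auditory = chance_threshold(288, 4)
print(round(threshold_observation, 2), round(threshold_auditory, 2))
```

These values correspond to 71 of 240 and 84 of 288 correct trials, matching the 29.58% and 29.17% thresholds reported above.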

J. Pupillometry Data Analysis
Eye tracking data was captured throughout the experiment using a Tobii Pro Nano device. This data consisted of the x and y gaze positions along with the pupil diameters for each eye recorded at a sampling rate of 60 Hz. The pupillometry data was preprocessed using the methods outlined in Combrisson and Jerbi [50] and Winn et al. [51]. First, the data was segmented into trials starting from the onset of the cue to one second after the start of the memory test. Then, eye blink artifacts were corrected by identifying segments with NaN values, removing 5 datapoints from the beginning and end of the NaN segments, and interpolating the values from the surrounding data points. Trials where over 30% of the data was NaN values were labeled as bad and were removed from the analysis. The data was then filtered with a second order Butterworth bandpass filter between 1 and 10 Hz to remove low frequency drift and high frequency noise and standardized using z-score. The trials were then baseline corrected by subtracting out the mean pupil dilation from the 1 sec long inter-trial period before the onset of each trial. Due to issues encountered during data collection, only N=22 and N=15 participants were included in the analysis of the eye tracking data for the observation and auditory blocks, respectively.
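The blink-correction and baseline steps can be sketched in NumPy as below. This is one reading of the procedure, not the study's code: "removing 5 datapoints" is interpreted as discarding the 5 samples on either side of each NaN run before linear interpolation, the function names are hypothetical, and the bandpass filtering and z-scoring steps are omitted.

```python
import numpy as np

def clean_pupil_trial(pupil, pad=5, max_nan_frac=0.30):
    """Interpolate blink gaps in one trial; return None if the trial is rejected."""
    pupil = np.asarray(pupil, dtype=float)
    missing = np.isnan(pupil)
    if missing.mean() > max_nan_frac:
        return None  # more than 30% missing: label the trial as bad
    # Widen every NaN run by `pad` samples on both sides, since samples
    # next to a blink are often distorted by the closing eyelid.
    widened = missing.copy()
    for shift in range(1, pad + 1):
        widened[:-shift] |= missing[shift:]
        widened[shift:] |= missing[:-shift]
    good = ~widened
    if not good.any():
        return None
    idx = np.arange(pupil.size)
    return np.interp(idx, idx[good], pupil[good])

def baseline_correct(trial, baseline):
    """Subtract the mean of the pre-trial baseline period from the trial."""
    return trial - np.mean(baseline)

# Synthetic 2 s trial at 60 Hz with one blink and distorted edge samples.
trial = np.full(120, 3.0)
trial[45:50] = 2.0       # distorted samples just before the blink
trial[50:56] = np.nan    # the blink itself
cleaned = clean_pupil_trial(trial)
corrected = baseline_correct(cleaned, baseline=np.full(60, 3.0))

# A trial with 40% missing samples is rejected.
bad_trial = np.full(100, 3.0)
bad_trial[:40] = np.nan
rejected = clean_pupil_trial(bad_trial)
```

Here the distorted samples fall inside the padded region, so the interpolation restores a flat trace.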

III. RESULTS
Our first test was to observe the classification accuracy between the four imagery categories during the observation period. Table I presents the classification accuracies when using the Logistic Regression (LR), Linear Discriminant Analysis (LDA), and Support Vector Machine (SVM) classifiers with various band power features extracted via Morlet Wavelets, Fast Fourier Transform (FFT), or Common Spatial Patterns (CSP). The right column presents the best combination of features combining the data in the Delta, Theta, and Alpha bands of brain activity. The highest accuracy was obtained by employing an LDA classifier trained on the mean Morlet wavelets in the 1-15 Hz range equally spaced every 2 Hz from the eight posterior EEG channels. This yielded a mean classification accuracy of 42.11% across all 26 subjects (Fig. 3a). This was significantly higher than the significance threshold of 29.58% (p < 0.01 × 10^−7). To localize the most predictive channels for the prediction of the observation trials, we also performed a searchlight analysis where only the data of a single EEG channel was used to perform the classification (Fig. 3b). As expected, this analysis revealed that the posterior electrodes directly over the primary visual cortex obtained the highest classification accuracy.
We next attempted to classify between the four visual imagery classes from short-term working memory following the observation cue. Similar preprocessing, feature extraction, and classification methods as for the observation periods were tested (Table II). We found that the LR classifier trained on the mean Morlet wavelets in the 1-15 Hz range provided the best classification accuracy with a mean of 30.05% across all subjects, which was significantly higher than the significance threshold of 29.58% (p < 0.05) (Fig. 4a). Fig. 4b presents the average channel-wise heatmap across all subjects, which shows a trend similar to the observation period where posterior channels carry the most relevant information for classification. However, individual analysis of the data shows that there may be some greater contributions from some of the more frontal channels. Due to the low accuracy in decoding the visual imagery categories, no individual channels were found to exceed the significance threshold during the exploratory channel-wise searchlight analysis. Also included in Fig. 4c is the channel-wise heatmap of a subject with high classification accuracy to highlight the channels relevant for short-term visual imagery. Finally, we tested the classification accuracy of the four visual imagery classes from long-term memory following the auditory cue (Table III). The best classification accuracy of 26.74% across all subjects was obtained by using 25 mean Morlet Wavelets logarithmically spaced in the 1-100 Hz band (Fig. 5a). However, this combination still did not pass the significance threshold of 29.17% for this section. Similar to the results from the short-term visual imagery analysis, no significant channels were revealed during the exploratory channel-wise searchlight analysis of the long-term imagery categories (Fig. 5b). However, individual analysis of subjects with higher accuracy also reveals significant contributions from frontal channels during visual imagery (Fig. 5c). Fig. 6 presents the classification accuracies for each subject across the tasks for comparison.
To improve the classification accuracy of the imagery periods, we were interested to see if the addition of data from the observation periods would improve the prediction ability. Contrary to our expectations, classification performance for all trial periods suffered when training included data from other periods. When the data from the observation periods were combined with the short-term visual imagery periods, the observation classification accuracy dropped to 38.47% and the imagery classification accuracy dropped to 27.30%. The observation periods and long-term imagery periods following the auditory cue concatenated together dropped to 36.50% and 26.42%, respectively. The classification accuracy of the imagery periods following the observation cue and auditory cue concatenated together dropped to 27.90% and 25.70%, respectively.
The time course of the Morlet Wavelet features for each of the tasks is presented in Fig. 7. All tasks exhibited primary activity within the alpha band of brain activity; however, the timing for the appearance of the activity differed between each task. Observation of the image produced activity in the alpha band appearing almost immediately and dropping about 400 ms following the presentation of the image (Fig. 7a). The two imagery tasks produced a more sustained activity throughout the trial, which appeared approximately 500 ms after the start of the imagery period. However, the short-term visual imagery task peaks around 1500 ms after the start of the imagery period whereas the long-term visual imagery task peaks around 500 ms after the start and begins to decrease.
We also analyzed the pupillometry data to see if this could be used to identify when a participant was engaged in the visual imagery task. Fig. 8 presents the mean pupil dilation across participants during the observation cue blocks and the auditory cue blocks, respectively. During the observation cue blocks, the pupil contracts with the onset of the stimulus presentation. When the images are removed and the participant is instructed to perform visual imagery, the pupil dilates back to baseline. The pupil contracts again after the image is flashed during the memory periods. During the auditory cue blocks, the pupil begins to dilate immediately following the auditory cue and peaks after approximately 1.5 sec before returning to baseline. The pupil contracts during the memory periods after the image is flashed. This dilation of the pupil during the imagery periods provides a good indication that participants were actively engaged in the task.

IV. DISCUSSION
This study demonstrates that decoding visual imagery from EEG is a challenging task. Across the early work of Bobrov et al. [33], Kosmyna et al. [30], Lee et al. [12], [52], and Xie et al. [35], one of the major points of contention between the experimental procedures was the presentation of the target image directly before the imagery period. It could be argued that this is not a true test of spontaneous visual imagery from long-term memory but rather a test of holding the object in short-term working memory. Our study presents the first direct comparison between the ability to decode visual imagery following observation of the target image and following an auditory cue for the target image. In accordance with our hypothesis, our classifier achieved greater accuracy in discriminating among the four image categories during the short-term visual imagery task following the observation cue than during the long-term imagery task following the auditory cue. Also as expected, visual imagery produced a more nuanced pattern of activity that is more difficult to untangle using multivariate decoding of EEG data than the activity produced by actual visual observation of the images.
In an early study, Lee et al. [53] examined the differences in brain areas activated during visual perception and imagery. The results of this study showed a considerable overlap in activity between the two conditions in many areas of the brain; however, this overlap was neither uniform nor complete. They saw nearly complete overlap in frontal and parietal regions involved in various types of cognitive control processes such as the retrieval of episodic information, visual inspection, generation of visual images, attention, spatial working memory, and visuospatial processing. On the other hand, the activations in the occipital cortex were stronger and more diffuse during perception than during imagery. This suggests that the occipital regions are driven more strongly by sensory information than by information stored in memory. These regions are responsible for facilitating object detection and identification, which are not necessarily required for the visualization of mental images [53]. Similar results were found in the current study. The channel-wise searchlight results showed a large overlap in the areas that were most informative for prediction during visual observation and imagery. The observation task was mostly driven by the occipital electrodes, which were receiving the sensory information, but some of the frontal electrodes also seemed to carry relevant information. The imagery task showed much lower classification accuracies in the occipital channels and a greater influence of the frontal electrodes.
We also saw that spectral power below 15 Hz, and specifically alpha band power (8-13 Hz), provided the most informative feature for classification during the observation periods and the short-term imagery periods. While this is consistent with the results found by Xie et al. [35], the results from Lee et al. [12] and Kosmyna et al. [30] indicate that higher gamma activity (30-100 Hz) may also carry information relevant for visual imagery. In our analysis, we also found that inclusion of the higher gamma range of brain activity may be beneficial for classification of the long-term visual imagery task. Unfortunately, the results were not significant, so a definitive conclusion cannot be made. Additionally, even though we found similar features and channels between the conditions, we were unable to utilize the data from the other conditions to improve the classification accuracy. It is possible that even though there is a significant overlap between the activity associated with the different experimental conditions, there is still too much variance or too little training data for the classifier to make use of the additional information. For example, the observation trials were mainly driven by activity in the occipital cortex related to the sensory perception of the presented image, while the imagery trials received more contributions from the frontal areas associated with memory. Furthermore, our analyses revealed that the Morlet wavelet features used in these analyses exhibited differences in the timing of activation across the tasks. It might also be that the two imagery conditions are confounded by a difference in the memory conditions used.
The imagery periods following the observation cue are more of a short-term working memory task, while the imagery periods following the auditory cue are more of a long-term memory retrieval task. Ganis et al. [54] showed a differential effect in the visual responses generated during visual imagery of famous faces retrieved from short-term memory and from long-term memory. They found that both tasks activated similar areas of the brain, but the activity was greater during the short-term memory task, in which the subjects were asked to memorize specific pictures of celebrities. During the long-term task, in which they were asked to imagine the famous person without the presentation of an image, the neural activations were lower across all relevant brain regions. However, focusing on specific features of the imagined faces such as the eyes, lips, or nose was shown to increase activation regardless of memory type. These results suggest that the type of cue and the instructions given to the participants can play a large role in the neural processes used and the activity evoked during mental visual imagery.
In the current experiment, retrieving a mental image from long-term memory following an auditory cue, rather than retaining a visually presented image in working memory, produced weaker or more variable neural activity that was more difficult to decode using common machine learning techniques. For mental imagery, the participants in this study were instructed to recreate the image in their mind and visualize the details as clearly as possible. However, in the post-experiment survey responses, multiple participants reported strategies of repeating the object's name in their mind or thinking about the colors of the images, which may not be optimal imagery strategies and may introduce unsystematic variability in the neural signals during the imagery periods. Future work may benefit from providing more detailed instructions for how to visualize the target category or from providing real-time feedback on the classifier's prediction so that participants can adapt their strategy throughout the experiment.
The results of our pupillometry analysis demonstrated an increase in pupil dilation during mental imagery, in accordance with previous literature [47]. In the observation cue blocks, the pupil diameter decreased when the stimulus was presented and increased when the stimulus was removed, as expected. However, it is difficult to distinguish whether this dilation was due to effortful mental imagery or whether the pupil was simply returning to baseline after stimulus presentation. One limitation of this study with regard to pupillometry is that the trial periods were short (8.75 s for the observation cue trials and 7.05 s for the auditory cue trials), which may not be ideal for the analysis of pupillometry data [50], [51]. A more appropriate eye tracking experiment would provide longer resting periods between task elements to allow the pupil to fully return to baseline. Furthermore, the inter-trial period of 1 s that was used for baseline correction may not be long enough for the pupil diameter to fully return to baseline between trials. The large variability during this period may be due to eye blinks or movements, which could interfere with the interpretation of the data from the remainder of the trial.
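The baseline correction discussed above can be illustrated with a short sketch. It assumes subtractive correction, in which the mean pupil diameter over the baseline window at the start of the trace is subtracted from every sample, and it assumes that samples lost to blinks are marked as `None`. Both conventions, the function name, and the toy numbers are assumptions made for the example; the source does not specify the exact procedure.

```python
def baseline_correct(trace, sample_rate_hz, baseline_s):
    """Subtract the mean of the initial baseline window from a pupil trace.

    trace: pupil diameters, with None marking samples lost to blinks
           (a hypothetical convention for this sketch).
    sample_rate_hz: eye tracker sampling rate.
    baseline_s: duration of the baseline window at the start of the trace.
    """
    n_base = int(round(baseline_s * sample_rate_hz))
    valid = [v for v in trace[:n_base] if v is not None]
    if not valid:
        raise ValueError("no valid samples in the baseline window")
    baseline = sum(valid) / len(valid)
    # Blink samples stay missing; all other samples become change from baseline.
    return [None if v is None else v - baseline for v in trace]

# Toy trace sampled at 4 Hz with a 0.75 s baseline (first three samples).
toy_trace = [4.0, None, 4.2, 3.5, 4.6]
corrected = baseline_correct(toy_trace, sample_rate_hz=4.0, baseline_s=0.75)
print(corrected)
```

Because the baseline is a mean over a short window, blinks or eye movements within that window shift every corrected sample in the trial, which is one way that variability in the inter-trial period can propagate into the rest of the trace.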

V. CONCLUSION
Visual imagery offers an intuitive paradigm for BCI applications that can directly convey the user's intentions with many natural ways of envisioning an intended action. However, the work presented in this study reveals that true, spontaneous visual imagery from long-term memory is difficult to decode from EEG. Spontaneous visual imagery produces a more variable neural signal than short-term retention of a visual image in working memory. One potential limitation of this study is that participants were not given explicit instructions for how to perform the imagery, and no feedback on successful imagery was provided during the experiment. Future work in decoding visual imagery from EEG may benefit from providing more explicit visualization instructions as well as multiple sessions with real-time feedback on visualization ability. This would allow users to hone their strategies over time and would provide more data for use with more advanced classification techniques.

Fig. 1. Sample images used during experimental procedure. Participants selected one familiar image for each category of animals, famous faces, objects, or recognizable scenery and landmarks.

Fig. 3. Classification of observation periods. (a) Mean confusion matrix obtained from classification of neural data during visual observation of the four image categories. (b) Heatmap of accuracy obtained during channel-wise classification of observation periods across all subjects. Black arrows on color bar demarcate significance threshold at 29.58% (p < 0.05).

Fig. 4. Classification of short-term imagery periods following observation cue. (a) Mean confusion matrix obtained from classification of neural data during visual imagery of the four image categories following the observation cue demonstrates significantly above-chance classification accuracy of 30.05%. (b) Heatmap of accuracy obtained during channel-wise classification of imagery periods following the observation cue across all subjects. (c) Heatmap of accuracy obtained during channel-wise classification of imagery periods following the observation cue for a subject with high classification accuracy. Black arrows on color bar demarcate significance threshold at 29.58% (p < 0.05).

Fig. 5. Classification of long-term imagery periods following auditory cue. (a) Mean confusion matrix obtained from classification of neural data during visual imagery of the four image categories following the auditory cue provides an average classification accuracy of 26.74%. (b) Heatmap of accuracy obtained during channel-wise classification of imagery periods following the auditory cue across all subjects. (c) Heatmap of accuracy obtained during channel-wise classification of imagery periods following the auditory cue for a subject with high classification accuracy. Black arrows on color bar demarcate significance threshold at 29.17% (p < 0.05).

Fig. 7. Mean frequency vs time plots of Morlet Wavelet features across the 8 posterior EEG channels for (a) visual observation, (b) short-term visual imagery, and (c) long-term visual imagery periods.

Fig. 8. Pupil diameter changes averaged across all trials. (a) Mean pupil diameter during the observation cue blocks across all participants. (b) Mean pupil diameter during the auditory cue blocks across all participants. The gray shaded region indicates the period used for baseline correction. The blue shaded region around the pupil diameter trace demarcates the 95% confidence interval.

TABLE I
CLASSIFICATION ACCURACY (%) OF VARIOUS CLASSIFIERS AND FEATURES DURING VISUAL OBSERVATION

TABLE II
CLASSIFICATION ACCURACY (%) OF VARIOUS CLASSIFIERS AND FEATURES DURING SHORT-TERM VISUAL IMAGERY

TABLE III
CLASSIFICATION ACCURACY (%) OF VARIOUS CLASSIFIERS AND FEATURES DURING LONG-TERM VISUAL IMAGERY