A review of rapid serial visual presentation-based brain-computer interfaces

Rapid serial visual presentation (RSVP) combined with the detection of event-related brain responses facilitates the selection of relevant information contained in a stream of images presented rapidly to a human. Event-related potentials (ERPs) measured non-invasively with electroencephalography (EEG) can be associated with infrequent targets amongst a stream of images. Human-machine symbiosis may be augmented by enabling human interaction with a computer, without overt movement, and/or by enabling optimization of image/information sorting processes involving humans. Features of the human visual system impact on the success of the RSVP paradigm, but pre-attentive processing supports the identification of target information post presentation of the information by assessing the co-occurrence or time-locked EEG potentials. This paper presents a comprehensive review and evaluation of the limited but significant literature on research in RSVP-based brain-computer interfaces (BCIs). Applications that use RSVP-based BCIs are categorized based on display mode and protocol design, whilst a range of factors influencing ERP evocation and detection are analyzed. Guidelines for using the RSVP-based BCI paradigms are recommended, with a view to further standardizing methods and enhancing the inter-relatability of experimental design to support future research and the use of RSVP-based BCIs in practice.

1. Introduction

Rapid Serial Visual Presentation (RSVP) is the process of sequentially displaying images at the same spatial location at high presentation rates, with multiple images per second, e.g., with a stimulus onset asynchrony no greater than 500 ms but often lower than 100 ms, i.e., >10 stimuli presented per second. Brain-computer interfaces (BCIs) are communication and control systems that enable a user to execute a task via the electrical activity of the user's brain alone (Vidal, 1973). RSVP-based BCIs are a specific type of BCI used to detect target stimuli, e.g., letters or images, presented sequentially in a stream, by detecting brain responses to such targets. RSVP-based BCIs are considered a viable approach to enhance human-machine symbiosis and offer potential for human enhancement.

Page 1 of 41 AUTHOR SUBMITTED MANUSCRIPT -JNE-102013
Accepted Manuscript

To date, the literature on RSVP-BCIs has not been comprehensively evaluated; it is therefore timely to review the literature and provide guidelines for others considering research in this area. In this review we: 1) identify and contextualize key parameters of different RSVP-BCI applications to aid research development; 2) document the growth of RSVP-based BCI research; 3) provide an overview of key current advancements and challenges; 4) provide design recommendations for researchers interested in further developing the RSVP-BCI paradigm.

This paper is organized as follows. Section 2 presents background information on the fundamental operating protocol of RSVP-BCIs. Section 3 details results of a bibliometric analysis of the key terms "Rapid serial visual presentation", "RSVP", "Electroencephalography", "EEG", "Brain-Computer Interface", "BCI", "Event Related Potentials", "ERP" and "Oddball" found within authoritative bibliographic resources. Section 4 provides an overview of performance measures. Section 5 outlines existing RSVP-based BCI applications, presents inter-application study comparisons and analyzes the design parameters. Section 6 provides a summary, a discussion of findings and ongoing challenges.

2. Background
RSVP-based BCIs have been used to detect and recognize objects, scenes, people, pieces of relevant information and events in static images and videos. Many applications would benefit from an optimization of this paradigm, for instance counter intelligence, policing and health care, where large numbers of images/information are reviewed by professionals on a daily basis. Computers are unable to analyze and understand imagery as successfully as humans, and manual analysis tools are slow (Mathan et al., 2008; Gerson, Parra and Sajda, 2005). In studies carried out by Sajda et

during the P300 experiment (commonly referred to as the 'Oddball' paradigm), participants must classify a series of stimuli which fall into one of two classes: targets and non-targets. Targets appear more infrequently than non-targets (typically ~5-10% of total stimuli in the RSVP paradigm) and should be recognizably different. It is known that P300 responses can be suppressed in an RSVP task if the time between two targets is less than 0.5 seconds, a phenomenon known as the attentional blink (Raymond, Shapiro and Arnell, 1992; Kranczioch, Debener and Engel, 2003). The amplitude and the latency of the P300 are influenced by the target discriminability and the target-to-target interval in the sequence. The latency of the P300 is affected by stimulus complexity (McCarthy and Donchin, 1981; Luck, Woodman and Vogel, 2000). The P300 amplitude can vary as a result of multiple factors (Johnson, 1986), such as:

• Subjective Probability - the expectedness of an event.
• Stimulus Meaning - comprised of: task complexity, stimulus complexity and stimulus value.
• Information Transmission - the amount of stimulus information a participant registers in relation to the information contained within a stimulus.

2.2. RSVP-based BCI amongst the BCI Classes

BCIs can be of three different types: active, reactive or passive (Zander et al., 2010). An active BCI is purposefully controlled by the user through intentional modulation of neural activity, often independent of external events. In contrast, reactive BCIs generate outputs from neural activity evoked in response to external events, enabling indirect control by the user. Passive BCIs make use of implicit information and generate outputs from neural activity without purposeful control by the user. Active/reactive BCIs are commonly aimed at users with restricted movement abilities who intentionally try to control brain activity, whereas implicit or passive BCIs are more commonly targeted towards applications that are also of interest to able-bodied users (Zander and

Static
In 'static mode', images displayed have identical entry and exit points; the images are transiently presented on screen (typically for 100-500 ms) and then disappear. One benefit of static mode is that images occupy the majority of the display and therefore identification of targets is likely even if they are only presented briefly. There are a number of different possible instructions a participant may be given:
• Prior to presentation, a target image may be shown to participants, and participants are asked to identify this image in a sequence of subsequent images. Target recognition success rates can be achieved with presentation rates as high as 10/second (Cecotti, Eckstein and Giesbrecht, 2012).
• Participants may be asked to identify a type of target, e.g., an animal within a collection of images. In this mode, the rate of presentation should be slowed down (

RSVP is a paradigm used to study the attentional blink, which is a phenomenon that occurs when a participant's attention is grabbed by an initial target image and a further target image may not be detectable for up to 500 ms after the first (Raymond, Shapiro and Arnell, 1992). Depending upon the duration of stimulus presentation, the ratio of target images to total images will change (e.g., if images are being presented at a duration of 100 ms then there must be a minimum of 5 images between targets 1 and 2. In a sequence of 100 images there can be a maximum of 20 target images. Whereas if images are presented at 200 ms this limits the maximum number of targets to 10/100 images in total).

Change blindness occurs when a participant is viewing two images that vary in a non-trivial fashion, and has to identify the image differences. Change blindness can occur when confronted by images, motion pictures, and real world interactions. Humans have the capacity to get the gist of a scene quickly but are unable to identify particular within-scene features (Simons and Levin, 1997; Oliva,

2005). For example, when two images are presented for 100 ms each and participants are required to identify a non-trivial variation as the images are interchangeably presented, participants can take between 10-20 s to identify the variation. This latency period in identifying non-trivial variations in imagery can be augmented through use of distractors or motion pictures (Rensink, 2000). In the context of designing an RSVP paradigm, change blindness is of interest, as it will take longer for a user to identify a target within an image if it does not pop out from the rest of the image. Distractors within the image, or cluttered images, will increase the time it takes a user to recognize a target, reducing the performance of the RSVP paradigm.

Saccadic blindness is a form of change blindness described by Chahine and Krekelberg (2009) where "humans move their eyes about three times each second.

The relevant RSVP-based BCI papers are presented in Table 1 when a button press was required, and Table 2 when no button presses were conducted. RSVP-based BCIs were evaluated in terms of the interface design. Table 1 and Table 2 show that there is considerable variation across the different studies in terms of the RSVP-BCI acquisition paradigm, including the total number of stimuli employed, percentage of target stimuli, size of on-screen stimuli, visual angle, stimulus presentation duration, and the number of study participants. Performance was measured using a number of metrics: the area under the Receiver Operating Characteristic (ROC) curve (Fawcett, 2006), classification accuracy (%) and information transfer rate. ROC curves are used when applications have an unbalanced class distribution, which is typically the case with RSVP-BCI, where the number of target stimuli is much smaller than that of non-target stimuli. Many studies report different experimental parameters and some aspects of the studies have not been comprehensively reported. From Tables 1 and 2, it can be seen that the majority of applications using a button press as a baseline may be classified as surveillance applications, while applications that do not use a button press are more varied. This may be because surveillance applications often have an industry focus, and quantified improvement relative to manual labelling alone is crucial for acceptance. In the majority of the applications where a button press was used, participants undertake trials with and without a button press, and the difference in latency of response between the two is calculated to compare neural and behavioral response times. The results of the bibliometric analysis are further discussed in sections 4, 5 and 6, following the analysis of key papers identified in the following section.

sections that follow, we provide additional information about the values reported in Tables 1 and 2. The intention is to validate why these performance metrics were selected when a number of different results are reported by the specified study, and to highlight inter-study idiosyncrasies that may need to be considered whilst comparing findings. In the next section, the different design parameters for the studies identified in Tables 1 and 2

When designing an RSVP paradigm, there are eight criteria that we recommend be taken into consideration:
1) The type of target images and how rapidly these can be detected, e.g., picture, number of words.
2) The differences between target and non-target images and how these influence the discrimination in the RSVP paradigm.
3) The display mode - static or moving stimuli, and the background the images are presented on, e.g., single color white, mixed, textured.
4) The response mode - consideration should be given as to whether a button press is used or not to confirm if a person has identified a target.
5) The number of stimuli/the percentage of target stimuli - how many are presented throughout the duration of a session and the effect this could have on the ERP.
6) The rate at which stimuli are presented on screen throughout the duration of a session and the effect this has on the ERP.
8) The signal processing pipeline - determine the features, channels, filters, and classifiers to use.

Display and response modes
A button press may be used in conjunction with either of the aforementioned presentation modes (section 2.2), and entails users having to click a button when they see a target. This mode is used as a baseline to estimate the behavioral performance and the difficulty of the task. In most research studies, participants undergo an experimental trial without a button press and a follow-on trial with a button press.

RSVP paradigms to surpass button press performance and evidence suggests that the complement of both modalities at comfortable lower presentation rates may indeed be the best approach. Nevertheless, ideally studies would contain an EEG-only block and an EEG plus button press block, where the button press follows the target and not the image burst. This would facilitate more accurate evaluation of differences and correlates between behavioural and neural response times. Interesting, (

described. An RSVP paradigm was undertaken whereby the participant must register a target white letter in a stream of black letters and a second target 'X' amongst this stream. It was found that if the 'X' appeared within ~100-500 ms of the initial target, errors in indicating whether the 'X' was present or not were likely to be made even when the first target was correctly identified (Raymond, Shapiro and Arnell, 1992). This is not to say that humans cannot correctly process information presented at >10 Hz.

Whilst at presentation rates of 83 ms mean accuracy rate was ~70% and there was no significant effect of colour. This formulation is based on the chance rate of 3.33% (i.e., 1 in 30). This implies that coloured letters enhance performance accuracy but not past a certain speed of stimulus presentation.

There is likely a significant interaction between the difficulty of target identification and presentation rate. For example, the optimal presentation rate for a given stimulus set is highly dependent on the difficulty of identifying targets within that set (Ward, Duncan and Shapiro, 1997)

Image size/visual angle
Another RSVP design aspect to be considered is stimulus size. There is a large variation in image sizes, ranging from 256×256 pixels in a categorization application to 960×600 pixels in a surveillance application. In general, surveillance applications use larger images than the other applications described. The most common image size used is 500×500 pixels. This is only used in static surveillance applications, and all surveillance studies using this image size achieved a high accuracy (>80%). The other applications used smaller image sizes such as 360×360 pixels and achieved high accuracies (i.e., 91% and 89.7%). Therefore, it can be concluded that for surveillance studies, image sizes should be at least 500×500 pixels, although for all other applications the image size may be smaller. A more complex task is one where a target stimulus is presented in the background of a larger image, eliciting the N2 ERP. Early components such as the P1 and N2 are sensitive to the spatial location of the stimuli (Saavedra and Bougrain, 2012).

One issue with reporting only image size is that its effect always depends on the viewing distance from the screen and its location on the screen with respect to the viewer, i.e., the visual angle. The visual angle is the angle an image subtends at the eye, reported in degrees of arc. In a study by Dias and Parra (2011) it was shown that participants performed best (90%) when the target stimulus was centered. Performance consistently decreased to 50% in all participants as target stimuli were placed further away from the center (4° of visual angle), and dropped further when the target stimulus was placed at 8° of visual angle. Although performance drops significantly, participants are still able to detect target stimuli shown in their peripheral visual field even at such rapid paces. Many papers report that the visual angle of the stimuli can have an effect on performance. As a general principle, targets must appear larger or be more distinct for detection at the outer edge of the visual field. The visual angle can thus be deemed the most important measure, as it accounts for distance from screen, image location on screen and image size. Authors are therefore encouraged to report visual angle, as reporting image size alone is not useful without the availability of distance from screen. For RSVP-speller studies, none of the papers found reported on the size of the image or font; however, some reported the visual angle.

Target vs non-target Stimuli
Many different types of target images have been identified within this review. The majority of research focuses on a two-class problem, i.e., detecting target images in sequences of non-target images that are completely different from each other. However, in real-life situations, non-target images are likely to share some of the same characteristics as target images (A.R. Marathe et al., 2015). These presentation sequences appear to be more like moving images than static images. In A.R. Marathe et al. (2015), a more complex surveillance task was carried out where, in the first task, participants were required to detect targets when targets were the only infrequent image whilst, in the second task, targets were presented with non-targets (i.e., the target image could be found in the background of a larger image). Participants were required to ignore everything else in the image, a much more difficult task, and consequently the amplitude of the P300 was reduced. The results of

this study found that the introduction of the infrequent non-target stimuli in the scene yielded a substantial slowing of the reaction time. Surveillance applications commonly use stimuli that are more complex, where trained participants, such as intelligence analysts, outperform novice participants, as they are able to give meaning to the stimuli. The RSVP-speller applications present their letters as images one at a time on screen (Hild et al., 2011). Due to the nature of the RSVP paradigm, it is important that these letters are shown in a random order, as participants pre-empting a target can have an effect on ERP responses (Oken et al., 2014). Data categorization applications had the most variance between the different types of stimuli presented to a participant. However, these stimuli tend to be everyday items that participants can easily recognize.

Signal Processing
All applications have certain requirements in terms of speed and type of images displayed which, as outlined above, can influence the ERP and therefore also variations in performance as measured by detection accuracy. The signal processing framework plays an important role in being able to cope with variations in ERP and maximizing performance. There is a likely tradeoff between the design parameters used, as described above, and the level of sophistication built into the signal processing framework, which often varies across studies. Here we review some of the approaches applied.

To extract the relevant features, data is first pre-processed to improve the signal-to-noise ratio (SNR). The signal is pre-processed using varying band pass filters, depending on the application, in order to remove high frequency noise or artifacts (such as muscle activity). Generally, lower and upper cut-off frequencies of around 0.

7) The area (height × width), visual angle and the overt or covert attention requirement of the stimuli.

Cognitive Neuropsychology). The MIT Press.
Diamond, M. R., Ross, J. and Morrone, M. C. (2000) 'Extraretinal control of saccadic suppression', The Journal of neuroscience: the official journal of the Society for Neuroscience,

stimuli (Bigdely-Shamlo et al., 2008; M. Cohen, 2014; Sadja et al., 2014). BCI signal processing algorithms are used to recognise spatio-temporal electrophysiological responses and link them to target image identification, ideally on a single trial basis (Manor, Mishali and Geva, 2016).

The most commonly exploited ERP in RSVP-based BCI applications is the P300. The P300 appears at approximately 250-750 ms post target stimulus (Polich and Donchin, 1988; Leutgeb, Schäfer and Schienle, 2009; Ming et al., 2010; Zhang et al., 2012). As specified by (Polich and Donchin, 1988)

al. (2010), Poolman et al. (2008) and Bigdely-Shamlo et al. (2008), a trend of using RSVP-based BCIs for identifying targets within different image types has emerged. Research studies show the ability to use RSVP-based BCIs to drive a variety of visual search tasks including, in some circumstances, skills learned for visual recognition. Although the combination of RSVP and BCI has proven successful on several image sets, other research has attempted to establish whether or not greater efficiencies can be reached through the combination of RSVP-based BCIs and behavioural responses (Huang et al., 2007).

Those rapid eye movements called saccades help to increase our perceptual resolution by placing different parts of the world on the high-resolution fovea. As these eye movements are performed, the image is swept across the retina, yet we perceive a stable world with no apparent blurring or motion". Saccadic blindness thus refers to the loss of image when a person saccades between two locations. Evidence shows that saccadic blindness can occur 50 ms before saccades and up to 50 ms after saccades (Diamond, Ross and Morrone, 2000). Thus, it is important that stimuli have a duration greater than 50 ms to bypass saccadic blindness, unless participants are instructed to attend a focus point and the task is gaze independent and thus does not demand saccades (such as during the canonical RSVP paradigm (section 5.4)).

Having considered some of the factors influencing RSVP-based BCI designs, the remainder of the paper focuses on a bibliometric study of the RSVP literature, highlighting the key methodological parameters and study trends. Studies are compared and contrasted on an intra- and inter-application basis. Later sections focus on study design parameters and provide contextualized recommendations for researchers in the field.

3. Bibliometric study of the RSVP related literature

A bibliometric review of the RSVP-based BCIs was conducted. The inclusion criteria for this review were studies that focused on EEG data being recorded while users were performing visual search tasks using an RSVP paradigm. The studies involved various stimulus types presented using the RSVP paradigm where participants had to identify target stimuli. All reported studies were not simply theoretical and had at least one participant. One or more of the keywords BCI, RSVP, EEG or ERP appeared in the title, abstract or keyword list. Only papers published in English were included. The literature was searched, evaluated and categorized up until August 2017. The databases searched were Web of Science, IEEE, Scopus, Google Scholar, and PubMed. The search terms used were: "Rapid serial visual presentation", "RSVP", "Electroencephalography", "EEG", "Brain-Computer Interface", "BCI", "Event Related Potentials", "ERP" and "Oddball". Papers were excluded for the following reasons: 1. the research protocol had insufficient detail; 2.

Table 1. Design Parameters reviewed, Mode: Button press = Yes. Table
Target vs Background distractor), T v[B+NT] (Target vs both Background distractor and Non-Target).

4. Validating inter-study comparison through performance measures
When comparing RSVP studies it is important to acknowledge that researchers use different measures of performance. Before going into depth about signal processing techniques (section 5.7), it is important to discuss the variations in approaches used to measure performance. To encourage valid inter-study comparison within and across RSVP application types, it is crucial to emphasize that we are, on the whole, reporting classification accuracy when it is calculated in terms of the number of correctly classified trials. Classification accuracy can be swayed by the imbalanced target and non-target classes, with targets being infrequently presented: e.g., with a 10% target prevalence, if all trials are classed as non-targets, the correct classification rate would be 90%. Hence, ROC values are also reported in this review where relevant information was provided in the publications reviewed.

(2012) was to compare the AUC to the volume under the ROC hyper-surface, and the authors found an AUC of 0.878, which is suggestive of the possibility for discrimination between greater than two types of ERPs using single-trial detection. Huang et al. (2006) reported the AUC for session one of two experiments during button press trials. This paper demonstrates that the three classifiers approach produces similar performance, with AUC of >0.8 across the board (Huang et al., 2006). Moreover, accuracy reportedly increases through collating evidence from two BCI users, which reportedly yielded a 7.7% increase in AUC compared to a single BCI user (Matran-Fernandez and Poli, 2014), using collaborative BCIs. This process was repeated 20 times to achieve an average accuracy measurement that would not be relatable to other studies included in the bibliometric analysis that involved average performance over single trial test. Cecotti, Sato-Reinhold, et al.
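The effect of class imbalance on classification accuracy described in this section can be reproduced with a small sketch. It builds a synthetic sequence of 100 trials with 10% target prevalence and scores a degenerate detector that labels every trial as a non-target. The data and function names are illustrative assumptions. The AUC here is computed with the rank-based (Mann-Whitney) formulation, which is one standard way to obtain it and is not necessarily the procedure used in any of the reviewed studies.

```python
def accuracy(labels, predictions):
    """Fraction of trials whose predicted class matches the true class."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the probability that a randomly chosen target receives a higher
    score than a randomly chosen non-target; ties count as one half.
    """
    target_scores = [s for y, s in zip(labels, scores) if y == 1]
    non_target_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not target_scores or not non_target_scores:
        raise ValueError("AUC needs at least one target and one non-target")
    wins = 0.0
    for t in target_scores:
        for n in non_target_scores:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(target_scores) * len(non_target_scores))


# Synthetic sequence: 10 targets (label 1) among 100 trials, i.e. 10% prevalence.
labels = [1] * 10 + [0] * 90

# Degenerate detector: every trial is classed as a non-target with the same score.
all_non_target = [0] * 100
constant_scores = [0.0] * 100

print(accuracy(labels, all_non_target))  # 0.9, despite detecting no targets
print(roc_auc(labels, constant_scores))  # 0.5, i.e. chance-level discrimination
```

The detector reaches 90% accuracy without finding a single target, while its AUC stays at chance, which is the reason ROC values are reported alongside accuracy where available.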
In the literature, there are many variations on how performance is estimated and reported. The studies cited in the current section provide examples of performance measure variations from the literature. The intention of Files and Marathe (2016a) was to develop a regression-based method to predict hit rates and error rates whilst correcting for expected mistakes. There is a need for such methods, due to uncertainty and difficulty in correctly identifying target stimuli. The regression method developed by Files and Marathe (2016a) had relatively high hit rates, which spanned 78.4% to 90.5% across all participants. In contrast, as a measure of accuracy, Sajda et al. (2010) used hit rates expressed as a fraction of total targets detected per minute. Sajda et al. (2010) discuss an additional experiment that employed ROC values as an outcome measure. In Fuhrmann et al. (2014), where the RSVP application was categorization based, accuracy was defined as the number of trials in which the classifier provided the correct response, divided by the total number of available trials, with regards to target/non-target classification. Yazdani et al. (2010) were concerned with surveillance applications of RSVP-based BCI and used the F-measure to evaluate the accuracy of the binary classifier in use. Precision (fraction of occurrences flagged that are relevant) and recall (fraction of relevant occurrences flagged) were reported, as the F-measure considers both these values.

Different variations in ROC value calculations were also discovered across the studies evaluated. Variability in the distribution of accuracy outcome measures is also founded upon whether the dataset is non-parametric, e.g., median AUC is reported as opposed to the mean AUC (Matran-Fernandez and Poli, 2014). As a measure of accuracy, Rosenthal et al. (2014) conducted a bootstrap analysis to show the sampled distribution of AUC values for HDCA classifiers where, 1000 times over, labels were randomized, classifiers were trained and AUC values calculated through a "leave-one-out cross-validation" technique. Cecotti et al. (2012) presented a comparison of three class classifiers in a 'one versus all' strategy. The focus of Cecotti et al.
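The relationship between precision, recall and the F-measure mentioned above for Yazdani et al. (2010) can be sketched as follows. The counts are invented for illustration, and the balanced F1 form (beta = 1) is assumed, since this section does not state which variant of the F-measure that study used.

```python
def precision_recall_f(true_pos: int, false_pos: int, false_neg: int, beta: float = 1.0):
    """Return (precision, recall, F-measure) from detection counts.

    precision: fraction of flagged images that are real targets
    recall:    fraction of real targets that were flagged
    F-measure: weighted harmonic mean of the two (beta = 1 gives F1)
    """
    flagged = true_pos + false_pos
    relevant = true_pos + false_neg
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / relevant if relevant else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    beta_sq = beta * beta
    f_measure = (1.0 + beta_sq) * precision * recall / (beta_sq * precision + recall)
    return precision, recall, f_measure


# Hypothetical session: 10 targets shown, 12 images flagged, 8 of them correctly.
precision, recall, f1 = precision_recall_f(true_pos=8, false_pos=4, false_neg=2)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Unlike raw accuracy, none of these three quantities is inflated by the large number of correctly rejected non-targets, which makes them suited to the low target prevalence typical of RSVP sequences.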
(Marathe et al., 2014) used in RSVP-based BCI research in combination with the participant'sEEG  9responses in order to monitor attention(Marathe et al., 2014).The combination of EEG and button 10 press can lead to increased performance in RSVP-based BCIs.Tasks that require sustained attention 11 can cause participants to suffer from lapses in vigilance due to fatigue, workload or visual 12 distractors (Boksem, Meijman and Lorist, 2005).The button press can be used to determine if there 13 is a tipping point during the presentations when participants are unable to consciously detect target 14 in such applications (Umut Orhan et al., 2012; Oken et al., 2014).In many of the studies that did 1 not utilize a button press, researchers are focused on different aspects of the RSVP paradigm other 2 than reaction time.For example, researchers focused on the comparison of two classification 3 methods, image durations etc. (Sajda et al., 2010; Cecotti, Eckstein and Giesbrecht, 2014).target was consciously perceived as present or absent.Such an approach is useful when 11 studying RSVP based parameters and the limits of perception.However, button press responses 12 might be less useful than EEG responses during RSVP for data labelling or image sorting, where 13 the focus is to label individual images within the burst.Nonetheless, Bigdely-Shamlo et al. 
(2008) 14 apply an image burst approach where a button press at the end of the image burst is used to 15 determine if the participant saw a target image or not.The authors showed that airplanes could be 16 detected in aerial shots with image bursts lasting 4100 ms and images presented at 12 Hz.The 17 button press served well in determining correct and incorrect responses.In practice, however, 18 button press may be superfluous or infeasible.19 A body of researchers are of the opinion that RSVP-related EEG accuracy must surpass button press 20 accuracy in order to be useful.However, this need not be the case as Gerson, Parra and Sajda (2006) 21 report no significant differences in triage performance based on EEG recordings or button presses.22 Nevertheless button based triage performance is superior for participants that correctly respond to a 23 high percentage of target images.Conversely, EEG-based triage alone is shown to be ideal for the 24 subset of participants who respond correctly to fewer images Gerson, Parra and Sajda (2006).25 Hence, the most reliable strategy for image triaging in an RSVP based paradigm may be through 26 reacting to the target image by real-time button presses in conjunction with an EEG based detection 27 method.Target identification reflected in EEG responses can be confirmed by a button press, and 28 through signal processing techniques both reported and missed targets can be identified.29 Studies such as, Marathe et al., (2014) propose methods for integrating button press information 30 with EEG based RSVP classifiers to improve overall target detection performance.However, 31 challenges arise when overlaying ERP and behavioural responses, such as issues concerning 32 stimulation presentation speed and behavioural latency (Files and Marathe, 2016).Pending further studies investigating the reliability of fast detection of neural correlates, EEG based 1 responses have the potential to exceed button press.However, it is not necessary for 
EEG based [...] targets via EEG recordings (Potter et al., 2014). However, the core advantage of RSVP-based BCIs is the enhanced speed of using a neural signature instead of a behavioral response to determine whether a user has detected an intended image of interest.

Forty of the studies reported use static mode as a method of presentation; six of these papers used moving mode in conjunction with static mode, while one study exclusively used moving mode. Moving mode is more complex than static mode as participants have to take in an entire scene rather than specific images. Moving mode uses motion onset in conjunction with the P300 for scenes in which the targets are moving, yielding a more realistic setting to validate RSVP-based BCIs (Weiden, Khosla and Keegan, 2012). All papers employing moving mode were found within the surveillance application category; this is unsurprising as the moving mode offers the opportunity to detect targets in realistic surveillance situations where movements of people or vehicles are of interest. For the other application areas, i.e., medical, categorization etc., the static mode is likely

[...] present a heterogeneous multi-agent system comprising computer vision, human and BCI agents, and showed that heterogeneous multi-agent image systems may achieve human-level accuracies in significantly less time than a single human agent by balancing the trade-off between time-cost and accuracy. In such cases a human-computer interaction may occur in the form of a button press if the confidence in the response of other, more rapid agents, such as RSVP-BCI agents or computer vision algorithms, is low for a particular sequence of stimuli.

In a surveillance application study carried out by Huang et al. (2011), targets were surface-to-air missile sites. Target and non-target images shared low-level features such as local textures, which increases complexity. Nonetheless, target images were set apart by large-scale features such as unambiguous road layouts. Another example of surveillance targets is given by Bigdely-Shamlo, Vankov, et al. (2008), where overlapping clips of London satellite images were superimposed with small target airplane images, which could vary in location and angle within an elliptical focal area. Similarly, in Barngrover et al. (2016), the prime goal was to correctly identify sonar images of mine-like objects on the sea bed. Accordingly, a three-stage BCI system was developed in which the initial stages entail computer vision procedures, e.g., Haar-like feature classification, whereby pixel intensities of adjacent regions are summed and then the difference between regions is computed, in order to segregate images into image chips. These image chips were then fed into an RSVP-type paradigm exposed to human judgment, followed by a final classification with a Support Vector Machine (SVM).

In the categorization application type, images are sorted into different groups (Cecotti, Kasper, et al. [...] whereby five image categories were presented: cars, painted eggs, faces, planes, and clock faces
(Sajda et al., 2014). A second study in Fuhrmann Alpert et al. (2014), containing target (cars) and non-target (scrambled images of the same car) categories, was conducted. In both RSVP experiments, the proposed Spatially Weighted Fisher Linear Discriminant - Principal Component Analysis (SWFP) classifier correctly classified a significantly higher number of images than the Hierarchical Discriminant Component Analysis (HDCA) algorithm. In terms of categorization, empirical grounds were provided for potentially intuitive claims, namely that target categorization is more efficient when there is only one target image type, or when distractors are scrambled variations of the target image as opposed to different images altogether (Sajda et al., 2014).

Face recognition applications have been used to establish whether a recognition response can be delineated from an uninterrupted stream of faces, whereby each face cannot be independently recognized (Touryan et al., 2011). Two of the three studies evaluated utilized face recognition RSVP paradigm spin-offs with celebrity/familiar faces as targets and novel, or other familiar or celebrity, faces as distractors (Touryan et al., 2011; Bangyu Cai et al., 2013). Cecotti et al. (2011) utilized novel faces as targets amongst cars, with both stimulus types presented with and without noise. Utilizing the RSVP paradigm for face recognition applications is an unconventional approach; nonetheless, the ERP itself has been used extensively to study neural correlates of recognition and declarative memory (Yovel and Paller, 2004; Guo, Voss and Paller, 2005; MacKenzie and Donaldson, 2007; Parra, Chiao and Paller, 2011). Specifically, early and later components of the ERP have been associated with the psychological constructs of familiarity and recollection, respectively (Smith, 1993; Rugg et al., 1998). There is thus substantial potential for the utility of the RSVP-based BCI paradigm for
applications in facial recognition. In the future, RSVP-based BCI face recognition may be apposite in a real-world setting in conjunction with security-based identity applications to recognize people of interest. Furthermore, Touryan et al. (2011) claim that, based on the success of their study, RSVP paradigm-based EEG classification methods could potentially be applied to the neural substrates of memory. Indeed, some studies show augmentation in posterior positivity of ERP components for faces that are later remembered (Paller and Wagner, 2002; Yovel and Paller, 2004). That is to say, components of ERPs triggered by an initial stimulus may provide an indication of whether memory consolidation of said stimulus will take place, which provides an interesting avenue for utilizing RSVP-based BCI systems for enhancing human performance. Based on these studies, relatively novel face recognition paradigms have achieved success when used in RSVP-based BCIs.

RSVP-based BCIs that assist with finding targets within images to support clinical diagnosis have received attention (Stoica et al., 2013), for example, in the development of more efficient breast cancer screening methods (Hope et al., 2013). Hope et al. (2013) is the only paper evaluated from the field of medical image analysis and is hence described in detail. During an initial sub-study, participants were shown mammogram images in which target lesions were present or absent.
[...] RSVP BCI system application reported in this review, reflected as such by the discussion length of this subsection (Sajda, Gerson and Parra, 2003; Erdogmus, Mathan and Pavel, 2006; Gerson, Parra and Sajda, 2006; Poolman et al., 2008; Bigdely-Shamlo et al., 2008; Sajda et al., 2010; Huang et al., 2011; Weiden, Khosla and Keegan, 2012; Cecotti, Eckstein and Giesbrecht, 2012; Matran-Fernandez and Poli, 2014; Rosenthal et al., 2014; Yu et al., 2014; Marathe, Ries and McDowell, 2014; A. R. Marathe et al., 2015; Barngrover et al., 2016; Cecotti, 2016; Files and Marathe, 2016).

[...] 2011; Cecotti, Sato-Reinhold, et al., 2011). Fuhrmann Alpert et al. (2014) conducted a study [...]

[...] (Orhan et al., 2011; Kindermans et al., 2014). Some systems, such as the RSVP keyboard (described in Hild et al., 2011; Orhan, Hild, et al., 2012a; and Oken et al., 2014), display only a subset of available characters in each sequence. This sequence length can be automatically defined or be a pre-defined parameter chosen by the researcher. The next letter in a sequence becomes highly predictable in specific contexts; therefore it is not necessary to display every character in the RSVP speller. Studies show that target characters are generally displayed more than once before the character is selected. The length of a sequence and the ratio of target to non-target stimuli can have an effect on the typing rate/performance. In an online study by Acqualagna et al. [...] target probability had an effect on participants' ability to detect targets and on behavioral performance. The best mean AUC (0.82) was achieved using the 0.1 probability condition. The results show that the percentage of targets shown in an RSVP paradigm has an effect on participants' performance. As the number and percentage of target stimuli used can have an effect on the complexity of a task, it is important to keep the
percentage of targets [...]

Forster (1970) has shown that participants can process words presented in a sentence at 16 Hz (16 words per second). However, the sentence structure may have influenced the correct detection rate, which has an average of four words per second for simple sentence structures and three words per second for complex sentences. Detection rates improve when words are presented at a slower pace, e.g., four relevant words per second, with masks (non-relevant words) presented between relevant words. Additionally, Fine and Peli (1995) showed that humans can process words at 20 Hz in an RSVP paradigm.

Potter et al. (2014) assessed the minimum viewing time needed for visual comprehension, using RSVP of a series of 6 or 12 pictures presented at between 13 and 80 ms per picture, with no inter-stimulus interval. They found that observers could determine the presence or absence of a specific picture even when the pictures in the sequence were presented for just 13 ms each. The results suggest that humans are capable of detecting meaning in RSVP at 13 ms per picture.

In all three face recognition studies, each face image was displayed for 500 ms (Cecotti, Sato-Reinhold, et al., 2011; Touryan et al., 2011; B.
Cai et al., 2013). In two of the studies there was no ISI (Cecotti, Sato-Reinhold, et al., 2011; Touryan et al., 2011), and in the other an ISI of 500 ms was given to ensure ample time for image processing (Bangyu Cai et al., 2013). The speed at which face images were shown is reduced in comparison to the other RSVP applications. RSVP spellers most commonly use a duration of 400 ms; RSVP spellers can benefit from a slower stimulus duration with the incorporation of a language model to enable the prediction of relevant letters. The estimation of performance can be challenging in the RSVP paradigm when the ISI is small, as assigning a behavioural response (i.e., a button press) to the correct image cannot be done with certainty. A solution to this problem is to assign behavioral responses to each image, so that researchers are able to establish hits or false alarms (Touryan et al., 2011). When two targets are temporally adjacent with an SOA of 80 ms, participants are able to identify one of the two targets but not both. SOA should be at least 400 ms and target images should not be shown straight after each other (Raymond, Shapiro and Arnell, 1992). Acqualagna et al. (2010) used a factorial design with four conditions, examining classification accuracy when letters were presented as no-colour or colour letters at either 83 or 133 ms with an ISI of 33 ms. Stimulus sequences were presented multiple times to enhance the accuracy of selecting the letter of choice. After 10 sequences, ~90% mean accuracy was reached in the 133 ms colour presentation mode (100% for 6/9 participants). After 10 sequences in the 133 ms no-colour presentation mode, ~80% mean accuracy was reached (100% in 3/9 participants).
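
The timing quantities discussed above can be illustrated with a short sketch. It assumes the common definition that SOA is the stimulus duration plus the ISI, and it treats 83 ms and 133 ms as stimulus durations purely for illustration; if those values are already SOAs, the ISI should not be added again. The function names and the 400 ms default separation between targets are hypothetical choices made for this example and are not taken from any reviewed study's software.

```python
def soa_ms(stimulus_ms, isi_ms=0.0):
    """Stimulus onset asynchrony: onset-to-onset time of consecutive stimuli.

    Assumes SOA = stimulus duration + inter-stimulus interval (ISI).
    """
    return stimulus_ms + isi_ms


def rate_hz(stimulus_ms, isi_ms=0.0):
    """Presentation rate in stimuli per second for a given duration and ISI."""
    return 1000.0 / soa_ms(stimulus_ms, isi_ms)


def close_target_pairs(target_flags, stimulus_ms, isi_ms=0.0, min_gap_ms=400.0):
    """Return index pairs of consecutive targets whose onsets are closer than min_gap_ms.

    target_flags is a sequence of truthy/falsy values, one per stimulus in
    presentation order (truthy = target). min_gap_ms is an illustrative
    threshold, not a value prescribed by a specific protocol.
    """
    soa = soa_ms(stimulus_ms, isi_ms)
    target_idx = [i for i, is_target in enumerate(target_flags) if is_target]
    return [
        (a, b)
        for a, b in zip(target_idx, target_idx[1:])
        if (b - a) * soa < min_gap_ms
    ]


if __name__ == "__main__":
    for duration in (83, 133):
        print(duration, "ms + 33 ms ISI:",
              soa_ms(duration, 33), "ms SOA,",
              round(rate_hz(duration, 33), 2), "Hz")
    # Targets at positions 1, 3 and 8 of a nine-item stream.
    print(close_target_pairs([0, 1, 0, 1, 0, 0, 0, 0, 1], 83, 33))
```

Under these assumptions, an 83 ms stimulus with a 33 ms ISI corresponds to an SOA of 116 ms (roughly 8.6 stimuli per second), and a 133 ms stimulus to an SOA of 166 ms (roughly 6 stimuli per second). A check of this kind could be run on a planned stimulus sequence to flag target pairs that fall closer together than the chosen separation.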

Table 4. Parameters and recommendations for RSVP-based BCIs
trial ERPs, spatial filters are used to enhance SNR and exploit spatial redundancy (e.g., Parra et al., 2005). Yu et al. (2011) went a step further by utilizing a methodology that considers spatial and temporal features to improve single-trial detection accuracy. Bilinear common spatial pattern (BCSP) filters were suggested to outperform Common Spatial Pattern (CSP) filters (composite and common spatial pattern filters) (Yu et al., 2011). It should be noted, however, that CSP spatial filters were not designed to classify ERPs but to classify oscillatory EEG activity. CSP filters ignore the EEG time course, i.e., the ERP, and are thus suboptimal for RSVP-BCIs. We would recommend using spatial filters dedicated to ERP classification, such as xDAWN, which have been used successfully in many RSVP-BCIs. Spatial filters are normally only applied to high-density EEG data, which might be impractical in certain real-life applications (Parra et al., 2005). High-density EEG data has been reported to increase accuracy (Ušćumlić, Chavarriaga and Millán, 2013). Table 4 shows the most common method used for different application types.

Face recognition applications differ from other applications as face images evoke different ERPs in addition to the P300. Faces typically evoke an N170 component that changes between targets and non-targets (Maurer, Rossion and McCandliss, 2008; Luo et al., 2010). The vertex positive potential (VPP) is also associated with face recognition (Zhang et al., 2012). The midfrontal FN400 and later parietal FP600 components have been associated with familiarity and recollection, respectively (MacKenzie and Donaldson, 2007). Specifically, the amplitude of the FP600 (a positive deflection >500 ms post-stimulus) was found to correlate significantly with the extent of face familiarity (Touryan et al., 2011). The use of spatial filters that utilize spatial and temporal features may act as an advantage over
conventional spatial filters that only exploit spatial redundancy (e.g., Yu et al., 2011). However, spatial filters can only be applied to high-density EEG data, which might be impractical in certain real-life applications (Parra et al., 2005).

[...] RSVP-based BCI paradigms may therefore benefit from head-mounted visual displays; however, a vision-obscuring headset may not be appropriate in some contexts as it could limit the ability of users, e.g., a person with disabilities, to communicate with their peers and environment. Such a headset may prevent the expressive or receptive use of non-verbal communication skills such as eye movement and facial expressions that are vital for users with non-verbal communication skills.

Advancements towards RSVP of targets during moving sequences have shown promising results, although it is more difficult to study movie clips since the stimulus start event is not as clear. A remaining challenge in this area is for researchers to design signal processing tools that can deal with imprecise stimulus beginning/end (Cecotti, 2015). However, an advantage of moving mode is that the target stimulus remains on the screen for longer than with static mode, allowing participants the opportunity to confirm a target stimulus. Moving stimuli studies to date have been limited to surveillance applications, so there is a need for further investigation in this area. Just over half the papers used button press mode in conjunction with one of the other modes, as not all of the studies are concerned with comparing EEG responses to motor responses. It is important to develop a scale in order to rank the difficulty of tasks. This will enable the comparison of paradigms that are at the same level. The key outcomes of this study are shown in Table 4, provided as suggested guidelines. These are suggested parameters that may be useful to researchers when designing RSVP-based BCI paradigms within the different
application types. From this review, we conclude that using these parameters will enable more consistent performance for the different application types and will enable improved comparison with new studies.

In acknowledgment of the need for standardization of parameters for RSVP-based BCI protocols, Cecotti, Sato-Reinhold, et al. (2011) raise an interesting proposal, stating that other parameters, such as the optimal ISI length, classifiers and spatial filters, could be automatically prescribed in accordance with the chosen target likelihood. Such an infrastructure for parameter choices does not currently exist, with studies focusing on the impact of different parameters.

Future studies would benefit from engaging with iterative changes in design parameters. This would allow for a comparative study of the different design parameters and enable the identification of parameters that most affect the experimental paradigm. A study involving increasing the rate of presentation until classification starts to deteriorate significantly for various types of stimulus categories may indicate the maximum possible speed of RSVP-BCI. Additionally, a future development for RSVP-based BCIs might be to use real-life imagery with numerous distractor stimuli amongst the target stimuli. This is a more difficult task but it would enhance paradigm relatability to real-life applications. Hybridizing RSVP BCIs with other BCI paradigms has also started to receive more attention (Kumar and Sahin, 2013). Users of this system navigate using motor imagery movements (left, right, up and down). Search queries are spelt using the Hex-O-Speller and results retrieved from a web search engine may be fed back to the user using RSVP. This study shows the potential benefits of the RSVP paradigm and how it may be used to aid physically impaired users. Eye-tracking can be used as an outcome measure to assess and
enhance RSVP stimuli and presentation modes. Specifically, using eye tracking, researchers can establish where the participant's gaze is focused during erroneous trials and explore correlations between gaze variability and performance. With the RSVP-based BCI paradigm there is much scope to evaluate different data types/imagery. This is a fast-growing field with a promising future. There are multiple opportunities and a large array of potential RSVP-BCI paradigm setups. Researchers in the field are therefore recommended to consider the literature to date and the comparative framework proposed in this paper.

Proceedings, pp. 4046-4051. doi: 10.1109/SMC.2016.7844866.
Boksem, M. A. S., Meijman, T. F. and Lorist, M. M. (2005) 'Effects of mental fatigue on attention: an ERP study', Brain research. Cognitive brain research, 25(1), pp. 107-16. doi: 10.1016/j.cogbrainres.2005.04.011.
Cai, B. et al. (2013) 'A rapid face recognition BCI system using single-trial ERP', in 2013 6th International IEEE/EMBS Conference on Neural Engineering (NER). IEEE, pp. 89-92. doi: 10.1109/NER.2013.6695878.
Cecotti, H., Sato-Reinhold, J., et al. (2011) 'Impact of target probability on single-trial EEG target detection in a difficult rapid serial visual presentation task', Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference, 2011, pp. 6381-6384.
Cecotti, H., Kasper, R. W., et al. (2011) 'Multimodal target detection using single trial evoked EEG responses in single and dual-tasks', Conference proceedings : ...
Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference, 2011, pp. 6311-4. doi: 10.1109/IEMBS.2011.6091557.
Cecotti, H. et al. (2012) 'Multiclass classification of single-trial evoked EEG responses', in 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE, pp. 1719-1722. doi: 10.1109/EMBC.2012.6346280.
Cecotti, H. (2015) 'Toward shift invariant detection of event-related potentials in non-invasive brain-computer interface', Pattern Recognition Letters. Elsevier Ltd., 66, pp. 127-134. doi: 10.1016/j.patrec.2015.01.015.
Cecotti, H. (2016) 'Single-Trial Detection With Magnetoencephalography During a Dual-Rapid Serial Visual Presentation Task', IEEE Transactions on Biomedical Engineering, 63(1), pp. 220-227. doi: 10.1109/TBME.2015.2478695.
Cecotti, H., Eckstein, M. P. and Giesbrecht, B. (2012) 'Effects of performing two visual tasks on single-trial detection of event-related potentials', Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference, 2012, pp. 1723-1726.
Cecotti, H., Eckstein, M. P. and Giesbrecht, B. (2014) 'Single-trial classification of event-related potentials in rapid serial visual presentation tasks using supervised spatial filtering', IEEE transactions on neural networks and learning systems. Institute of Electrical and Electronics Engineers Inc., 25(11), pp. 2030-42. doi: 10.1109/TNNLS.2014.2302898.
Chahine, G. and Krekelberg, B. (2009) 'Cortical contributions to saccadic suppression', PLoS ONE, 4(9).
Chennu, S. et al. (2013) 'The cost of space independence in P300-BCI spellers', Journal of neuroengineering and rehabilitation. BioMed Central Ltd., 10(1), p. 82. doi: 10.1186/1743-0003-10-82.
Cohen, M.
(2014) Analyzing Neural Time Series Data: Theory and Practice. MIT Press.
Cohen, M. X. (2014) Analyzing Neural Time Series Data: Theory and Practice (Issues in Clinical