Brain–machine interfaces for controlling lower-limb powered robotic systems

Objective. Lower-limb, powered robotics systems such as exoskeletons and orthoses have emerged as novel robotic interventions to assist or rehabilitate people with walking disabilities. These devices are generally controlled by certain physical maneuvers, for example pressing buttons or shifting body weight. Although effective, these control schemes are not what humans naturally use. The usability and clinical relevance of these robotics systems could be further enhanced by brain–machine interfaces (BMIs). A number of preliminary studies have been published on this topic, but a systematic understanding of the experimental design, tasks, and performance of BMI-exoskeleton systems for restoration of gait is lacking. Approach. To address this gap, we applied standard systematic review methodology for a literature search in PubMed and EMBASE databases and identified 11 studies involving BMI-robotics systems. The devices, user population, input and output of the BMIs and robot systems respectively, neural features, decoders, denoising techniques, and system performance were reviewed and compared. Main results. Results showed BMIs classifying walk versus stand tasks are the most common. The results also indicate that electroencephalography (EEG) is the only recording method for humans. Performance was not clearly presented in most of the studies. Several challenges were summarized, including EEG denoising, safety, responsiveness and others. Significance. We conclude that lower-body powered exoskeletons with automated gait intention detection based on BMIs open new possibilities in the assistance and rehabilitation fields, although the current performance, clinical benefits and several key challenging issues indicate that additional research and development is required to deploy these systems in the clinic and at home. 
Moreover, rigorous EEG denoising techniques, suitable performance metrics, consistent trial reporting, and more clinical trials are needed to advance the field.

to walk a quarter of a mile (Centers for Disease Control and Prevention 2014). Both ambulation and rehabilitation to neural systems after trauma, such as stroke and spinal cord injury (SCI), have long been research topics, but more progress is needed. Physical therapy is generally a limited resource because it requires high-intensity labor from therapists. In the past, wheelchairs, passive orthoses, and crutches were the only viable options to provide ambulation outside rehabilitation clinics.
With recent advances in robotic technologies, lower-limb, powered robotic devices have emerged as an assistive or rehabilitative tool for individuals with motor limitations. These devices have enabled individuals to walk and exercise in previously unavailable ways (Grasmücke et al 2017). The devices fall under two categories: wearable joint actuators (Pons 2008) or devices fixed to a platform (e.g. treadmill-based or paddle-based devices) (García-Cossio et al 2015). Powered orthoses induce motion to one or more paralyzed lower limb joints using external power, usually via electric, pneumatic or hydraulic actuators (Arazpour et al 2015). More recently, exoskeleton devices have emerged as aids for over-ground, bipedal ambulation. The US Food and Drug Administration (FDA) has recognized exoskeletons as Class II medical devices with special controls, and has cleared four exoskeleton devices for marketing in the US: ReWalk Personal (ReWalk Robotics, Israel), Indego (Parker Hannifin, USA), Ekso GT (Ekso Bionics, USA), and Medical HAL (Cyberdyne, Japan). Several studies have reviewed existing lower limb exoskeletons in a clinical context, evaluating the outcomes, effectiveness, possible benefits (Dijkers et al 2016, Lajeunesse et al 2016, Louie and Eng 2016) and potential risks and adverse events (He et al 2017). Although the design forms of these orthoses and exoskeletons differ greatly, at core they are all powered robotic devices that assist walking for medically related purposes.
The interface between the devices and users is often implemented via a combination of mechanical and electrical devices. For instance, the ReWalk exoskeleton uses a shared control architecture with inputs from a wrist-worn device and information based on shifts in the user's body weight, while the Rex exoskeleton (Rex Bionics, New Zealand) is controlled by buttons and a joystick. Single-joint orthoses usually follow the movement of the user by monitoring torque and reacting accordingly (Jackson and Collins 2015). Although effective, these control schemes are not representative of natural human movement (Li and Hsiao-Wecksler 2013). Brain-machine interfaces (BMIs), on the other hand, bypass motor systems of any kind. BMIs make context-based decisions from recordings of the users' brain activity, thus allowing direct and voluntary operation of the devices beyond the user's diminished physical capabilities.
The feasibility of using BMIs to control robotics was first demonstrated with invasive BMIs in upper limb applications with both non-human primates (Carmena et al 2003, Ganguly and Carmena 2009) and humans with tetraplegia (Hochberg et al 2012, Collinger et al 2013, Bouton et al 2016). Invasive methods, such as intra-cortically implanted electrode arrays, have a high signal-to-noise ratio (SNR) that allows accurate pattern recognition or continuous decoding of kinematic variables. Recently, an invasive BMI was built for a monkey to control a lower-limb exoskeleton in real time (Vouga et al 2017). However, these approaches face the risk of surgical complications and infections, short-term and long-term signal instabilities that degrade neural decoding of intent (Perge et al 2013), and the challenge of maintaining stable chronic recordings (Meng et al 2016). The added risks associated with the testing of BMI systems for lower-limb applications have perhaps precluded the development of invasive BMIs for lower limb applications in humans. In order to mitigate or eliminate safety risks and reach a broader clinical population, the scalp electroencephalogram (EEG) has been used as a non-invasive alternative for BMI applications. Progress has been made in controlling virtual objects (Ono et al 2013, Luu et al 2016, 2015, Meng et al 2016), upper limb robotic devices (Buch et al 2008, Bhagat et al 2016), and wheelchairs (Galán et al 2008, Fernández-Rodríguez et al 2016, Kim et al 2016). EEG has also been used to monitor cortical activity during walking-related tasks because of its portability and high temporal resolution (Debener et al 2012, Seeber et al 2013, 2015a, 2015b, Costa et al 2016b, Wagner et al 2016). Evidence suggests that EEG recordings during walking display distinct features that differ from those observed during standing, and more importantly, are coupled with gait cycles (Gwin et al 2011, Cevallos et al 2015, Seeber et al 2015a, 2015b, Wagner et al 2016).
Recently, EEG-based BMI control of powered lower-limb robotics has been proposed for the restoration and rehabilitation of gait.
In this review, we systematically reviewed state-of-the-art BMIs for lower-limb powered robotic systems providing assistance with walking. The user population, robotic devices, and BMI and robot input and output were summarized. The neural features, decoder, and performance of each BMI system were compared. The purpose of this study is to bring attention to this emerging field, review existing BMI technologies in controlling lower-limb robotics, and identify challenges and opportunities in this field.

Search methods for identification of studies
Studies were searched and screened in this review following the preferred reporting items for systematic reviews and meta-analyses (PRISMA), as shown in figure 1. Two authors (YH, DE) each conducted a literature search on 7 October 2017 within the PubMed (without using MeSH terms) and EMBASE databases using the advanced search method with a broad set of keywords: (Exoskeleton OR 'Powered exoskeleton' OR actuated OR orthosis OR robot* OR assistive) AND ('Lower limb' OR 'Lower extremity' OR 'Lower extremities' OR 'lower-body' OR walk* OR ambulation) AND (BCI OR 'Brain computer interface' OR BMI OR 'Brain machine interface' OR 'Brain-Controlled') AND (EEG OR electroencephalography OR fNIR* OR Functional Near-Infrared Spectroscopy OR 'Brain-controlled' OR 'motor imagery' OR 'intent'). Duplicates from the two databases and studies not within the inclusion criteria (see below) were excluded.
Full texts of the remaining potentially relevant studies were obtained and screened. The two reviewers then compared results. In the case of a disagreement, a third reviewer (TPL) made the final decision. Two additional publications that do not contain all the keywords in their titles were manually identified as relevant studies (Kilicarslan et al 2016).
The inclusion criteria are as follows:

• Lower-limb powered exoskeleton or orthosis
A variety of design forms exist within lower-limb, powered robotic systems. The terms orthosis and exoskeleton are often used interchangeably. In this review, the term exoskeleton refers to bipedal devices that actuate bilateral hip and knee joints (ankle joints are optional) to assist walking, regardless of whether it is over-ground or on a treadmill. A powered orthosis refers to a device providing actuated assistance to a single joint.

• BMI
Both closed-loop and offline BMIs were included. Although the former is the ultimate goal, offline decoding is often the first step towards an online version. Closed-loop BMIs were highlighted in the results.

• Walking
This review specifically focuses on robots and BMIs that are directly related to walking. Some orthoses do not require an upright standing or walking posture (Xu et al 2014, Zhang et al 2015). Some BMIs do not control robotic devices to walk, although they were designed in a walking context (Salazar-Varas et al 2015, Costa et al 2016a). These were excluded from this review.

• Neural signals
This review does not exclude any studies based on the type of neural signal, regardless of recording levels (invasive and non-invasive) or the type of subjects (healthy and motor-impaired humans, monkeys).

Data extraction and presentation
The following data were extracted from each identified study:

• User
Human/monkey, patients/healthy subjects, and symptoms in the case of patients.

• Robotic device
Manufacturer, functions, shared control strategy, safety features, and speed.

• BMI
Tasks, neural features, artifact removal method, decoder, and output commands.

• Performance

All identified studies and some of their key features were compiled in table 1. In the rest of this review, the phrase identified studies always refers to these 11 studies.

Assessment of methodological quality
The Quality Assessment Tool for Quantitative Studies, based on the Effective Public Health Practice Project (EPHPP) Guidelines, was used to evaluate the methodological quality of each identified study (Jackson and Waters 2005). The following three categories were used: strong, moderate, and weak.

User and robot types
Except for one study that tested an exoskeleton for monkeys (Vouga et al 2017), all studies tested human participants. Out of the 54 human subjects, there were 15 SCI (28%), 4 stroke (7%), and 35 healthy subjects (65%) (see figure 2(A)). For comparison, a recent systematic review of 27 studies of lower-limb powered exoskeletons published in 2015 with 144 disabled participants reported that 58% of the subjects were SCI, 35% were stroke, and 3% were subjects with unspecified etiology (Federici et al 2015). This user population also agrees with the current trend in the user eligibility of FDA-cleared lower-limb exoskeletons. All four FDA-cleared devices (ReWalk, Indego, Ekso, and HAL) are designed for the SCI population, while Ekso also supports patients with stroke. Group analysis of possible differences between the patient and healthy subjects is not practical because of the low number of subjects and varying experimental protocols and outcome measures. Only two of the identified studies tested more than ten subjects, albeit healthy able-bodied subjects.

Abbreviations: SCI, spinal cord injury; CAR, common average reference; CCA, canonical correlation analysis; ASR, artifact subspace reduction; ERD, event-related desynchronization; MRCP, movement-related cortical potential; CSP, common spatial pattern; SSVEP, steady-state visual evoked potential.

For the studies being analyzed, the two devices that appeared most frequently are Rex and Lokomat (Hocoma, Switzerland), shown in figure 3. Lokomat, being a stable device fixed on a platform, has an advantage over mobile exoskeletons in terms of stability. With little risk of falls, Lokomat provides a safe testing environment. The Rex exoskeleton keeps its balance without the support of crutches or any other external system, although in practice, human spotters are often required to guarantee safety (Kwak et al 2015, Kilicarslan et al 2016).
The customized exoskeleton used in Donati et al (2016) is also reported to be stable in single support stance without the need for crutches. Similar to Lokomat, the machine in Vouga et al (2017) designed for a monkey is stationary on a treadmill. Two identified studies did not use the above devices with self-balancing features. To compensate, NASA X1 users in He et al (2014) and H2 users in López-Larraz et al (2016) used walkers.
The time a robot needs to physically carry out each task is an important factor to evaluate the robot performance. Lee

Input to the brain-machine interfaces
Except for Vouga et al (2017), who recorded brain signals of a monkey with implanted microelectrodes, all identified studies used EEG. The unanimous use of EEG in all human studies may be the result of its non-invasiveness and portability.

Neural features
Neural features are usually extracted from the raw input to increase signal-to-noise ratio and reduce computational complexity. The neural features used in the identified studies were categorized into five groups, as shown in figure 2.
Neuronal firing rate is a more direct measurement of neural activity than EEG. An increase in the rate at which a population of motor cortical neurons fires is correlated with movement in the preferred direction of the neuronal population (Georgopoulos et al 1986). It can only be accessed invasively, as in the only study with a monkey (Vouga et al 2017). All other identified studies relied on EEG recordings.

Movement-Related Cortical Potential (MRCP):
Studies on MRCPs date back decades, to when EEG was first observed, revealing time-domain EEG features that are closely associated with movements and decision making (Shibasaki and Hallett 2006). Many MRCP components have been identified, such as P300 and N100. MRCP is the most commonly
Event-Related Desynchronization (ERD) is observed as a decrease in EEG power (usually in the alpha and beta bands) associated with movement-related tasks in both physical activity and motor imagery conditions (Pfurtscheller et al 1997). ERD of 7-25 Hz was used in combination with MRCP in López-Larraz et al (2016). In Do et al (2013), the power spectral densities (PSD) were integrated in 2 Hz bins that were centered at 1, 3, …, 39 Hz, yielding 20 power spectral values per channel. In García-Cossio et al (2015), PSD from 8 to 30 Hz was computed using Welch's method with a Hanning window of 250 milliseconds. In Lee et al (2016), the averaged PSD between 14 and 19 Hz in a 0.5 s sliding window was computed.
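As an illustration of the 2 Hz binning described for Do et al (2013), the sketch below integrates a spectrum into bins centered at 1, 3, …, 39 Hz, giving 20 values per channel. It is a minimal sketch, not the authors' implementation: the Hann-windowed periodogram, the function name `binned_band_power`, and the 256 Hz sampling rate in the usage lines are assumptions made for illustration.

```python
import numpy as np

def binned_band_power(eeg, fs, bin_width=2.0, f_max=40.0):
    """Integrate a periodogram into contiguous bins of `bin_width` Hz.

    eeg: array of shape (n_channels, n_samples); fs: sampling rate in Hz.
    Returns (centers, power), where power has shape (n_channels, n_bins).
    """
    eeg = np.asarray(eeg, dtype=float)
    n = eeg.shape[1]
    # Hann-windowed periodogram (an assumed, simple PSD estimator).
    window = np.hanning(n)
    demeaned = eeg - eeg.mean(axis=1, keepdims=True)
    spectrum = np.fft.rfft(demeaned * window, axis=1)
    psd = np.abs(spectrum) ** 2 / (fs * np.sum(window ** 2))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    df = freqs[1] - freqs[0]

    edges = np.arange(0.0, f_max + bin_width, bin_width)  # 0, 2, ..., 40 Hz
    centers = edges[:-1] + bin_width / 2.0                # 1, 3, ..., 39 Hz
    power = np.empty((eeg.shape[0], len(centers)))
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        mask = (freqs >= lo) & (freqs < hi)
        power[:, i] = psd[:, mask].sum(axis=1) * df      # integrate PSD over the bin
    return centers, power

# Usage with synthetic data: two channels carrying 11 Hz and 21 Hz sinusoids.
fs = 256  # hypothetical sampling rate
t = np.arange(fs) / fs
demo = np.vstack([np.sin(2 * np.pi * 11 * t), np.sin(2 * np.pi * 21 * t)])
centers, power = binned_band_power(demo, fs)
```

Each channel contributes one feature vector of length 20, so the feature dimension grows linearly with the number of electrodes.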
Steady-State Visual Evoked Potentials (SSVEPs) are the brain's response to visual stimulus such as a flickering light. The frequency of the SSVEP response matches the frequency of the visual stimulus. SSVEPs are not directly associated with movement, yet subjects can be trained to make the connection. In Kwak et al (2015), five LED lights operated at 9, 11, 13, 15, 17 Hz were installed on an extended arm stand of the exoskeleton. The users were trained to stare at specific lights for the actions they represent. CCA was used for decomposing the EEG signal in order to extract stimulation frequency related information. CCA finds a pair of linear combinations such that the correlation between two canonical variables X (EEG) and Y (stimulus frequencies) is maximized.
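The CCA-based frequency detection can be sketched as follows. This is an illustrative sketch under stated assumptions, not the pipeline of Kwak et al (2015): the reference set Y is built from sine and cosine templates at each candidate frequency and its second harmonic (a common choice, assumed here), the 250 Hz sampling rate and the function names are hypothetical, and the EEG matrix is assumed to have full column rank.

```python
import numpy as np

def max_canonical_corr(X, Y):
    """Largest canonical correlation between X and Y (rows are samples)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # Orthonormal bases of the two column spaces; the singular values of
    # Qx^T Qy are the canonical correlations (assumes full column rank).
    qx, _ = np.linalg.qr(X)
    qy, _ = np.linalg.qr(Y)
    return float(np.linalg.svd(qx.T @ qy, compute_uv=False)[0])

def ssvep_references(freq, fs, n_samples, n_harmonics=2):
    """Sine/cosine templates at `freq` and its harmonics (assumed reference set)."""
    t = np.arange(n_samples) / fs
    refs = []
    for h in range(1, n_harmonics + 1):
        refs.append(np.sin(2 * np.pi * h * freq * t))
        refs.append(np.cos(2 * np.pi * h * freq * t))
    return np.column_stack(refs)

def detect_ssvep(eeg, fs, candidate_freqs=(9, 11, 13, 15, 17)):
    """eeg: (n_samples, n_channels). Returns (best_frequency, scores)."""
    scores = [max_canonical_corr(eeg, ssvep_references(f, fs, eeg.shape[0]))
              for f in candidate_freqs]
    return candidate_freqs[int(np.argmax(scores))], scores

# Usage with synthetic data: four noisy channels driven by a 13 Hz stimulus.
rng = np.random.default_rng(0)
fs = 250  # hypothetical sampling rate
t = np.arange(2 * fs) / fs
demo = np.column_stack([np.sin(2 * np.pi * 13 * t + 0.3 * k) + rng.standard_normal(t.size)
                        for k in range(4)])
best, scores = detect_ssvep(demo, fs)
```

The candidate with the highest canonical correlation is taken as the attended stimulus, which then maps to the action assigned to that LED.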

BMI operation and training
In order to control the robots, the subjects need to engage in certain tasks to generate specific neural signal patterns. In offline studies, the tasks involve physically walking and standing (He et al 2014, García-Cossio et al 2015).
In most of the realtime studies, the users imagined discrete tasks such as walking/turning ( (2016) reported a two-phase, long-term protocol: the patients were initially instructed to imagine movements of their hands and arms to control the exoskeleton. Seven months after onset of training, subjects imagined moving their left or right legs to control the stepping of the ipsilateral leg. Vouga et al (2017) designed a continuous virtual tracking task in realtime: a monkey was trained to control a dot on a screen to pursue a moving target. The position of the dot was then mapped to the ankle position of the exoskeleton.

Decoders
Various decoders were found in the identified studies (figure 2(C)). Based on the continuity of states being decoded, they can be grouped into two categories: (1) Those featuring continuous reconstruction of trajectories, such as linear regression (Vouga et (Zhang et al 2017), and maximizing canonical correlation coefficients (Kwak et al 2015).
In discrete classification, the difficulty in achieving high performance generally increases significantly as the number of classes increases. In an offline study with three different tasks, only two of the tasks were classified at a time in order to simplify the challenge (García-Cossio et al 2015). Cascaded classification was used in a few studies as a practical workaround to reduce the number of classes (Donati et al 2016, Lee et al 2016). For instance, users in Lee et al (2016) would make an initial, binary choice between walk versus turn; if turn was chosen, a follow-up binary choice of direction (left versus right) would be made.
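The cascade described for Lee et al (2016) can be sketched as two binary decisions taken in sequence. The code below is only a structural sketch: the function name `cascaded_decode`, the callable interfaces, and the toy threshold classifiers are hypothetical stand-ins for trained decoders.

```python
def cascaded_decode(stage1, stage2, first_window, next_window):
    """Two-stage binary cascade.

    stage1(window) -> 'walk' or 'turn'
    stage2(window) -> 'left' or 'right'
    next_window() supplies fresh data for the second decision.
    """
    choice = stage1(first_window)
    if choice == "walk":
        return "walk"
    # The second binary decision is only requested when 'turn' was selected.
    direction = stage2(next_window())
    return "turn_" + direction

# Toy classifiers on a single scalar feature (illustrative only).
def toy_stage1(x):
    return "walk" if x > 0 else "turn"

def toy_stage2(x):
    return "left" if x < 0 else "right"

result_walk = cascaded_decode(toy_stage1, toy_stage2, 0.8, lambda: -0.4)
result_left = cascaded_decode(toy_stage1, toy_stage2, -0.2, lambda: -0.4)
result_right = cascaded_decode(toy_stage1, toy_stage2, -0.2, lambda: 0.9)
```

Each decoder only ever separates two classes, at the cost of a second selection period whenever a turn is requested.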
3.6. Output from the brain-machine interfaces
3.6.1. Levels of control. Output from BMIs can be grouped into three types based on their level of control.
At the highest level, the BMI only specifies the end goal of a task, for example 'walk to the door'. This requires the robot to automatically find the best path and carry out gaits with appropriate kinematics. However, its usage is limited by the number of end goals the system supports. Another constraint is that it requires 'smart' exoskeletons that can scan the surrounding area and find a safe path. This level of control has been reported in powered wheelchair BMIs (Rebsamen et al 2010), but not in any of the identified studies.
At the medium level, a BMI issues discrete commands to control the robot. It only takes four commands (walk, stop, and turn left/right) for a mobile exoskeleton to walk in any direction, although extra support is needed on irregular terrain such as slopes and stairs. BMIs in most of the identified studies classified discrete tasks such as walk versus stop: four studies solely implemented this binary classification in their analysis (Do et al 2013, Kilicarslan et al 2013, López-Larraz et al 2016). In addition to walk versus stop, four studies added left/right turns to their tasks (Kwak et
At the lowest level, the BMI issues continuous trajectory control of the robot, as demonstrated in the robotic arm for people with tetraplegia using intracortical microelectrode arrays (Collinger et al 2013). Among the identified studies, He et al (2014) reconstructed the joint angles and electromyography (EMG) envelopes of the lower limbs of the subjects in an offline analysis. The method was later adapted to control a virtual avatar on a screen in realtime (Luu et al 2017). Vouga et al (2017) demonstrated a realtime BMI for a monkey exoskeleton. This realtime control, however, is indirect, as the monkey was only explicitly trained to control a cursor on screen in a 2D tracking task. It is not immediately clear how this control strategy compares with direct positional control.
3.6.2. Delay. It takes time for a BMI to make an informed decision. This delay comes from (1) the time to collect enough data to analyze and (2) the computational time, although the two are usually not distinguished when reporting. In Kwak et al (2015), a user looked at the LEDs for 3.29 ± 1.83 s on average to issue a command. In Lee et al (2016), it took a user 1-5 s to move a bar on a screen in the desired direction. If the signal was not strong enough to reach any direction after 5 s, the attempt was aborted. There was an additional 1 s before the attempt to display available actions, and 1 s afterward to repeat the selection. In López-Larraz et al (2016), a sliding window was computed every 62.5 ms, from which a command was generated. The users had 3 s to attempt moving the exoskeleton in each trial. In Do et al (2013), 0.75 s segments of EEG data were acquired every 0.25 s in a sliding, overlapping window, each of which generated an output. Kilicarslan et al (2013) used a 200 ms moving window with a 10 ms shift. Response time in the other identified studies is either not reported, or not meaningful because they are offline only.
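The relationship between window length, window shift, and decision timing can be made concrete with a short sketch. The window and step values below follow the figures quoted for Do et al (2013); the 256 Hz sampling rate and the function name `sliding_windows` are assumptions for illustration.

```python
def sliding_windows(n_samples, fs, window_s, step_s):
    """Return (start, stop) sample indices of overlapping analysis windows."""
    win = int(round(window_s * fs))
    step = int(round(step_s * fs))
    if win <= 0 or step <= 0:
        raise ValueError("window and step must span at least one sample")
    return [(start, start + win) for start in range(0, n_samples - win + 1, step)]

fs = 256  # hypothetical sampling rate
# 2 s of data, 0.75 s windows shifted by 0.25 s.
windows = sliding_windows(n_samples=2 * fs, fs=fs, window_s=0.75, step_s=0.25)
# The first output cannot be produced before one full window has been collected.
first_output_latency_s = windows[0][1] / fs
```

With these settings the decoder emits one output every 0.25 s, but each output summarizes the preceding 0.75 s of EEG, so the data-collection delay is bounded below by the window length rather than by the shift.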

Input to the robots
BMI adds another source of information to the control of the robot, creating a paradigm known as shared control, illustrated in figure 4. For an overview of the shared control strategies in exoskeletons, readers are referred to Tucker et al (2015). In this section, the review focuses on some unique strategies among the identified realtime BMI studies: the output from BMI decoders was sometimes rejected or delayed by the robot controller to avoid fast alternating or unsafe commands.
In Do et al (2013), the BMI issues binary commands of walk and stop to the robot. A binary state machine was then modeled: the state of the machine changes only if the output exceeds a predefined threshold for at least 2 s. The robot response time is therefore at least 2 s. This delay rejects unnecessarily fast alternating commands during transition. In López-Larraz et al (2016), the exoskeleton only moves after detecting five consecutive move commands. Since a command is generated every 62.5 ms, this delays the robot response by at least 312.5 ms. This statistic was not reported in other studies.
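Both strategies amount to requiring a run of consistent decoder outputs before the robot changes state. The sketch below shows one way to express this; the class name `ConsecutiveGate` and its interface are hypothetical, and it operates on already-thresholded labels rather than on the raw decoder output used in the studies.

```python
class ConsecutiveGate:
    """Change state only after `n_required` consecutive outputs request the same new state."""

    def __init__(self, n_required, initial_state="stop"):
        if n_required < 1:
            raise ValueError("n_required must be at least 1")
        self.n_required = n_required
        self.state = initial_state
        self._candidate = None
        self._count = 0

    def update(self, decoded):
        if decoded == self.state:
            # Output agrees with the current state: discard any pending switch.
            self._candidate, self._count = None, 0
        elif decoded == self._candidate:
            self._count += 1
        else:
            self._candidate, self._count = decoded, 1
        if self._candidate is not None and self._count >= self.n_required:
            self.state = self._candidate
            self._candidate, self._count = None, 0
        return self.state

# A 2 s dwell with one decoder output every 0.25 s corresponds to 8 consecutive outputs.
dwell_gate = ConsecutiveGate(n_required=round(2.0 / 0.25))
dwell_trace = [dwell_gate.update("walk") for _ in range(8)]

# Five consecutive commands, with one disagreement that resets the count.
five_gate = ConsecutiveGate(n_required=5)
stream = ["walk"] * 4 + ["stop"] + ["walk"] * 5
five_trace = [five_gate.update(label) for label in stream]
```

The minimum added latency is the number of required outputs multiplied by the decoder update period, which is the trade-off between responsiveness and robustness to spurious commands.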
A manual switch was used in López-Larraz et al (2016) for safety reasons. Every trial required the experimenter to explicitly press an activation button; otherwise any BMI commands would be ignored. Lee et al (2016) is the only study that augmented the robot to automatically scan for surrounding obstacles. A Kinect camera and ultrasonic sensors were installed. This customized Rex rejects any user commands that may lead to hitting obstacles. This is a typical example of shared control: both the user and machine are engaged in the decision-making process for better and safer control. Internal knowledge without the help of extra sensors can also be used to reject dangerous commands. For example, Kwak et al (2015) supports five actions: walk, turn left/right, stand, and sit. However, the sit command can only be triggered from standing. Technically, this feature was implemented at the BMI decoder level instead of in the robot control. As both the robots and BMIs become smarter and more deeply integrated, the line between their functions may blur.

Performance
Only limited information regarding BMI performance can be identified among these studies, possibly because many of them are proof-of-concept experiments (table 2). When different results were reported for impaired and healthy groups, only the result of the impaired group is presented. The two studies with continuous BMIs were excluded as their performances were measured in a manner that was different from the others: He et al (2014) reported an average correlation coefficient of 0.4 between the measured and predicted joint angles; Vouga et al (2017) defined performance as how much time the cursor spent in the target in increments of 0.6 s. However, only the result (>90%) from the best session was reported.
3.8.1. False positive, false negative, accuracy, and confusion matrix. A false positive, or false alarm, is an event in which the classifier falsely rejects the null hypothesis. A false negative, or omission, is an event in which the classifier fails to reject the null hypothesis when it is false. In the case of controlling an exoskeleton, the null hypothesis is that the user does not want to initiate a movement. Do et al (2013) is the only study that used false positives and false negatives as performance metrics. There were on average 7.42 ± 2.85 s of false positive output in each 60 s trial.
When there are more than two classes, a confusion matrix is a useful metric to show whether the decoder is biased towards certain classes. Kwak et al (2015) and Zhang et al (2017) are the only two identified studies that reported confusion matrices. Both showed a clear pattern of high values on the diagonal.
Accuracy is the ratio of the number of correctly classified trials (both true positives and true negatives) to the total number of trials. Four studies in table 2 reported their performance with accuracy (García-Cossio et al 2015, Kwak et al 2015, López-Larraz et al 2016). Three others defined accuracy as the percentage of correctly classified EEG samples (Kilicarslan et al 2013, Zhang et al 2017). None of the above metrics reflects the challenge posed by the number of classes. Generally, it is more difficult to achieve the same level of accuracy when the number of classes increases. They also do not take the response time into consideration.
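The relationship between a confusion matrix and accuracy can be illustrated with a few lines of code. This is a generic sketch with made-up labels, not data from any identified study; the function names are chosen for illustration.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows index the true class, columns the predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Fraction of trials on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

# Made-up example with three classes.
classes = ["walk", "left", "right"]
truth = ["walk", "walk", "left", "left", "right", "right", "right", "walk"]
predicted = ["walk", "left", "left", "left", "right", "walk", "right", "walk"]
cm = confusion_matrix(truth, predicted, classes)
acc = accuracy(cm)
```

Off-diagonal entries show which classes are confused with each other, information that a single accuracy figure hides.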

Information transfer rate.
Information transfer rate (ITR), expressed in bits per second, is a commonly used criterion to assess the overall performance of BMIs (Wolpaw et al 1998). It is expressed with the following formula:

$$B = \frac{1}{T}\left[\log_2 N + p\log_2 p + (1-p)\log_2\left(\frac{1-p}{N-1}\right)\right] \qquad (1)$$

where T is the time in seconds needed to convey each command, p is the classification accuracy and N is the number of classes. Note that equation (1) is only valid under some assumptions, including (a) all the output commands are equally likely to be selected; (b) the classification accuracy is the same for all the classes; (c) the classification error is equally distributed among all the remaining symbols (Yuan et al 2013). Meeting these assumptions can be challenging. For example, the cascaded decoder in Lee et al (2016) and the state-based decoder in Kwak et al (2015) treat classes differently. Therefore, ITRs in table 2 should only be viewed as an approximation of the actual values. Kwak et al (2015) reported ITR as one of their performance metrics and briefly mentioned the above limitation. Furthermore, ITR in López-Larraz et al (2016) was manually estimated by the authors of this review based on other information it reported. The average ITR over these two available records is 0.37 bits s⁻¹. This value agrees with the results found in other BMI studies. Typically, the ITR in BMI studies falls below 1 bit s⁻¹ with either human or monkey subjects, and either invasive or non-invasive recording (Tehovnik et al 2013).
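For readers who wish to estimate ITR from reported figures, the sketch below evaluates the standard Wolpaw expression, B = [log2 N + p log2 p + (1 − p) log2((1 − p)/(N − 1))] / T, under the three assumptions listed above. The function name is chosen for illustration, and the example inputs are arbitrary rather than values from the identified studies.

```python
import math

def itr_bits_per_second(p, n_classes, t_seconds):
    """Wolpaw ITR for accuracy p, n_classes equiprobable classes, t_seconds per command."""
    if n_classes < 2:
        raise ValueError("at least two classes are required")
    if not 0.0 <= p <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    if t_seconds <= 0:
        raise ValueError("time per command must be positive")
    bits = math.log2(n_classes)
    if p > 0.0:
        bits += p * math.log2(p)  # term vanishes in the limit p -> 0
    if p < 1.0:
        bits += (1.0 - p) * math.log2((1.0 - p) / (n_classes - 1))
    return bits / t_seconds

perfect_binary = itr_bits_per_second(1.0, 2, 1.0)    # one bit per one-second selection
chance_binary = itr_bits_per_second(0.5, 2, 1.0)     # chance-level binary decoder
perfect_four = itr_bits_per_second(1.0, 4, 2.0)      # two bits every two seconds
```

Because the time per command appears in the denominator, a slower but more accurate decoder can yield the same ITR as a faster, less accurate one.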

Control efficiency.
The above metrics only measure the technical performance of the mathematical models in the BMI. They work well in BMI systems which heavily depend on the decoder itself, such as BCI spelling machines (Kaufmann et al 2012, Kalika et al 2016). However, they only work to some extent in the case of controlling sophisticated external devices such as lower limb exoskeletons. For instance, BMI output may get overwritten in the shared control (López-Larraz et al 2016). The cost of different errors also differs: incorrectly classifying 'walk' as 'stand' can be recovered from quickly, while incorrectly classifying 'walk' as 'turn right' needs another 'left turn' to recover. Control efficiency describes the overall time efficiency of the system. It is widely used in brain-controlled wheelchair (BCW) studies (Bi et al 2013). There are two key metrics: mission time and concentration time. They are both normalized by the nominal time (Rebsamen et al 2010).
• The mission time is the time from when the user initiates the command to the moment a destination has been reached.
• The concentration time is the duration in which the user engages in the BMI task.
• The nominal time is the minimal time required for the device to reach the destination under direct control of experimenters.
• The mission time and concentration time can be normalized by the nominal time, resulting in the mission time ratio and concentration time ratio respectively.
• The control efficiency is measured by these two ratios, expressing the wish to minimize both the mission time and the concentration time required to perform the task.
Kwak et al (2015) is the only identified study that reported mission time (1100 s) and nominal time (543 s), which led to a mission time ratio of 2.02. It is in line with results from previous BCW studies, ranging from 1.13 to 10 and beyond (Bi et al 2013). Anecdotally, the authors mentioned the nominal time was limited by the slow speed of the exoskeleton. Lee et al (2016) reported their mission time in two different tasks (285 s, 385 s) and made a comparison with baseline protocols (385 s, 404 s). However, the baseline in this study was another rudimentary BMI protocol instead of the nominal case.
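The two ratios defined above are simple quotients, sketched below. The function name is chosen for illustration; the mission and nominal times are the figures quoted for Kwak et al (2015), while the concentration time in the second call is a made-up value, since none was reported.

```python
def control_efficiency(mission_s, nominal_s, concentration_s=None):
    """Return (mission time ratio, concentration time ratio or None)."""
    if nominal_s <= 0:
        raise ValueError("nominal time must be positive")
    mission_ratio = mission_s / nominal_s
    concentration_ratio = None
    if concentration_s is not None:
        concentration_ratio = concentration_s / nominal_s
    return mission_ratio, concentration_ratio

# Mission and nominal times quoted above for Kwak et al (2015).
mission_ratio, _ = control_efficiency(mission_s=1100, nominal_s=543)

# Hypothetical concentration time, for illustration only.
_, concentration_ratio = control_efficiency(1100, 543, concentration_s=700)
```

A mission time ratio close to 1 means the BMI-controlled run took about as long as the experimenter-controlled baseline; larger values indicate time lost to decoding delays and corrections.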

Methodological quality
The methodological quality assessment based on the EPHPP quality assessment tool is summarized in table 3. Details of the grading choices are given in the appendix.
Common average referencing (CAR) decreases correlation between sensors by reducing the sum of all signals to zero-mean. However, it is not effective in reducing local artifacts such as eye blinking (Urigüen and Garcia-Zapirain 2015). CAR is also unable to reject bad channels or sessions. On the other hand, thresholding-based methods remove EEG signals that exceed the normal range of signals, usually defined as within a certain number of standard deviations from the mean power/amplitude (Do et al 2013, López-Larraz et al 2016). They only work effectively against either temporally or spatially localized artifacts.
There are also a few newer methods. Each has shown promising preliminary results. It will be interesting to see whether these methods are validated in more studies.
EMG artifact presents a wide spectral distribution that overlaps all common EEG bands. There are few automated methods to remove EMG artifacts because reference signals are rarely available in an EEG study. García-Cossio et al (2015) decomposed EEG into components using Canonical Correlation Analysis (CCA). The components having power in the EMG frequency band (15-30 Hz) more than 1.3 times stronger than in the EEG frequency band (1-30 Hz) were removed. In other studies, no specific reference was made to the use of methods to identify and remove potential EMG artifacts. EMG is considerably weaker at low frequencies, likely allowing those studies that worked with delta band MRCPs to suffer less contamination (Urigüen and Garcia-Zapirain 2015).
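CAR and amplitude thresholding are both short operations on an EEG array, sketched below under stated assumptions. The array layouts, the function names, the use of peak absolute amplitude as the thresholded statistic, and the three-standard-deviation default are illustrative choices, not the settings of any identified study.

```python
import numpy as np

def common_average_reference(eeg):
    """Subtract the instantaneous mean across channels. eeg: (n_channels, n_samples)."""
    return eeg - eeg.mean(axis=0, keepdims=True)

def reject_by_threshold(epochs, n_std=3.0):
    """Drop epochs whose peak absolute amplitude is an outlier.

    epochs: (n_epochs, n_channels, n_samples).
    Returns (kept_epochs, keep_mask).
    """
    peak = np.abs(epochs).max(axis=(1, 2))
    spread = peak.std()
    if spread == 0:
        keep = np.ones(len(peak), dtype=bool)  # all epochs identical: nothing to reject
    else:
        keep = (peak - peak.mean()) / spread <= n_std
    return epochs[keep], keep

# Synthetic demonstration: 20 epochs of noise, one contaminated by a large artifact.
rng = np.random.default_rng(1)
epochs = rng.standard_normal((20, 4, 100))
epochs[7] *= 50.0  # simulated high-amplitude artifact
referenced = common_average_reference(epochs[0])
clean, keep = reject_by_threshold(epochs)
```

After CAR the channels sum to zero at every sample, which is why a single noisy electrode leaks into all others and why CAR alone cannot reject bad channels.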

Artifact Subspace Reduction (ASR) is an automatic artifact rejection method with a sliding window (Kilicarslan et al 2016, Zhang et al 2017). Within each window, the ASR algorithm identifies principal subspaces which significantly deviate from the baseline EEG and then reconstructs these subspaces. Bulea et al (2014) reported that ASR is effective at removing high amplitude artifacts from EEG data and does not alter EEG during pre-movement periods, while another study found no significant differences between the classification results with and without ASR (Zhang et al 2017).
H∞ adaptive filtering was proposed as an online noise removal method (Kilicarslan et al 2016). Based on a framework of adaptive noise canceling, its sample-based adaptive formulation inherently adapts to EEG's time-varying fluctuations. It significantly removes ocular contamination and amplitude drifting from EEG signals in realtime. Kilicarslan et al (2016) showed that it improved decoding accuracy, and performed better in comparison to ASR and offline ICA.
No action: López-Larraz et al (2016) justified not removing any noise in the closed-loop scenario. In this study, a z-score-based automatic method was used to clean the training data. The authors argued that because the decoder was trained with artifact-free data, artifacts present during testing would not increase performance and, at most, would decrease it. This argument relies on the assumption that the z-score method removes all the noise from the training data.
Artifact removal procedures developed in other walk-related studies may be helpful in guiding future BMI studies. Artifacts produced by gel displacement during walking have been carefully characterized (Costa et al 2016c). Computational models were validated to remove movement-related artifacts in EEG during walking (Gwin et al 2010, Lau et al 2012, Lin et al 2013). However, these models cannot be readily transferred to BMI applications because they are usually limited to offline processing. Interestingly, motion artifacts were found to be negligible with careful experimental setup and when low speed is maintained (Nathan and Contreras-Vidal 2016). This insight may create an opportunity to use delta band EEG in exoskeleton applications if the speed is slow, which is typical at least in early stages of gait rehabilitation. Meanwhile, lower limb exoskeleton applications bring some unique challenges in EEG artifact removal that have not been addressed before. EEG motion artifact studies were usually conducted without an exoskeleton. Body weight transition from one foot to the other may not be as smooth when walking in an exoskeleton, which could potentially increase motion artifacts and displace electrodes. More studies are needed to characterize the artifacts during exoskeleton applications.
4.1.2. Safety. Compared to interacting with other common BMI targets such as wheelchairs (Rebsamen et al 2010), robotic arms (Hochberg et al 2012), and virtual objects (Luu et al 2015), controlling an exoskeleton adds significant risk to user safety. The additional risks include:
(1) The risk of using exoskeletons as medical devices. The FDA classifies ambulatory exoskeletons as Class II devices with special controls. Falls, skin damage, and bone fracture are among the foreseeable risks of using exoskeletons, requiring continued vigilance from researchers. Several pilot studies have concluded that it is feasible and safe for the clinical population to use exoskeletons.
(2) Low tolerance to misclassification. Any mistake can be costly when using an exoskeleton. Practical BMI control requires extremely high performance from the neural decoders. If a BMI erroneously triggered an exoskeleton to move when standing close to stairs or walls, the exoskeleton may fall and cause serious injury to its user.
These risks require vigilant attention from researchers. To address the inherent risk of using exoskeletons, it is recommended to use devices that are less prone to falls, for example devices that are fixed on platforms, or mobile exoskeletons that are secured by overhead harnesses. To mitigate the risk of misclassification, higher accuracy would be the desired solution. When that is not attainable, effort should be made to reduce the cost of misclassifications. For example, shared control can be used to automatically detect surroundings or reject dangerous maneuvers. The exoskeleton's potential path should be free from obstacles or dangerous terrain.
Different risk mitigation methods have been adopted in the current BMIs. On the protocol design side, many identified studies (e.g. Kilicarslan et al) rely on human spotters to prevent falling. On the shared control side, one study augmented a Rex exoskeleton with a Kinect camera and four pairs of ultrasonic sensors to detect surrounding obstacles and reject unsafe commands. López-Larraz et al (2016) required the experimenter to explicitly press an activation button to start trials for safety reasons. Donati et al (2016) used pressure sensors, wire sensors and gyroscopes to ensure that the exoskeleton followed the correct trajectory. On the device side, the popularity of self-balancing devices like Rex and Lokomat in table 1 may also be a result of safety concerns.
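The idea of a shared-control layer that rejects unsafe commands can be sketched as a simple gate between the decoder and the exoskeleton. This is an illustration only and does not reproduce any reviewed system: the command names, the `Surroundings` fields, and the 1 m clearance threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Surroundings:
    # Hypothetical obstacle distances (metres) from range sensors
    front_m: float
    left_m: float
    right_m: float


# Hypothetical mapping from decoded command to the distance that matters
_CLEARANCE_FOR = {"walk": "front_m", "turn_left": "left_m", "turn_right": "right_m"}


def gate_command(command, surroundings, min_clearance_m=1.0):
    """Return the command if it may be executed, or None if it is rejected.

    A rejected command means the exoskeleton holds its current state.
    "stop" is always allowed; unknown commands are always rejected.
    """
    if command == "stop":
        return "stop"
    field = _CLEARANCE_FOR.get(command)
    if field is None:
        return None
    if getattr(surroundings, field) < min_clearance_m:
        return None
    return command
```

Placing the gate after the decoder lowers the cost of a misclassification without requiring higher decoding accuracy, at the price of an additional processing step before each movement.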
To date, there are no reports of adverse events occurring during BMI protocols with any powered lower-limb robotic device. However, safety will be a key issue for BMI-augmented exoskeleton applications if they move beyond the lab environment in the future. We urge researchers to carefully plan their studies to ensure user safety, and to systematically report any adverse events or safety challenges.

4.1.3. Responsiveness. The responsiveness of a BMI-controlled exoskeleton can be defined as the duration from the moment a user issues a command to the time the exoskeleton finishes executing it. The shorter the delay, the better the responsiveness for the user. This delay comes from three factors: the BMI, the shared control, and the mechanical execution time in the exoskeleton. Each of the three factors could take up to several seconds to finish (see Results). They add up to a significant delay that should not be neglected. Currently, there are no studies reporting the total response time.
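As a worked illustration of how the three contributions combine, the sketch below sums them into a total response time. The function name and the example durations are hypothetical, not measurements from the reviewed studies. The optional `wrong_commands` argument assumes that a command cannot be interrupted and that every command cycle takes the same time, so each erroneous command adds one full cycle.

```python
def total_response_time(bmi_s, shared_control_s, execution_s, wrong_commands=0):
    """Delay in seconds from user intent to completion of the intended command.

    bmi_s:            time for the BMI to decode the command
    shared_control_s: time spent in the shared-control check
    execution_s:      mechanical execution time of the exoskeleton
    wrong_commands:   erroneous commands that must finish first (assumed
                      non-interruptible and equally long)
    """
    one_cycle = bmi_s + shared_control_s + execution_s
    return one_cycle * (1 + wrong_commands)


# Hypothetical durations: 2 s decoding, 0.5 s safety check, 3 s movement
print(total_response_time(2.0, 0.5, 3.0))                    # 5.5
print(total_response_time(2.0, 0.5, 3.0, wrong_commands=1))  # 11.0
```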
Additionally, commands to exoskeletons like Rex are atomic. The robot has to finish the current command in order to execute the next one. In the event of an incorrect command, the user needs to wait for the wrong command to finish in order to issue a new one, which doubles the delay. Unsurprisingly, the extra delay was reported to lead to user frustration (Lee et al 2016).
4.1.4. Practical challenges. Because exoskeletons are assistive and rehabilitative devices with practical purposes, users are likely to use them outside lab conditions. The decoder needs to be robust to avoid false positives from talking, nodding, etc. Do et al (2013) mentioned anecdotally that no disruption to BMI operation was observed when the subjects engaged in brief conversations. Formalized testing of this hypothesis is needed.
The financial and time costs should also be considered in evaluating BMI-robotic systems. Cost is generally the first factor mentioned by end users and physiotherapists (Rebsamen et al 2010). The financial cost of exoskeletons and research-grade EEG recording devices is not clear. Gel-based EEG recording devices require considerable time and effort to prepare patients during gelling and cleaning. For clinical applications, this is a difficult trade-off because current insurance policies in the United States usually cover only a limited number of hours of rehabilitation per patient. For daily activities, it is also undesirable to require lengthy preparations and others' assistance each time users put on the device, as reported in López-Larraz et al (2016).
Cosmesis, comfort, and societal acceptance are also critical practical issues. Dry EEG recording devices, smarter and safer exoskeletons, and more publicity may help this technology become more widely accepted by the general public.

Clinical relevance
Users and prescribers of lower limb exoskeletons are concerned with the effectiveness of the devices in providing mobility in indoor and outdoor environments, their use in rehabilitation programs, their safety and possible health benefits, particularly with respect to pain, spasticity and autonomic activity, especially bowel function. There have been many discussions over the clinical relevance of exoskeletons even without BMIs. Recent systematic reviews provide information on these topics. Some studies showed benefit of the exoskeleton over conventional therapy while other studies found no difference or that comparison therapies were superior.
A scoping review reported that clinical trials of powered robotic exoskeletons for post-stroke rehabilitation were free of serious adverse events and could bring meaningful improvement in sub-acute stroke (Louie and Eng 2016). A review of the usefulness of exoskeletons showed that ReWalk, Indego, and Mina (IHMC Robotics, USA) are effective for walking in a laboratory for individuals with complete lower-level SCI; however, the level of scientific evidence of long-term benefits is low (Lajeunesse et al 2016). One review concluded that exoskeletons are safe to use in real-world settings and known to yield health benefits (Miller et al 2016). However, this study was later challenged as it included duplicate subjects and studies, rendering its conclusions questionable (Dijkers et al 2016). An extensive review of the tests used in studying robot-aided movement of lower-extremity functions was published with the goal of improving exoskeleton design (Maggioni et al 2016). The protocols, outcomes, inclusion/exclusion criteria, and metrics in published clinical exoskeleton studies with individuals with SCI were found to vary greatly across studies. Recently, a clinical study with 55 spinal cord injury patients showed significant benefits after they were rehabilitated with exoskeletons (Grasmücke et al 2017). On the other hand, there are reviews that found no consistent benefit to rehabilitation using an exoskeleton versus a variety of conventional rehabilitation methods (Federici et al 2015, Fisahn et al 2016).
Current clinical rehabilitation programs, such as treadmill body-weight support gait training, focus on interventions at the peripheral level of the body. The expectation from rehabilitation programs is that the repetitive motor training will trigger neuroplasticity that promotes the recovery of motor functions (Pinter et al 2016). By contrast, BMI approaches directly measure brain activity to control robotic devices, and therefore encourage neural activities that drive the designated motor tasks. These approaches encourage voluntary control and motor intent from users, which has been demonstrated as one of the main principles in rehabilitation and may trigger stronger neuroplasticity (Koenig et al 2011, Luu et al 2017). Additionally, better understanding of neural representations of human movements from advanced BMI technology may propel the development of a novel training paradigm for improving the efficacy of rehabilitation in a top-down approach. Donati et al (2016) is the only identified study that reported clinical improvement after protocols. However, its result is complicated by its mixed usage of treadmill-based robot, overground exoskeleton, virtual reality tasks, zero-G treadmill, BWS, and tactile feedback over 12 months.
It is worth noting that although SSVEP-based systems have the best performance in table 2, they may not provide additional benefits by exploiting cortical plasticity. SSVEPs are elicited by a task (looking at LEDs) that is not physiologically related to walking.

Decoding and reporting approaches
The BMI design can be either exogenous (Kwak et al 2015) or endogenous (see table 2). With exogenous BMIs, the number of training sessions required by users to reach a successful performance is reduced. For instance, the subjects in Kwak et al (2015) were instructed to fixate on LED lights with different blinking frequencies 50 times, for 5 s each time. This process is likely to be significantly shorter than in typical BMI studies, where multiple training sessions are necessary for optimal performance. There are even users that cannot achieve reasonably good performance despite sufficient training, a phenomenon known as BMI illiteracy (Vidaurre and Blankertz 2010). These aspects are important for improving users' confidence during the control of exoskeletons. However, one of the biggest limitations of exogenous BMIs is that they require users to find and fixate on certain visual stimuli. Thus, this approach draws user attention away from the core task of walking, raising safety concerns, and may be unlikely to provide clinical benefit.
The reported performance among identified studies varies greatly (table 2). For instance, Kilicarslan et al (2013) report 99% accuracy. However, this accuracy refers to the percentage of correctly classified EEG samples instead of trials, which does not directly translate to how well the system performs in practice. In fact, there is little discussion on how best to report performance. Accuracy does not reflect the number of classes or the time allowed for deliberating each command. Additionally, decoding accuracy and ITR may be strongly affected by skewness in the data. For example, if a person mostly chooses to walk straight, a decoder that always predicts walk instead of turn will still score a high accuracy. When the same number of trials in each class cannot be guaranteed, the F1 score along with other metrics is recommended because it considers both the precision and the recall of a test and is therefore more robust to skewness in the data distribution. Above all, control efficiency closely measures the general performance of the system regardless of the technical details.
The authors encourage future studies to report control efficiency in order to (1) allow objective comparison between studies and (2) provide an accurate sense of the usefulness of the system in practice.
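The effect of skewed class distributions on accuracy, and the dependence of ITR on the number of classes and trial duration, can be illustrated with a short sketch. The function names are hypothetical. The ITR shown uses the commonly cited Wolpaw definition, which is an assumption here because the reviewed studies may have used other definitions; values at or below chance are clamped to zero by convention in this sketch.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of trials classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive):
    """F1 score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def wolpaw_itr_bits_per_min(n_classes, p, trial_s):
    """Wolpaw ITR: bits per trial scaled by the number of trials per minute."""
    if p <= 1.0 / n_classes:
        return 0.0  # at or below chance: clamped to zero (convention)
    bits = math.log2(n_classes)
    if p < 1.0:
        bits += p * math.log2(p) + (1 - p) * math.log2((1 - p) / (n_classes - 1))
    return bits * 60.0 / trial_s


# Skewed example: nine "walk" trials, one "turn" trial, decoder always says "walk"
y_true = ["walk"] * 9 + ["turn"]
y_pred = ["walk"] * 10
print(accuracy(y_true, y_pred))          # 0.9
print(f1_score(y_true, y_pred, "turn"))  # 0.0
```

In this example the decoder never detects a turn, yet its accuracy is 0.9; the F1 score for the turn class exposes the failure.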

Conclusion
Studies using BMIs for commanding lower-limb robotic systems in the current literature were systematically reviewed. The devices, user population, input and output of the BMIs and the robots, neural features, decoders, and system performance were summarized across the identified studies. The tasks often involved classification of discrete state commands such as walking, stopping, and turning. It was observed that few EEG denoising techniques were implemented, and those that were implemented were not sufficiently validated. Various neural features and decoders were used for neural classification. Cross-study comparison was attempted by examining ITR and control efficiency. Overall, the system performance is promising, but far from practical applications because of the small sample pool, potential safety risks, and other challenges.