Bayesian optimization of peripheral intraneural stimulation protocols to evoke distal limb movements

E Losanno; M Badi; S Wurth; S Borgognon; G Courtine; M Capogrosso; E M Rouiller; S Micera

doi:10.1088/1741-2552/ac3f6c

1. Introduction

Neuroprostheses hold the potential to restore motor functions to people with paralyzed limbs due to neurological disorders or injuries, such as stroke or spinal cord injury. Various neuroprosthetic systems have been developed to restore and assist lower and upper extremity movements by electrically stimulating the spinal cord [1–6], skeletal muscles [7–16] or peripheral nerves [17–23], through different interfaces. Among the existing solutions, polyimide-based intraneural electrodes [24, 25] have shown high biocompatibility and long-term stability [19, 26] as well as good performance in restoring fine motor functions in rats [27] and monkeys [28]. These results show their great potential for clinical use.

The design of effective motor neuroprostheses requires the optimization of the stimulation protocols to evoke functional target movements. This is particularly important and challenging when using peripheral nerve stimulation (PNS). In fact, the PNS-evoked motor response is not known a priori, as both the electrode placement and the motor fibers distribution within the peripheral nerves vary from subject to subject [29]. Furthermore, the non-linear nature of muscle recruitment [30] makes it difficult to predict the motor patterns that will be produced when stimulus intensity or frequency is modulated. Finally, the stimulation will elicit different responses over time due to the immune response in the early stages after electrode implantation [19] and the possible plasticity of the central nervous system in later stages of neuroprosthesis use [31]. Therefore, a search over the space of stimulation parameters is required and needs to be repeated over time for optimal performance.

The space to explore in order to identify the most effective stimulation paradigms can be large, as multiple active sites can be used to stimulate, and various current parameters must be set. For example, when optimizing single-channel burst stimulation to produce a specific movement, we need to tune the active site, stimulus amplitude and pulse-width, stimulation frequency, and burst-length. The number of possible combinations of parameters can be very high and increases exponentially when multichannel stimulation is used. Exhaustive testing of all those combinations within a single experimental session would thus be very time-consuming and exhausting for both the subject and the experimenter. At the same time, the manual selection of a subset of parameters to be tested could lead to suboptimal solutions. For all these reasons, an automatic algorithm that can effectively and efficiently reach the global optimum would be of great benefit.

Bayesian optimization (BO) represents a well-established technique for optimizing black-box functions under time constraints [32] and can thus be beneficial in the framework of neurostimulation tuning. More precisely, BO relies on the use of a probabilistic model of the expensive-to-evaluate unknown objective function and iteratively selects the best point of the search space to visit, i.e. the point that is more likely to quickly lead to the objective global optimum, based on a so-called acquisition function. This model-based procedure allows to jointly tune multiple parameters with fewer experiments than grid search and find better solutions than manual incomplete search [33]. Building on the promise of this technique, solutions based on BO have been developed to optimize epidural spinal cord stimulation in rats [34] and intracortical stimulation in a monkey [35]. In these studies, powerful algorithms were implemented to find the optimal stimuli to maximize the activity of individual muscles. However, functionally meaningful motor patterns are generated by the coactivation of various synergistic muscles, while nonfunctional muscles remain inactive. At the same time, stimulation techniques that target neural structures, such as PNS, run the risk of eliciting adverse responses if only one muscle activity is optimized at a time. This is due to the limited spatial segregation of functionally different motoneurons and the imperfect selectivity of neural interfaces. Therefore, to evoke behavioral movements with neurostimulation, it is necessary to search for stimuli that optimize multiple and potentially conflicting motor-related objectives simultaneously. This more complex and non-trivial optimization problem has not yet been investigated.

We here hypothesized that by aggregating multiple suited objectives into an effective and easily computable function to be optimized, and by selecting an adequate probabilistic model, BO algorithms could efficiently drive the search for optimal stimulation parameters to evoke functionally meaningful movements, thereby reducing the duration and complexity of this important process. We thus extended the application of BO to this problem, which is of higher clinical relevance than maximizing a single muscle activation at a time. Specifically, we developed a BO algorithm to efficiently optimize intraneural PNS to elicit ankle movements in rats and hand grasping functions in macaque monkeys. Motor responses were featured by multiple attributes based on muscle activity, which were scalarized into an optimization objective to trade them off. We compared various models that implicitly or explicitly incorporate the correlation between the outputs of our stimulation-to-motor system. The performance of the different versions of our algorithm was first evaluated offline in both animal species, across a variety of implants. We assessed the efficiency of parameter space search and accuracy in selecting optimal solutions, as the ground truth was provided. We also evaluated the effectiveness of the identified stimuli in eliciting the target movements and, consequently, the appropriateness of the chosen multi-objective function. We then conducted a preliminary online test in an additional monkey to experimentally validate the functionality of the algorithm.

2. Methods

2.1. Rat experiments

2.1.1. Animals and implants

Experiments were conducted on three adult female Lewis rats (LEW-ORlj, Charles River Laboratories, France) (weight ∼ 220 g). The animals were group-housed on a 12 h light-dark cycle with food and water ad libitum. The three rats were chronically implanted with a SELINE intraneural electrode [25] (figure 1(a)) in the left sciatic nerve, placed ∼0.5 cm proximally to the bifurcation into the tibial and common peroneal nerves. Wires were connected to a 3D-printed pedestal that was sutured to the fascia of the back-muscles (multifidus muscle and gluteus superficialis), through a 2 × 2 cm piece of surgical mesh (Mersilene mesh, Ethicon Inc., NJ). To acquire intramuscular electromyographic (EMG) activity, bipolar electrodes (Teflon-coated stainless steel wires) were chronically implanted in four muscles of the left hindlimb: tibialis anterior (TA) (responsible for the flexion of the ankle), soleus (SOL), gastrocnemius medialis (GM), and plantaris (PL) (jointly responsible for the extension of the ankle). Wires were routed subcutaneously to a percutaneous connector (Omnetics Connector Corp., MN) that was fixated to the animal's skull with dental cement. All surgeries were performed under aseptic conditions and general anesthesia with isoflurane in oxygen enriched air (1%–2%). Details on the rats involved in the study are specified in supplementary table 1 (available online at stacks.iop.org/JNE/18/066046/mmedia).

2.1.2. Experimental setup and procedure

Data acquired during an experiment of single pulse stimulation of the sciatic nerve (figure 1(b)) in the three rats were used to test the algorithm offline. The experiment was conducted 4–5 weeks after implantation. Rats were slightly sedated (Domitor, 0.0025–0.005 ml kg⁻¹) and positioned at the edge of a table with their legs hanging down. Electrical stimulation was delivered by the IZ2H stimulator (Tucker David Technologies, USA). Biphasic cathodic-first current pulses (1 Hz) with a pulse-width of 40–80 us were applied from each working active site of the SELINE electrode at increasing amplitude values, from subthreshold to saturation of the compound muscle action potential (CMAP) (increments of 30–50 uA, three repetitions per value). Intramuscular EMG signals were amplified (1000×, Differential AC amplifier, AM systems, WA), bandpass filtered (10–2000 Hz) and acquired at ∼25 kHz using the RZ2 Processor (Tucker David Technologies, USA). Hindlimb kinematics was recorded using a high resolution camera.

2.2. Monkey experiments

2.2.1. Animals and implants

Experiments were carried out on five adult female macaques (Macaca fascicularis) (4–9 years old, weight between 3.1 and 3.9 kg) that were initially involved in other studies: acute electrophysiology was performed in three animals during a terminal procedure before euthanasia, and chronic electrophysiology was performed in two others. The animals were group-housed in an enriched indoor room and had access to water and food ad libitum. All the monkeys were implanted with a custom-made transverse intrafascicular multichannel electrode (TIME) [24] in the median nerve and two of them were additionally implanted in the radial nerve. In particular, the TIME used in this study (the Mk-TIME) (figure 1(c)) was adapted from a previous design [36] to fit the dimensions of the monkey arm nerves. Implantation of the Mk-TIME followed previously described methods [37]. We inserted the median electrode ∼2 cm proximally to the elbow and the radial electrode ∼2 cm proximally to the epicondyle along the humeral bone. To record intramuscular EMG activity, each animal received the implantation of bipolar electrodes in up to six flexor muscles of the hand: flexor carpi radialis (FCR) and palmaris longus (PL) (wrist flexors), flexor digitorum profundus (FDP) and flexor digitorum superficialis (FDS) (finger flexors), thenar (THE) (thumb flexor), first dorsal interosseous (FDI) (index abductor). The two monkeys with the Mk-TIME in the radial nerve were also implanted in four extensor muscles of the hand: extensor carpi radialis (ECR) (wrist extensor), extensor digitorum communis (EDC) (finger extensor), extensor carpi ulnaris (ECU) (responsible for wrist ulnar deviation), abductor pollicis longus (APL) (thumb extensor). Chronic EMG recordings were performed using pairs of Teflon-coated stainless steel wires, implanted according to the procedure described in [3]. Acute EMG recordings were performed with subcutaneous needles (Ambu® Neuroline). Intrinsic hand muscles were always implanted acutely. All surgeries were performed under aseptic conditions and general anesthesia induced with midazolam (0.1 mg kg⁻¹), methadone (0.2 mg kg⁻¹), and ketamine (10 mg kg⁻¹) and maintained under continuous intravenous infusion of propofol (5 ml kg⁻¹ h⁻¹) and fentanyl (0.2–1.7 ml kg⁻¹ h⁻¹). All the details about the monkeys involved in the study and the implants received are specified in supplementary table 2.

2.2.2. Experimental setup

In the case of acute implants, electrophysiological experiments were conducted just after the surgical intervention, keeping the animals under anesthesia with intravenous perfusion of propofol (5 ml kg⁻¹h⁻¹) and fentanyl (0.2–1.7 ml kg⁻¹h⁻¹) and continuously monitoring the vital signs. As for the two chronically implanted animals, Mk-Me was anesthetized during the experimental session analyzed, while Mk-Ol_c was awake, sitting on a chair and slightly sedated (ketamine, 3.5 mg kg⁻¹, NaCl 1:1, 0.1 ml/15 min). During all the sessions the animal's forearm was restrained on an armrest while the hand was either relaxed or held open by elastic bands tied to the fingers. Electrical stimulation was delivered by the IZ2H stimulator (Tucker David Technologies, USA) as bursts of charge-balanced cathodic-first biphasic pulses. Stimulation parameters (channel, burst-length, pulse-width, amplitude and frequency) were set through custom-made MATLAB (MathWorks, Natick MA) or C++ (Visual Studio®, USA) routines. Intramuscular EMG activity was amplified (1000×, PZ5, Tucker David Technologies, USA) and acquired at ∼12 kHz using the RZ2 Processor (Tucker David Technologies, USA). Hand kinematics was recorded using up to three high resolution cameras. Grip force was measured using a custom sensor [38].

2.2.3. Experimental procedure to acquire data for algorithm offline test

Data acquired during an experiment of burst stimulation of the median and radial nerves (figure 1(d)) in n = 4 monkeys (supplementary table 2) were employed to test the algorithm offline. During the experiment, all the working active sites of the median and radial Mk-TIMEs were scanned with 500 ms monopolar stimulation bursts at increasing current intensities. The stimulation frequency was fixed at 50 Hz and the pulse-width was set at either 40 or 80 us depending on the channel impedance. As for the selection of intensities, different procedures were undertaken in the four monkeys. For Mk-Pa, we tested all the amplitude values from movement threshold to movement saturation (increments of 10 uA, three repetitions per value). For the other three animals, the amplitude values tested were manually selected within the functional range in a step that was neither uniform nor consistent, resulting in not completely exhaustive datasets. Each intensity value was applied for 1–6 repetitions in this case. In Mk-Me, two of the tested channels were removed from the dataset. Indeed, they generated the pronation of the forearm, and this type of movement could not be characterized because the EMG activity of the muscle responsible for it (pronator teres) was not recorded (supplementary table 2).

2.2.4. Algorithm online test procedure

The algorithm was tested online on Mk-Le, acutely implanted with the Mk-TIME in the median nerve. Before the actual test, a training phase was performed to gather preliminary information about the implant just performed. During this phase, we applied a few bursts of stimulation from all the Mk-TIME active sites to determine which ones elicited a visible motor response and which amplitude values corresponded to movement threshold and saturation. Based on this quick inspection, we restricted the algorithm search space to functional channel-amplitude pairs: non-functioning channels were discarded, and for the remaining active sites, the amplitude was restricted to the values between the minimum and maximum determined experimentally (steps of 10 uA). The burst-length, frequency and pulse-width were fixed at 500 ms, 50 Hz and 40 us, respectively. Then the actual algorithm test began. The first burst of each run was delivered from the first channel of the map (figure 1(d)) at minimum amplitude. For the following iterations, the algorithm selected the active site and amplitude value to be tested, and a stimulation burst with these parameters was applied. The recorded EMG signals were processed and the extracted measures were added to the set of previous observations that the algorithm used to choose the subsequent action. Due to the limited time available for the experiment, we terminated each run after 60 iterations.

2.3. Algorithm design

We developed an algorithm to optimize monopolar single pulse and burst stimulation delivered by intraneural electrodes implanted in the peripheral nerves of the lower and upper limbs to elicit ankle and hand motor functions, respectively. The technique was designed to be used online during a single experimental session, to drive the exploration of the space of possible stimuli in a way to reach optimal solutions as quickly as possible. We restricted the space of input parameters to two dimensions, defined by channel and stimulation amplitude, while keeping the values of the other parameters (pulse-width for single pulse stimulation; pulse-width, frequency and burst-length for burst stimulation) constant. Evoked motor patterns were represented as vectors of measures extracted from EMG signals of lower and upper extremity muscles responsible for producing the targeted motor functions. In summary, we aimed to find the optimal selection of intraneural channel and amplitude to evoke specific EMG patterns underlying functionally meaningful movements, with the minimum number of observations.

2.3.1. Overview of BO

BO [32] was identified as a perfectly suitable strategy for our purpose, as it is an ideal tool for optimizing black-box objective functions under query limits. Briefly, BO models the unknown objective function as a random function and assigns it a prior distribution. At each iteration, a new response of the function is observed, and a posterior distribution is built by conditioning the prior on all the data observed so far. An acquisition function is finally applied to the posterior distribution to determine the next query point. The acquisition function trades off the exploration of high-uncertainty regions and the exploitation of high-reward regions to reach the global optimum of the objective with the minimum number of iterations. In practice, it is a matter of exploiting promising regions so as not to waste time on useless observations, but at the same time exploring the sample space sufficiently so as not to get stuck in local optima. To apply this technique, we had to define a scalar objective function, a prior for our measurable quantities, and an acquisition function. We used objective functions built from the aggregation of multiple EMG measures, multi-output Gaussian processes (GPs) [39] as priors, and customized upper confidence bound (UCB) like acquisition functions [32]. Details are specified in the following sections. The functioning of our GP based BO (GP-BO) algorithm is summarized in figure 2.

**Figure 2.** GP-BO algorithm functioning. (a) Algorithm block diagram. The motor response to PNS is a black-box system. In particular, we refer to the mapping between two stimulation parameters (channel and amplitude) and measures extracted from the EMG signals evoked by monopolar single pulse or burst stimulation. The algorithm models this system as a multi-output GP, identified by a mean m(x) and a kernel K(*x,x*') function. We define a scalar objective as a function of the EMG-based measures that has to be optimized to elicit a specific movement. An iterative procedure is undertaken to rapidly reach the objective optimum. At each iteration a new response of the system is observed, i.e. a stimulus is applied from the selected channel with the selected amplitude value and the evoked EMG-based measures are computed. The model is updated depending on the observations made so far. The UCB acquisition function is finally applied to identify the next query. The procedure continues until convergence to the optimal stimulation parameters. (b) Example of model update after an increasing number of algorithm iterations, indicated as t. Each point in the parameters space is assigned a value of objective mean and std that are updated according to the observations made so far.
Download figure:
Standard image High-resolution image

2.3.2. Objective function

Unlike previous studies [34, 35], here we tested BO to solve a multi-attribute neurostimulation optimization problem. In particular, we characterized and optimized the motor responses evoked by intraneural PNS using features extracted from the EMG signals of the $M$ implanted muscles $\left( {\left[ {{\mu _1},\, \ldots ,\,{\mu _M}} \right]} \right)$ . For single pulse stimulation (rat experiments), we computed the peak-to-peak of the CMAP, identified as the segment of the EMG signal from 2 to 14 ms after a stimulation pulse. Values obtained for each muscle were then normalized to the maximum found during the experiment. For burst stimulation (monkey experiments), we calculated the average amplitude of the normalized EMG envelope. Specifically, raw EMG data were band-pass filtered between 50 and 500 Hz (Butterworth, 3rd order). Stimulation artifacts were removed using a 3rd order Savitzky–Golay filter with a smoothing window of 2.5 ms. The linear envelope was then computed by rectification and low-pass filtering at 6 Hz (Butterworth, 3rd order). The signals of each muscle were normalized with respect to (a) the maximal activity obtained during the whole experimental session in the case of the offline tests, (b) the maximal activity obtained during the training phase for the online test. Finally, we calculated the average amplitude of the resulting normalized envelope.

To effectively evoke a specific motor function with neurostimulation, we have to search for parameters that strike an appropriate balance between the response of muscles that promote the movement and muscles that impede it. Indeed, because functionally distinct motoneurons are positioned in close proximity within neural structures, a change of stimulating channel or an increase in pulse amplitude may result in many muscles being recruited together in an unselective manner. We solved this multi-objective optimization problem using the scalarization approach, i.e. we aggregated the EMG-based output variables into a single scalar to optimize in our search. We defined two different objective (or reward) functions ${\text{obj}}\left( {\boldsymbol{{x}}} \right)$ for the offline and online algorithm tests. During the offline tests, the search objective was to maximize the activity of the muscles considered functional for the target movement while minimizing the activity of the undesired muscles. In this way, we aimed to selectively evoke the target motor function and at the same time maximize its strength. We linearized this objective by expressing it as:

$\begin{equation*}\mathop {{\text{argmax}}}\limits_{{\boldsymbol{{x}}} \in D} {\text{ obj}}\left( {\boldsymbol{{x}}} \right) = \mathop {{\text{argmax}}}\limits_{{\boldsymbol{{x}}} \in D} \sum\limits_{i = 1}^M {w_i}{\mu _i}\left( {\boldsymbol{{x}}} \right)\end{equation*}$

where ${\boldsymbol{{x}}} = \left[ {{x_1},{x_2}} \right]$ (1 = channel, 2 = amplitude), $D$ is the decision set, ${\mu _i}$ is the EMG response of the $i{\text{th}}$ muscle $(0 \le {\mu _i} \le 1)$ , ${w_i} = 1$ if the $i{\text{th}}$ muscle is functional for the target movement and ${w_i} = - 1$ if the $i{\text{th}}$ muscle is nonfunctional for the movement. We chose to use $\left| {{w_i}} \right| = 1\,\forall \,i$ (all the muscles equally contribute to the multi-attribute optimization) to make the procedure as generalizable as possible. Based on their physiological function, we identified which of the implanted muscles were functional for each target movement (supplementary table 3). With this definition of objective, the ideal EMG pattern for a given target movement consists in ${\mu _i} = 1$ for all functional muscles, and ${\mu _i} = 0$ for all nonfunctional muscles. For the online test, we defined the reward function as the Euclidean distance between the evoked and a predefined target EMG pattern:

$\begin{equation*}\mathop {{\text{argmin}}}\limits_{{\boldsymbol{{x}}} \in D} {\text{ obj}}\left( {\boldsymbol{{x}}} \right) = \mathop {{\text{argmin}}}\limits_{{\boldsymbol{{x}}} \in D} \parallel {\boldsymbol{{\mu }}}\left( {\boldsymbol{{x}}} \right) - {{\boldsymbol{{\mu }}}_{{\text{target}}}}\parallel .\end{equation*}$

We used as target the stimulation-evoked EMG activity obtained in another animal (Mk-Me, figure 6(b)), manually selecting trials in which a visually adequate movement was generated. This target implant was particularly successful in terms of selectivity, as it produced several different movements. We chose this objective relying on the reproducibility of muscle recruitment between implants.

2.3.3. Probabilistic model

Before defining the model prior, we rearranged the 2D input space denoted by channel and amplitude to facilitate the algorithm in input–output mapping. Channels were represented through a distance-based map (figures 1(b) and (d)) to encode the similarity of muscle recruitment based on spatial proximity. For offline testing, we also adjusted the amplitude dimension. Because the amplitude functional range was quite different from one channel to another and the pulse-width was either 40 or 80 us, we normalized the amplitude values tested for each active site between 0 and 1. Channel impedance variability could otherwise have been inadvertently encoded in our model.

The probabilistic model linking the input space thus defined to the multiple EMG-based output variables was assumed to be a multi-output GP. GPs allow to intuitively exploit the a priori knowledge of input–output mapping through the selection of a mean, a covariance function and their hyperparameters [39]. While GPs were originally employed for single-output scenarios, they then found application in various fields for multi-output (or multi-task) learning problems [40, 41]. Assuming that the outputs are correlated (in our case some muscles are expected to have related responses to stimulation), multi-task learning is based on the idea that by exploiting the information shared between the tasks, we can improve the prediction compared to modelling the tasks individually. The key is then to develop admissible multi-output covariances that capture not only the similarity between the inputs but also the interaction among the outputs. In this sense, several solutions have been proposed [40, 41].

Here, we compared three different multi-output models. First, we threated the multiple outputs as independent GPs with shared hyperparameters [42]. We will hereafter refer to this model as IGP. By imposing the same hyperparameters some correlation between the outputs is still implicitly assumed to exist. Then, we implemented two widely used approaches that explicitly model the correlation between the outputs. They both belong to the group of separable models, i.e. the multi-output covariance is decomposed into the sum of products between a kernel over the input space and a kernel for the outputs (inputs and outputs are threated separately). Specifically, the second model we used is a particular case of the linear model of coregionalization (LMC), called semiparametric latent factor model (SLFM) [43]. Briefly, the LMC expresses the $M$ outputs as a linear combination of $Q$ independent latent functions: ${f_m}\left( {\boldsymbol{{x}}} \right) = \,\mathop \sum \limits_{q = 1}^Q {a_{m,q}}{u_q}\left( {\boldsymbol{{x}}} \right)$ , Here, we chose $Q = M$ . The multi-output covariance in this model can be expressed as ${K_M}\left( {{\boldsymbol{{x}}},{\boldsymbol{{x}}}^{\prime}} \right) = \mathop \sum \limits_{q = 1}^Q {A_q}{k_q}\left( {{\boldsymbol{{x}}},{{\boldsymbol{{x}}}^{\prime}}} \right)$ , where ${A_q}$ is the correlation (or coregionalization) matrix and ${k_q}\left( {{\boldsymbol{{x}}},{{\boldsymbol{{x}}}^{\prime}}} \right)$ is the covariance of the $q{\text{th}}$ latent function. The LMC-type SLFM defines the coregionalization matrix as: ${A_q} = {{\boldsymbol{{a}}}_q}{\boldsymbol{{a}}}_q^T$ . Finally, we tested a particular case of the intrinsic coregionalization model (ICM), called multi-task GP (MTGP) model [44]. The ICM is a simplified version of the LMC as it employs only a single latent function. The ICM-type MTGP parameterizes ${A_1}$ based on the Cholesky decomposition ${A_1} = L{L^T}$ . We compared the IGP, MTGP e SLFM models in our offline tests, while for the online test we solely used the IGP approach.

As for the input covariances, in all the three scenarios we used a composite kernel formed as the product of two standard Matérn 52 [39], one in the channel space and the other in the amplitude space. A standard Matérn 52 is a stationary distance-based kernel defined as:

$\begin{align} k\left( {{\boldsymbol{{x}}},{\boldsymbol{{x^{\prime}}}}} \right) &= {\sigma ^{\,2}}\left( {1 + \sqrt 5 r + 5/3{r^{\,2}}} \right){e^{ - \sqrt 5 r}},\,\,\\ r &= \,\sqrt {{{\left( {{\boldsymbol{{x}}} - {\boldsymbol{{x^{\prime}}}}} \right)}^T}{{{\Sigma }}^{ - 1}}\left( {{\boldsymbol{{x}}} - {\boldsymbol{{x^{\prime}}}}} \right)},\\ {{\Sigma }} &= {\text{diag}}\left( {l_1^{\,2}, \ldots l_n^{\,2}} \right) \end{align}$

with $n$ being the dimension of ${\boldsymbol{{x}}}$ (here $n = 2$ ). The Matérn class was chosen because it can depict rough behaviors [39]. The product of the two Matérn along the two dimensions implies that two stimuli ${\boldsymbol{{x}}}$ and ${\boldsymbol{{x^{\prime}}}}$ elicit a similar muscle response only if both their channels and amplitude values are close. Regarding the mean function of the multi-output GP, we used a constant mean for the offline tests and a zero mean for the online test.

The hyperparameters of the models tested (such as the constant $c$ for the mean function, the length-scales ${\boldsymbol{{l}}} = \left[ {{l_{C1}},{l_{C2}},{l_{A1}},{l_{A2}}} \right]$ and the signal variance ${\sigma ^{\,2}}$ for the input covariances, and the parameters of the coregionalization matrix for the MTGP and SLFM models) were tuned on each animal training set (see section 2.3.6): we found the optimal hyperparameters to maximize the marginal likelihood of the training data using the conjugate gradient method [45]. The values obtained for the IGP model are shown in supplementary table 4 for reference.

2.3.4. Model prediction

As mentioned in section 2.3.1, BO updates the probabilistic model at each iteration by conditioning the prior distribution on previous observations and thus makes predictions about future responses. In the case of multi-output GP models, if we condition the prior ${\boldsymbol{{f}}}\left( {\boldsymbol{{x}}} \right)$ on a finite number of responses ${{\boldsymbol{{y}}}_{t - 1}} = \,{\left[ {{{\boldsymbol{{y}}}_1}, \ldots ,{{\boldsymbol{{y}}}_{t - 1}}} \right]^T}$ corresponding to actions ${{\boldsymbol{{X}}}_{t - 1}} = \,{\left[ {{{\boldsymbol{{x}}}_1}, \ldots ,{{\boldsymbol{{x}}}_{t - 1}}} \right]^T}$ and we assume that the noise is i.i.d. Gaussian, we get a Gaussian posterior distribution [41, 46]:

$\begin{equation*}{\boldsymbol{{f}}}\left( {\boldsymbol{{x}}} \right)\,|\,{{\boldsymbol{{y}}}_{t - 1}}\,\sim \,\mathcal{N}\left( {{{\boldsymbol{{m}}}_{t - 1}}\left( {\boldsymbol{{x}}} \right),{{\mathop \sum \nolimits }_{t - 1}}\left( {{\boldsymbol{{x}}},{\boldsymbol{{x}}}} \right)} \right)\end{equation*}$

where

$\begin{equation*}{{\boldsymbol{{m}}}_{t - 1}} = { }{K_M}{\left( {{\boldsymbol{{x}}},{{\boldsymbol{{X}}}_{t - 1}}} \right)^T}{\left[ {{K_M}\left( {{{\boldsymbol{{X}}}_{t - 1}},{{\boldsymbol{{X}}}_{t - 1}}} \right) + {{\mathop \sum \nolimits _M}}} \right]^{ - 1}}{{\boldsymbol{{y}}}_{t - 1}},\end{equation*}$

$\begin{align*}{\mathop \sum \nolimits _{t-1}}\left( {{\boldsymbol{{x}}},{\boldsymbol{{x}}}} \right) &= { }{K_M}\left( {{\boldsymbol{{x}}},{\boldsymbol{{x}}}} \right) - { }{K_M}{\left( {{\boldsymbol{{x}}},{\boldsymbol{{\,}}}{{\boldsymbol{{X}}}_{t - 1}}} \right)^T}\\[9pt] &\quad\times{\left[ {{K_M}\left( {{{\boldsymbol{{X}}}_{t - 1}},{\boldsymbol{{\,}}}{{\boldsymbol{{X}}}_{t - 1}}} \right) + {{\mathop \sum \nolimits _M}}} \right]^{ - 1}}\\[9pt] &\quad\times{K_M}{\left( {{\boldsymbol{{x}}},{\boldsymbol{{\,}}}{{\boldsymbol{{X}}}_{t - 1}}} \right)^T},\end{align*}$

where ${\mathop \sum \nolimits _M} = {\mathop \sum \nolimits _s} \otimes {I_{t - 1}}$ is a diagonal noise matrix, whose elements are additional hyperparameters that we optimized based on each animal training set (supplementary table 4). Given this posterior, all the points ${\boldsymbol{{x}}} \in D$ ( $D\,$ decision set) are assigned a mean vector and a covariance matrix.

2.3.5. Acquisition function

For our offline tests, we used UCB-like acquisition functions [32] and determined the next observation ${{\boldsymbol{{x}}}_t}$ as:

$\begin{equation*}{{\boldsymbol{{x}}}_t} = \mathop {{\text{argmax }}}\limits_{{\boldsymbol{{x}}} \in D} \left( {m_{t - 1}^{{\text{obj}}}\left( {\boldsymbol{{x}}} \right) + { }k\sigma _{t - 1}^{{\text{obj}}}\left( {\boldsymbol{{x}}} \right)} \right)\end{equation*}$

being $m_{t - 1}^{{\text{obj}}}\left( {\boldsymbol{{x}}} \right)$ and $\sigma _{t - 1}^{{\text{obj}}}\left( {\boldsymbol{{x}}} \right)$ the mean and standard deviation (std) of the posterior distribution of the objective function ${\text{obj}}\left( {\boldsymbol{{x}}} \right)$ at action $t - 1$ . The first term of the equation is the exploitation rule: it tends to regions where the expected objective is maximal based on the current data. The second term of the equation is the exploration rule: the search is guided towards high-uncertainty regions. The two processes must be balanced to avoid wasting time and getting stuck in local optima. This balance is determined by the parameter $k\,(k > 0)$ . We tuned $k$ to get a final minimum regret (see section 2.4.1) reasonably low and approximately equal across the different models, while having a good convergence rate. The chosen values are shown in supplementary table 4. For the IGP approach, since ${\text{obj}}\left( {\boldsymbol{{x}}} \right)$ was built as a weighted linear combination of independent GPs modeling the EMG responses (see section 2.3.2), its posterior mean and std were computed as $m_{t - 1}^{{\text{obj}}}\left( {\boldsymbol{{x}}} \right) = \sum\limits_{i = 1}^M {w_i}m_{t - 1}^i\left( {\boldsymbol{{x}}} \right)$ , $\sigma _{t - 1}^{{\text{obj}}}\left( {\boldsymbol{{x}}} \right) = \sqrt {\sum\limits_{i = 1}^M {{({w_i}\sigma _{t - 1}^i\left( {\boldsymbol{{x}}} \right))}^{\,2}}}$ , being $m_{t - 1}^i\left( {\boldsymbol{{x}}} \right)$ and $\sigma _{t - 1}^i\left( {\boldsymbol{{x}}} \right)$ the posterior mean and std of the $i{\text{th}}$ EMG response. In the case of correlated outputs, i.e. for the MTGP and SLFM models, the objective posterior mean and std were calculated as $m_{t - 1}^{{\text{obj}}}\left( {\boldsymbol{{x}}} \right) = \sum\limits_{i = 1}^M {w_i}m_{t - 1}^i\left( {\boldsymbol{{x}}} \right)$ , $\sigma _{t - 1}^{{\text{obj}}}\left( {\boldsymbol{{x}}} \right) = \sqrt {\sum\limits_{i = 1}^M {{({w_i}\sigma _{t - 1}^i\left( {\boldsymbol{{x}}} \right))}^{\,2}} + 2\sum\limits_{i = 1}^M \sum\limits_{j = i + 1}^M {w_i}{w_j}co{v_{i,j}}\left( {\boldsymbol{{x}}} \right)}$ .

For the online test, we used a readapted version of the UCB decision rule, because the objective was defined as a nonlinear function of independent GPs (see section 2.3.2). Specifically, the next query ${{\boldsymbol{{x}}}_t}$ was determined as:

$\begin{align*}{{\boldsymbol{{x}}}_t} = \mathop {{\text{argmax }}}\limits_{{\boldsymbol{{x}}} \in D} &\left( - \sqrt {\sum\limits_{i = 1}^M {{\left( {m_{t - 1}^i\left( {\boldsymbol{{x}}} \right) - \mu _{{\text{target}}}^i} \right)}^{\,2}}}\right.\\[9pt] &\quad\left. + k\sqrt {\sum\limits_{i = 1}^M {{(\sigma _{t - 1}^i\left( {\boldsymbol{{x}}} \right))}^{\,2}}} \right).\end{align*}$

In this case, we used $k = 3$ .

2.3.6. Training and test set

Each dataset employed for offline testing was split into a training set, which was used to optimize the model hyperparameters, and a test set, which could be sampled during the algorithm search. The test set was constructed to contain a maximum of two repetitions for each tested combination of channel and amplitude. If more than two repetitions were present, redundant stimuli were added to the training set. For the online test, the dataset acquired during the preliminary phase was used as the training set. The dimension of the final sets and the number of unique stimuli are specified in supplementary tables 1 and 2.

2.3.7. Algorithm implementation

The different versions of the multi-output GP-BO algorithm were implemented in MATLAB (MathWorks, Natick MA) using custom code, with the aid of the GPML toolbox [45] to create the input covariance and learn the model hyperparameters (using the minimize function). A custom MATLAB graphical user interface (supplementary figure 1) was designed for the online test to drive the communication between the stimulator that sent the electrical pulses, the processor that recorded the EMG signals, and the algorithm code.

2.4. Data analysis

2.4.1. Performance measures

In the presence of ground truth data, as in our offline tests, BO performance can be quantified using the concept of regret [47]. The instantaneous regret at action ${{\boldsymbol{{x}}}_t}\,$ is defined as $r\left( {{{\boldsymbol{{x}}}_t}} \right) = {\text{ obj}}\left( {{{\boldsymbol{{x}}}^{\boldsymbol{{*}}}}} \right) - {\text{obj}}\left( {{{\boldsymbol{{x}}}_t}} \right)$ , where ${{\boldsymbol{{x}}}^{\boldsymbol{{*}}}} = \mathop {{\text{argmax}}}\limits_{{\boldsymbol{{x}}} \in D} {\text{ obj}}\left( {\boldsymbol{{x}}} \right)$ is the optimal solution. The cumulative regret ${R_T} = \,\mathop \sum \limits_{t = 1}^T r\left( {{{\boldsymbol{{x}}}_t}} \right)$ is the sum of instantaneous regrets accumulated over $T$ queries. Based on these definitions, the performance of the algorithm was expressed through the action-by-action evolution of two derived metrics: (a) minimum regret $M{R_T} = \mathop {\min }\limits_{t \in T} r\left( {{{\boldsymbol{{x}}}_t}} \right)$ and (b) time-average regret $A{R_T} = {R_T}/T$ . The minimum regret expresses the accuracy of the algorithm in the sense of its ability to find high-reward stimuli, but without evaluating all the observations made. Conversely, the average time regret considers how many of the visited stimuli are successful, and thus describes the actual balance between exploration and exploitation. We also assessed the number of actions required for convergence (detected when the same stimulus was observed at least five times in a row) and how many unique parameter combinations were tested among them (hereafter referred to as unique actions). We compared these values with the size of the decision set, i.e. the number of available unique channel-amplitude pairs.

For the online test, since the optimal solution was unknown, we evaluated the functioning of the algorithm by assessing the dynamics of the minimum and average objective observed so far.

2.4.2. Kinematics and grip force analysis

To quantify the movements evoked by the stimulation parameters selected at algorithm convergence, we analyzed the relative kinematics. Hindlimb joints in rats and hand joints in monkeys were manually selected on 2D video frames corresponding to rest and maximal range of movement, using a custom MATLAB routine. 2D kinematic features were then computed as described in supplementary table 5. The values at rest were subtracted from the values at the maximal range of movement.

Grip force was also measured for few stimuli identified at convergence in monkeys. Voltage values were converted in N and normalized with respect to the maximum voluntary contraction (MVC) of Mk-Ol_c squeezing a spherical object [28].

2.4.3. Statistics

Data are reported as mean ± std or mean ± standard error of the mean (s.e.m). The choice is specified in figure captions. Significance was evaluated using the non-parametric Kruskal–Wallis test with Bonferroni correction for multiple comparisons.

3. Results

3.1. Offline optimization of sciatic nerve stimulation in rats

Initially, we tested offline the performance of the different versions of our GP-BO algorithm in optimizing lower extremity PNS in rodents to evoke ankle movements. To this purpose, we employed data acquired during an experiment of single pulse stimulation of the sciatic nerve in n = 3 rats (see section 2.1.2). An online application of the algorithm was mimicked by sampling these datasets: at each iteration, the algorithm could select among the available stimuli and observe the corresponding elicited motor response. The aim was to identify optimal channel-amplitude pairs to evoke two target motor functions, namely ankle flexion and extension, associated to the common peroneal and tibial branches of the sciatic nerve, respectively [48]. The optimization objective was to maximally selectively activate TA for ankle flexion, and the synergy of SOL, GM, and PL for ankle extension.

For both target motor functions, using each of the three multi-output models, we executed 50 independent algorithmic runs starting from random points in the input space. A representative run is illustrated in supplementary video 1. The performance obtained on average across the 50 runs is shown in figures 3(a) and (b). MTGP and SLFM outperform IGP in optimizing ankle flexion in rat 1 and in rat 3 and ankle extension in rat 2. Indeed, their time-average regret is inferior (figure 3(a)), indicating a more effective search, and they converge in a significantly lower number of iterations (figure 3(b)). Instead, MTGP and SLFM necessitate of more queries than IGP for ankle flexion in rat 2 and ankle extension in rat 3 (figure 3(b)). However, in both cases, the minimum and time-average regrets of the three models roughly coincide (figure 3(a)), so their functioning can be considered equivalent. From a general perspective, the number of iterations required by the algorithm to converge is small compared to the size of the decision set in all rats, regardless of the model used (figure 3(b)). In percentage of the search space size, the number of total observations is (21 ± 13)%, (20 ± 12)%, (20 ± 12)% and the number of unique actions is (15 ± 8)%, (13 ± 8)%, (13 ± 8)% using IGP, MTGP and SLFM respectively, as the mean ± std across all animals, targets and runs. This result shows the efficiency of our BO method in comparison with exhaustive testing. As for the accuracy, the minimum regret at convergence is 0.10 ± 0.18, 0.04 ± 0.10, 0.05 ± 0.11 using IGP, MTGP and SLFM respectively, on average across all animals, targets and runs (figure 3(a)). Therefore, although the convergence stimuli do not match the best available solution in some cases, they are generally close to it. Taken together, these results suggest that the algorithm successfully balances exploration and exploitation to rapidly reach high-objective regions. Moreover, overall performance improves when the correlation between the outputs is explicitly modeled, almost equally with MTGP and SLFM type covariance structures.

**Figure 3.** GP-BO algorithm offline performance when optimizing sciatic nerve stimulation in rats. (a) Action-by-action evolution of the minimum and average time regret when optimizing sciatic nerve stimulation to evoke ankle flexion and extension in n = 3 rats. The three multi-output models (IGP, MTGP, and SLFM) are compared. (b) Actions (total and unique) necessary for the algorithm to converge in the three rats, when the three models are used. The size of each decision set is also shown. (c) EMG patterns, expressed as normalized peak-to-peak values of CMAPs, generated by the channel-amplitude pair found at convergence using the IGP model in the three rats. The ideal EMG pattern, as stated by the objective function (i.e. to maximize the difference between functional and nonfunctional muscles, see section 2.3.2 and supplementary table 3), is also shown. (d) Average ankle angle characterizing the movements at convergence. Data are reported as (i) mean ± std across 50 algorithmic runs with random initializations in panels a and b, (ii) mean ± s.e.m across different repetitions of the same stimulus in panels c and d. *p < 0.05, **p < 0.01, ***p < 0.001, non-parametric Kruskal–Wallis test. Abbreviations: tibialis anterior (TA), soleus (SOL), gastrocnemius medialis (GM), plantaris (PL).
Download figure:
Standard image High-resolution image

To verify whether the identified stimuli were effective in producing the target movement, we evaluated the motor patterns evoked by the channel-amplitude pair that the IGP-based algorithm selected most frequently at convergence across the 50 runs (supplementary figure 3). We note that IGP represents the worst-case scenario because on average it gave the highest final minimum regret. Flexion and extension of the ankle were effectively attained in all the animals when targeted (supplementary video 1). The EMG measures corresponding to these movements were consistent with the intended activity, with predominant activation of TA for ankle flexion and of SOL, GM and PL for ankle extension (figure 3(c)). The different distances between these EMG patterns and the ideal patterns reflect the different degrees of selectivity achieved with the three implants. We also found that the ankle joint angle induced by the convergence stimulus was similar across the three rats for the same target (figure 3(d)), although no constraints were placed on kinematics. We can conclude that effective stimuli were found in all cases and that the chosen objective function adequately represented the desired movements.

3.2. Offline optimization of median and radial nerve stimulation in monkeys

We then applied our GP-BO algorithm offline to optimize upper extremity PNS in monkeys to elicit varied movements of the hand. Data acquired during an experiment of burst stimulation of arm nerves in n = 4 monkeys were used for the test (see section 2.2.3). We specifically targeted the median and radial nerves, as they provide innervation to most of the flexor and extensor muscles of the hand, respectively [28]. As for sciatic nerve stimulation, we sampled these datasets to simulate an online application of the GP-BO algorithm. We searched for the optimal channel-amplitude combination for various desired motor outputs. Five flexion movements were targeted with median nerve stimulation: flexion of the thumb, flexion of the wrist and three types of grip, namely (a) cylindrical grip, or hook grip, characterized by the flexion of all the digits except the thumb, (b) pinch-like grip, or tripod grip, in which the thumb, index and middle finger are flexed to form a tripod and (c) spherical grip, or whole handgrip, involving the flexion of all the digits [49]. The search objective was to maximally selectively activate THE to produce thumb flexion, FCR together with PL to flex the wrist, FDS along with FDC to flex the four digits in a cylindrical grip, THE and FDI to oppose the thumb against the index and middle finger in a pinch-like grip, and finally FDS, FDC and THE to close the hand in a spherical power grip. For radial nerve stimulation, we identified three extension movements to be elicited: ulnar deviation of the wrist, radial extension of the wrist and extension of the fingers (hand opening). We aimed to maximally selectively recruit ECU to deflect the wrist ulnarly, ECR to extend the wrist radially, and finally EDC along with APL to extend the fingers.

Figures 4(a), (b), 5(a), (b) and supplementary figure 2 show the average performance of the algorithm over 50 independent runs with random initializations, with a comparison of the three multi-output models. For each monkey, we report only the target movements that were elicited by at least one of the tested parameter combinations (supplementary table 2). Averaged across all monkeys, targets and runs, the minimum regret at convergence is equal to 0.04 ± 0.08, 0.03 ± 0.08, 0.03 ± 0.07 using IGP, MTGP, and SLFM, respectively. To reach these regret values, the algorithm requires an average number of observations, expressed as a percentage of the decision set size, equal to (23 ± 23)%, (27 ± 28)%, (20 ± 23)% using the three models. Meanwhile, the unique actions are (20 ± 19)%, (21 ± 22)%, (16 ± 15)%. Thus, we can observe good performance on average, in terms of both accuracy and efficiency, with all the three models, and the best results are obtained using SLFM. However, there is large variability depending on both the dataset and the target. We have that SLFM, and also MTGP for five targets out of eight, outperforms IGP in tuning median nerve stimulation in Mk-Ol_a and Mk-Pa, and radial nerve stimulation in Mk-Ol_c. Indeed, their time average regret remains inferior (figure 5(a), supplementary figure 2) and the queries required for convergence are significantly lower (figures 4(b) and 5(b)). Conversely, especially MTGP but also SLFM works worse than IGP in Mk-Me (figures 4(a) and (b)) and Mk-Ol_a radial nerve (figure 5(b), supplementary figure 2). Finally, when optimizing median nerve stimulation in Mk-Ol_c, the three models are approximately equivalent (figure 4(b), supplementary figure 2). We note that the efficiency of the algorithm becomes more evident for larger and more complete datasets such as Mk-Pa (see section 2.2.3) (figure 4(b)). On the contrary, the lowest convergence rate is obtained for some targets in the case of smaller and less exhaustive datasets, such as cylindrical grip in Mk-Me (figure 4(b)) or fingers extension in Mk-Ol_a (figure 5(b)).

**Figure 4.** GP-BO algorithm offline performance when optimizing median nerve stimulation in monkeys. (a) Action-by-action evolution of the minimum and average time regret when optimizing median nerve stimulation to evoke different flexion movements in one monkey (Mk-Me). The three multi-output models (IGP, MTGP, and SLFM) are compared. (b) Actions (total and unique) necessary for the algorithm to converge in n = 4 monkeys, when the three models are used. The size of each decision set is also shown. (c) Movements elicited at by the IGP-based algorithm at convergence for different flexion targets in Mk-Me, with the corresponding muscles temporal activation. (d) EMG patterns, expressed as average amplitude values of normalized envelopes, generated by convergence stimuli for different flexion targets in the four monkeys. The ideal EMG pattern, as stated by the objective function (i.e. to maximize the difference between functional and nonfunctional muscles, see section 2.3.2 and supplementary table 3), is also shown. Bottom right-hand corner: same EMG patterns represented in the 2D space defined by the first two principal components of the EMG data. Ellipsoids indicate the 95% confidence region of data related to the same movement type. Thumb flexion is not shown because it was achieved in one animal only. (e) Kinematic features measured for the three types of grip (cylindrical, pinch, spherical) characterizing the movements at convergence in the four monkeys. Data are reported as (i) mean ± std across 50 algorithmic runs with random initializations in panels a and b, (ii) mean ± s.e.m across different repetitions of the same stimulus in panels c, d, e. *p < 0.05, **p < 0.01, ***p < 0.001, non-parametric Kruskal–Wallis test. Abbreviations: flexor carpi radialis (FCR), palmaris longus (PL), flexor digitorum profundus (FDP), flexor digitorum superficialis (FDS), first dorsal interosseous (FDI), thenar eminence (THE), proximal interphalangeal (PIP).
Download figure:
Standard image High-resolution image

**Figure 5.** GP-BO algorithm offline performance when optimizing radial nerve stimulation in monkeys. (a) Action-by-action evolution of the minimum and average time regret when optimizing radial nerve stimulation to evoke different extension movements in one monkey (Mk-Ol_c). The three multi-output models (IGP, MTGP, and SLFM) are compared. (b) Actions (total and unique) necessary for the algorithm to converge in n = 2 monkeys, when the three models are used. The size of each decision set is also shown. (c) Movements elicited by the IGP-based algorithm at convergence for different extension targets in Mk-Ol_c, with the corresponding muscles temporal activation. (d) EMG patterns, expressed as average amplitude values of normalized envelopes, generated by convergence stimuli for three extension targets in the two monkeys. The ideal EMG pattern, as stated by the objective function (i.e. to maximize the difference between functional and nonfunctional muscles, see section 2.3.2 and supplementary table 3), is also shown. (e) Kinematic features characterizing the movements at convergence in the two monkeys. Data are reported as (i) mean ± std across 50 algorithmic runs with random initializations in panels a and b, (ii) mean ± s.e.m across different repetitions of the same stimulus in panels c, d, e. *p < 0.05, **p < 0.01, ***p < 0.001, non-parametric Kruskal–Wallis test. Abbreviations: extensor carpi radialis (ECR), extensor digitorum communis (EDC), extensor carpi ulnaris (ECU), abductor pollicis longus (APL), proximal interphalangeal (PIP), metacarpal interphalangeal (MCP).
Download figure:
Standard image High-resolution image

To assess the efficacy of the selected stimuli in triggering the target gestures, we analyzed the motor patterns elicited by the stimulation parameters that the IGP-based algorithm identified most frequently at convergence during the 50 runs (supplementary figure 3). These movements were visually consistent with the desired motor outputs (figures 4(c), 5(c) and supplementary video 2). Corresponding EMG patterns were characterized by predominant activation of the majority of the muscles that we aimed to selectively recruit (figures 4(d) and 5(d)). By and large, the EMG activity was homogeneous across animals, resulting in a clustering of gestures of the same type in the space defined by the first two principal components (figure 4(d)). Observed differences in muscle recruitment across individuals were due to the inter-animal variability in Mk-TIME position and fascicular topography. These implant-specific properties influenced the maximum functional selectivity achieved under the stimuli tested, and thus the extent of the deviation between the objective optimum and the ideal EMG pattern. Although this deviation was not minor in some cases, the obtained EMG patterns always elicited the target motor function. This indicates that the balance between the activations of functional and nonfunctional muscles was still adequate. Regarding the movement strength, with the few convergence stimuli for which grasp force was measured, we achieved values around 6%–12% MVC for the precision pinch-like grip and higher values for the power grips, i.e. 31% MVC for the cylindrical grip and 52% MVC for the spherical grip (supplementary table 6). Despite their variability, these force values are more than sufficient for performing functional grasping tasks [50]. As for kinematics, finger and wrist angles differed between various movements within the same animal (figures 4(e) and 5(e)). Specifically, the spherical grip was characterized by larger thumb flexion compared to the cylindrical grip and larger index and middle finger flexion compared to the pinch-like grip. At the same time, the pinch-like grip exhibited greater index finger abduction compared to the cylindrical grip. The wrist extension angle was larger in the radial extension target than in the ulnar deviation target, and vice versa for the ulnar deviation angle. Finally, the index and middle finger extension angle was greater for the finger extension target than for the radial extension target. However, the actual values of the kinematic features underlying a specific movement were not exactly the same in the different animals. These results indicate that our algorithm can effectively optimize stimuli for specific hand gestures and that the chosen objective function is appropriate, although further improvements could be achieved by constraining kinematics and force in addition to EMG activity (see section 4).

3.3. Online optimization of median nerve stimulation in a monkey

Finally, we performed a preliminary intraoperative test of the IGP-based algorithm to optimize burst stimulation in a monkey acutely implanted with one Mk-TIME in the median nerve. Through a preliminary test of the stimulation channels, we restricted the decision set for the algorithm application to functional channel-amplitude pairs (see section 2.2.4), yielding a search space of 259 parameter combinations. We aimed at finding the optimal stimulation parameters within this space to generate two different flexion gestures, namely spherical and pinch-like grips. To elicit these two movements, the objective in this case was to minimize the distance between the generated EMG pattern and a target EMG pattern (see section 2.3.2). Two independent algorithmic runs were performed for the two grip types, each stopped after 60 iterations. Since the Mk-TIME had just been implanted and thus the muscle recruitment was completely unknown, we wanted to ensure that enough points were visited in the limited time available for the experiment to gain sufficient insight into the system. Therefore, we set a repetition limit: the same channel-amplitude pair could be tested at most three times in each run.

Figure 6(a) shows the action-by-action evolution of the minimum and average objective for the two targets. The average objective kept decreasing during the search, demonstrating that the algorithm was able to effectively trade off exploration and exploitation. Of the 60 channel-amplitude pairs tested, 57 were unique for the pinch-like grip and 39 for the spherical grip. Therefore, the algorithm spent more time exploring the search space for the pinch-like grip, while it spent more time exploiting the high-reward regions identified for the spherical grip. At actions 42 and 56 of the spherical grip search, two parameter combinations characterized by the same channel (the one that turned out to be optimal) and two consecutive amplitude values were selected three times in a row. Through offline post-processing, we found that the algorithm would have selected the same stimuli the fourth time, but was forced to continue the exploration by the repetition limit. Similarly, at action 50 of the pinch-like grip search, the channel-amplitude pair that turned out to be optimal (and that had previously been visited once) was selected twice in succession. Without the repetition limit, it would have been visited again. This suggests that convergence would likely have been achieved if there had been no constraints.

The EMG patterns elicited by the best channel-amplitude pair within the 60 actions were close to the target pattern for both the spherical and pinch-like grips (figure 6(b)). In terms of kinematics, the spherical grip was characterized by larger finger flexion compared to the pinch-like grip (figure 6(c)). However, the two movements were found not to be strikingly different from each other (supplementary video 3). Thus, although the algorithm reached solutions that were close to the target, they did not effectively reproduce the desired movements. These results suggest that the chosen objective function could be refined to achieve better performance.

The algorithm running time was short. Approximately 0.2 s was required to process the EMG data recorded at each iteration. The next stimulus was then selected in about 0.1 s. Therefore, when integrated with a powerful data acquisition system, our method could rapidly drive the search for optimal stimulation parameters, without requiring more time than the pauses between stimulation bursts necessary to avoid muscle fatigue.

4. Discussion

In this study, we developed a GP-BO algorithm to efficiently optimize intraneural PNS of the lower and upper limbs to evoke various ankle and hand movements in rats and monkeys, respectively. We constrained the problem to finding the optimal pair of channel and current amplitude during monopolar single pulse or burst stimulation. The aim of the study was to prove the potential of this approach in optimizing functionally meaningful movements, a scenario in which multiple and conflicting motor-related objectives have to be traded off to get the desired outcome. In particular, we designed custom reward functions that scalarize measures extracted from the EMG signals of multiple muscles so to coordinate their activation. For this system, we implemented three multi-output GP models that differently account for the correlation between the outputs, and investigated their functioning. Finally, we defined UCB-like acquisition functions suited for the different models used. We note that, although validated in PNS, the proposed framework can be applied to tune other stimulation modalities, such as spinal or cortical stimulation.

4.1. Algorithm performance

We applied the different versions of our algorithm offline to optimize sciatic nerve stimulation in the rat model and median and radial nerve stimulation in the monkey model. The datasets used were recorded in different animals, reflecting the variability across implants. The algorithm was flexible enough to perform well under the different experimental conditions. Search accuracy was always satisfactory, as low-regret solutions were identified in all cases. Efficiency was high on average for both lower and upper limb PNS tuning, as convergence was reached with many fewer evaluations than extensive search (∼20% of input space). Some monkey tests required more detailed search, and overall we observed greater variability in convergence rate when optimizing hand movements compared to ankle movements. In this respect, we note that the task of tuning median and radial PNS is more challenging than sciatic PNS, due to the higher complexity in fascicular organization and target movement actuation. Indeed, in the rat distal sciatic nerve, two separate fascicles are involved in the two targeted single-joint motor functions, namely the peroneal fascicle in ankle flexion and the tibial fascicle in ankle extension [19, 48]. Conversely, the monkey median and radial nerves are composed of multiple fascicles, each innervating one or more muscles that can be functionally distinct [28]. In addition, upper extremity PNS was tuned to optimize multi-joint movements. Therefore, the input–output relationship was more tortuous. This is confirmed by the fact that while in rats the channels that optimally produced the two target movements were located quite far apart on the electrode shaft, in monkeys active sites closer together were selected as optimal for evoking different motor outputs (supplementary figure 3). Even in the more complex upper limb scenario, the algorithm showed good efficiency by querying less than 50% of the decision set in ∼68% of the tests. Since the worst results were obtained in the smallest and most incomplete monkey datasets (where the search process is likely to be affected by the presence of some critical regions that the algorithm cannot observe), the advantage of GP-BO can only increase in a less constrained online setting. As for the comparison between the three probabilistic models, none of them always outperformed the others. Nevertheless, in rats, the inclusion of output correlations in the covariance structure through both MTGP and SLFM led to higher accuracy and efficiency on average. Since no notable differences were found between the two models, MTGP is preferable due to its lower computational complexity [41]. In monkeys on the other hand, the best average performance was obtained with SLFM. By considering multiple latent functions, this model can better capture output-specific features [41] and is reasonably more appropriate for the more complex upper-limb PNS system. We also point out that predicting the outputs jointly has a greater advantage when they do not share the same inputs (i.e. data are heterotopic) [41] and could thus be even more convenient when EMGs are intermittently unavailable during online use. Finally, regarding the objective function, our definition proved to be successful, as the identified EMG patterns always produced the desired movements. This demonstrates that a multi-muscle optimization is a valid solution when motor functions are targeted.

To show the feasibility of applying our algorithm experimentally, we also conducted a preliminary online test. The algorithm was able to capture the structure of the system and approached high-reward solutions quite quickly. The main difference was that the objective defined for our offline tests (i.e. maximize the activation of functional muscles and at the same time minimize the response of nonfunctional muscles) was high performing, while the objective used for the online test (i.e. minimize the distance from preselected EMG patterns evoked by stimulation in another animal) should be revised to obtain more selective motor patterns. This shows how much the choice of the reward measure is crucial for the outcome of the method. An alternative to creating artificial objective functions would be to target the EMG activity of healthy animals performing behavioral tasks.

4.2. Related work

GP-BO has been used in previous studies for different motor neuroprosthetic applications. In [34], a GP-BO algorithm using batch processing was applied to tune epidural spinal cord stimulation in rats to provide therapy while optimizing the objective function, over multiple days. In [35], hierarchical models were used to optimize multichannel intracortical stimulation protocols in a monkey. In these two studies, which focused on other aspects, the objective was to maximize the EMG activity of a single muscle at a time without considering the actual motor effect elicited. In contrast, our paper examines GP-BO ability to identify stimuli that evoke behavioral movements. To this end, we propose a framework to jointly model multiple EMG objectives, combine them in an adequate reward metric, and customize the acquisition function accordingly. We validate this method in the PNS-to-motor system and prove the robustness and versatility of GP-BO across a variety of animals and implants.

4.3. Limitations and future work

We chose to build our multi-attribute optimization on EMG activity because it was immediately applicable to our setup, and because there is a direct, and thus easy to model, link between stimulus and muscle response. However, to make a search based on EMG signals effective, it is crucial to record the activation of all relevant muscles that might favor or interfere with the desired movement. For example, here we had to remove some stimuli from one of the monkey datasets (see section 2.2.3), because they triggered the pronation of the forearm. This effect could not be accounted for in the optimization because we had not acquired the EMG activity of the muscle responsible for it (pronator teres). Moreover, we could have optimized the motor responses more accurately by recording the activation of more intrinsic muscles of the foot in rats and of the hand in monkeys and including them into the reward metric. In addition, muscle activity cannot be so easily related to the range and strength of movement. Indeed, with our EMG-based reward function the algorithm selected parameters that did evoke the desired motor functions but with variable force extent and joint angles that differed across animals. In view of these considerations, kinematic and kinetic features could be added to the objective function to achieve more accurate motor outputs tailored to specific tasks. Moreover, during online application in the clinical setting, a measure of movement adequacy and functionality, such as the Medical Resource Council scale, could be assigned by the clinician after a visual inspection of the elicited motor response and integrated into the reward policy. This further underlines the advantage of this method that is designed for online use: human expert knowledge could be exploited on-site in a collaborative decision-making process.

We are aware that single pulse stimulation, which we optimized in the rat model, is not the adequate paradigm to produce sustained and thus functional movements. Instead, bursts with a sufficient pulse frequency (i.e. between 30 and 70 Hz [19]) are required. However, because muscle recruitment is analogous in the two settings, our results are directly translatable to the fixed-frequency burst paradigm.

Given the constraints of the available data, our algorithm searched over a rather limited space when applied offline. Even during the online test, the size of the input grid was not very large (i.e. less than 300 channel-amplitude pairs could be sampled). This resulted from the fact that only channel and amplitude were included as optimizable variables in our model. If we were to add only one more stimulation parameter to be optimized (e.g. frequency), which can take N different values, this would result in a decision set N times larger than ours. Even if in a reduced space, the purpose of the study was to show that BO can reduce the experimental time and the burden of manual neurostimulation tuning, with the guarantee that effective stimuli will be identified. We plan to extend the procedure to the setting of additional stimulation parameters besides channel and amplitude. We will then address temporally distributed multichannel stimulation to optimize movement sequences, possibly exploiting hierarchical structures as proposed in [35].

Here we did not address the temporal adaptation of neural stimulation in chronic use, nor did we investigate how to modify the stimulation protocol over time. This is another important aspect that should be explored in future work. The temporal variations of the mapping between stimulation parameters and motor response could be modeled by considering time as an additional dimension of the input space, as in [34], as a way to fully exploit past knowledge, and adjusting the GP prior accordingly. Performance should then be evaluated in chronic interfaces by applying the technique across different experimental sessions.

5. Conclusions

Our results prove that BO is a promising approach to automate the difficult task of optimizing stimulation paradigms for peripheral intraneural stimulation, and potentially for motor neuroprostheses in general, when the aim is to evoke functionally relevant movements identified by multi-objective functions. This technique has the potential to make the process of searching for personalized stimuli fast and effective, which is an essential requirement especially in the clinical setting.

Acknowledgments

The authors would like to thank Ilaria Scarpato, Evgenia Roussinova, and Andrew Bogaard for their help with the experiments and data collection, and Gaia Carparelli for providing useful information on the state of the art on neurostimulation optimization. Funding for this study was provided by the Swiss National Science Foundation grant Dynamo (315230_149902), the Swiss National Science Foundation grant NeuGrasp (205321_170032), the EU project NEBIAS (FP7-611687), a Starting Grant from the European Research Council ('Walk Again', ERC261247), the Wyss Center for Bio and Neuroengineering, the International Foundation for Research in Paraplegia, the Bertarelli Foundation, and an Ambizione Fellowship (167912).

Data availability statement

The data that support the findings of this study are available upon reasonable request from the authors.

Code availability statement

The code used in this study is available at https://github.com/elosanno/PNS-BayesOpt.

Conflict of interest

The authors declare no competing interest.