Decoding hand kinematics from population responses in sensorimotor cortex during grasping

The hand, a complex effector comprising dozens of degrees of freedom, endows us with the ability to flexibly, precisely, and effortlessly interact with objects. The neural signals associated with dexterous hand movements in primary motor cortex (M1) and somatosensory cortex (SC) have received comparatively less attention than those associated with proximal limb control. To fill this gap, we trained three monkeys to grasp objects varying in size, shape, and orientation while tracking their hand postures and recording single-unit activity from M1 and SC. We then decoded their hand kinematics across 30 joints from population activity in these areas. We found that we could accurately decode kinematics with a small number of neural signals and that performance was higher for decoding joint angles than joint angular velocities, in contrast to what has been found with proximal limb decoders. We conclude that cortical signals can be used for dexterous hand control in brain-machine interface applications and that postural representations in SC may be exploited via intracortical stimulation to close the sensorimotor loop.


INTRODUCTION
The hand, a complex effector comprising dozens of degrees of freedom (Belic & Faisal, 2015), allows us to flexibly, precisely, and effortlessly manipulate objects. The loss of hand function, as a consequence of spinal cord injury, for example, can have devastating consequences for quality of life (Anderson, 2004). In patients whose sensorimotor cortex is intact, some measure of independence can be restored with brain-machine interfaces (BMIs) that tap into the central neural pathways mediating manual dexterity (Bensmaia & Miller, 2014). Translating patterns of neural population activity into outputs of external devices is critical to advancing such interfaces.
A critical complement to the neural signals involved in controlling movements are the signals responsible for conveying sensory feedback about the consequences of those movements (Scott, 2004). Indeed, the motor system receives continuous proprioceptive feedback from joints and muscles that signals the position of the body in space and the forces it exerts, and mediates error-corrective motor adjustments (Soechting & Flanders, 1989). Proprioceptive impairments lead to major deficits in motor behavior, resulting in slow, effortful, and imprecise movements (Cole & Sedgwick, 1992; Ghez & Sainburg, 1995; Sainburg, Ghilardi, Poizner, & Ghez, 1995). Two cortical fields in somatosensory cortex (SC) - Brodmann's areas 3a and 2 (Krubitzer, Huffman, Disbrow, & Recanzone, 2004; Pons & Kaas, 1986) - contain neurons that respond to active and passive manipulation of joints and muscles (B. M. London & Miller, 2013; Prud'homme & Kalaska, 1994), as well as to forces applied to the joints (Fromm & Evarts, 1982). However, the vast majority of extant studies of proprioceptive representations in SC have focused on the proximal limb, particularly during reaching movements.
Of the existing research on representations of hand movements in SC, two lines of research have investigated the responses of individual SC neurons, one during passive deflections of the hand joints and the other during active hand movements (Goodman et al., 2019). However, the degree to which hand movements and postures are encoded across populations of SC neurons has not been investigated. One way to address this question is to assess our ability to decode hand kinematics from the responses of populations of SC neurons. Earlier studies have attempted to decode kinematics using SC activity; however, they mainly focused on tracking a finite number of discrete hand configurations (Branco et al., 2017; Farrokhi & Erfanian, 2018) or only considered tasks such as reach-to-grasp that combined simple grasping behavior with comparatively rich reaching kinematics (Carmena et al., 2003; Glaser, Chowdhury, Perich, Miller, & Kording, 2017). To the extent that somatosensory neurons can be established to carry detailed information about hand movements, these neural representations might be exploited to convey artificial proprioceptive feedback through intracortical microstimulation (ICMS) (Armenta Salas et al., 2018; B. M. London, Jordan, Jackson, & Miller, 2008; Tomlinson & Miller, 2016).
The goal of the present study is to assess the degree to which hand kinematics can be decoded from the responses of populations of sensorimotor neurons. To this end, we trained three monkeys to grasp 35 objects of varying sizes, shapes and orientations, while tracking their time-varying hand kinematics using a camera-based motion tracking system and simultaneously recording the responses in the hand representations in M1 and SC using chronically implanted electrode arrays. We show that we can decode the postures and movements of 30 joints of the monkey's hand as it preshapes to grasp each object with as few as 20 neurons in each area. Furthermore, we find that we can better decode posture than movement, in contrast to what has been shown for motor representations of the proximal limb, suggesting possible differences in cortical encoding for proximal and distal limb. Our results underscore the promise of using M1 signals to achieve dexterous control of the hand and demonstrate that SC populations also carry a faithful representation of time-varying hand configuration that might be exploited to restore proprioception through ICMS.

METHODS

Animals and surgery
We recorded from three male Rhesus macaques ranging in age from 6 to 15 years and weighing between 8 and 11 kg. All animal procedures were performed in accordance with the rules and regulations of the University of Chicago Animal Care and Use Committee. Monkeys received care from a full-time husbandry staff, who maintained a 12-hr/12-hr light/dark cycle, cleaned the animals' living spaces once a week, and provided the animals with ample food and enrichment. In addition, a full-time veterinary staff monitored the animals' health. The animals' water intake was regulated according to a protocol that required monitoring their weights daily and ensuring a minimum daily water consumption of 10 cc/kg. Surgical procedures consisted of implantation of a head-fixing post onto the skull, craniotomy, implantation of a sealed recording chamber (monkeys 2-3), and implantation of chronic Utah electrode arrays (UEAs, Blackrock Microsystems, Inc., Salt Lake City, UT) and floating microelectrode arrays (FMAs, Microprobes for Life Science, Gaithersburg, MD; monkey 1) or of semi-chronic Microdrive electrode arrays (Gray Matter Research, Bozeman, MT; monkeys 2-3). All procedures were performed under aseptic conditions and under anesthesia induced with ketamine HCl (20 mg/kg, IM) and maintained with isoflurane (10-25 mg/kg per hour, inhaled).

Behavioral task
We trained monkeys to grasp 35 objects varying in shape, size, and orientation (Figure 1A, inset). Throughout the session, the head-fixed monkey sat in a chair facing a three-degree-of-freedom robotic arm (Figure 1A, top). The monkey's hand rested on a cushioned armrest equipped with a photo sensor to ensure that the forearm was largely immobile during the grasp. At the beginning of each trial, an object was attached to the robotic arm with a weak magnet and presented to the animal. The monkey's task was to grasp the object and exert enough grip force that, when the robot retracted, the object would disengage from its magnetic coupling with the robot and remain in the monkey's hand (Figure 1B). Animals were trained to keep their elbow on the armrest to minimize movements of the proximal limb.

Kinematics
We recorded hand and elbow kinematics using a camera-based motion tracking system (Vantage, VICON, Los Angeles, CA). To this end, we placed 30 reflective markers on the joints of the hand, wrist, and proximal limb (Figure 1C, Figure 2, top). Ten cameras were used to capture the kinematics of the first monkey at a rate of 250 Hz, and fourteen cameras were used to capture the kinematics of the other monkeys at a rate of 100 Hz. We then labeled each marker using Nexus software (VICON, Los Angeles, CA) and performed inverse kinematic modeling (OpenSim; Delp et al., 2007) on the resulting time-varying marker positions to infer time-varying joint angles of the limb. Joint angles were smoothed with a 50-ms moving average, and angular velocities were computed from the smoothed angles. For each trial, we identified the start of movement, the maximum aperture of the fingers, and contact with the object. We labeled these events manually for a subset of trials for each animal and then used linear discriminant analysis to infer maximally likely event times for the remaining trials.
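The smoothing and differentiation steps can be sketched as follows. This is a minimal illustration, assuming joint angles are stored as a time-by-joint array; the function name and the use of NumPy's central differences are our own choices, not taken from the original pipeline.

```python
import numpy as np

def smooth_and_differentiate(angles, fs=100, window_ms=50):
    """Smooth joint angles with a moving average and compute angular velocities.

    angles : (T, J) array of joint angles in degrees, sampled at fs Hz.
    Returns the smoothed angles and their time derivatives (deg/s).
    """
    win = max(1, int(round(window_ms * fs / 1000)))
    kernel = np.ones(win) / win
    # Moving average applied independently to each joint (column).
    smoothed = np.apply_along_axis(
        lambda x: np.convolve(x, kernel, mode="same"), 0, angles)
    # Central-difference velocities from the smoothed angles.
    velocities = np.gradient(smoothed, 1.0 / fs, axis=0)
    return smoothed, velocities
```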

Electrophysiology
We recorded neural signals from monkey 1 using UEAs placed in the post- and pre-central gyri and from FMAs placed in the posterior and anterior banks of the central sulcus (Figure 1D). For monkeys 2 and 3, we recorded neural signals using arrays of depth-adjustable electrodes (SC96) positioned over the central sulcus (Figure 1E, Supplementary Figure 1). We used offline spike sorting (Offline Sorter, Plexon, Dallas, TX) to remove non-spike threshold crossings and to isolate individual units when more than one was present on a given trace.

Neural data preprocessing
Time-varying firing rates of all recorded neurons were computed by summing all events in 10-ms bins. We then soft-normalized the firing rates (divided by their range plus a small increment) and convolved the resulting rates with a Gaussian kernel with 100-ms width (Figure 2, bottom). We then reduced the dimensionality of the neural space with Principal Component Analysis (PCA) and preserved components that cumulatively explained at least 90% of the variance in the neural data.
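A sketch of this preprocessing pipeline, assuming 10-ms bins throughout. The soft-normalization constant (here 5 spikes/s) and the interpretation of the 100-ms kernel width as the standard deviation are illustrative assumptions, not values taken from the text.

```python
import numpy as np

def preprocess_rates(spike_counts, bin_ms=10, sigma_ms=100, var_keep=0.90):
    """Soft-normalize, smooth, and dimensionality-reduce binned firing rates.

    spike_counts : (T, N) spike counts in bin_ms bins for N neurons.
    Returns a (T, k) array of principal component scores, where the k
    components cumulatively explain at least var_keep of the variance.
    """
    rates = spike_counts / (bin_ms / 1000.0)           # convert to spikes/s
    # Soft normalization: divide by the range plus a small constant (assumed 5).
    rng = rates.max(axis=0) - rates.min(axis=0)
    rates = rates / (rng + 5.0)
    # Gaussian smoothing kernel (sigma expressed in bins).
    sigma = sigma_ms / bin_ms
    t = np.arange(-3 * sigma, 3 * sigma + 1)
    kernel = np.exp(-t**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    smoothed = np.apply_along_axis(
        lambda x: np.convolve(x, kernel, mode="same"), 0, rates)
    # PCA via SVD: keep components explaining >= var_keep of the variance.
    centered = smoothed - smoothed.mean(axis=0)
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    explained = np.cumsum(S**2) / np.sum(S**2)
    k = int(np.searchsorted(explained, var_keep)) + 1
    return centered @ Vt[:k].T
```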

Session stitching
To achieve a sufficient sample size from each cortical field, we pooled data from all sessions for each monkey (see Table 1). To pool kinematics across sessions, we aligned the kinematics from the same condition (object) to maximum hand aperture and averaged them across sessions for each monkey separately (Supplementary Figure 2). To eliminate trials on which the animal used a different grasping strategy for a given object, we discarded trials on which the worst joint correlation between that trial's kinematics and the mean kinematics was below 0.7. We then used the average kinematic traces 600 ms before and 200 ms after the alignment point for decoding (all before object contact).
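The trial-screening rule (discard a trial if its worst per-joint correlation with the condition mean falls below 0.7) can be sketched as follows; the array layout and function name are our own.

```python
import numpy as np

def select_consistent_trials(trials, threshold=0.7):
    """Keep trials whose kinematics match the across-trial mean.

    trials : (K, T, J) array of K trials, T samples, J joints,
             already aligned to maximum hand aperture.
    A trial is kept only if its worst per-joint correlation with the
    across-trial mean kinematics is at least `threshold`.
    """
    mean_kin = trials.mean(axis=0)                     # (T, J)
    keep = []
    for k in range(trials.shape[0]):
        corrs = [np.corrcoef(trials[k, :, j], mean_kin[:, j])[0, 1]
                 for j in range(trials.shape[2])]
        if min(corrs) >= threshold:                    # worst joint decides
            keep.append(k)
    return trials[keep]
```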

Decoding
We fit a standard Kalman filter to each kinematic degree of freedom:

x_t = A x_{t-1} + w_t
y_t = C x_t + q_t

where x_t is a joint angle (or joint angular velocity) at time t, A is the state transition matrix, w_t is a Gaussian process with zero mean and covariance matrix W, y_t is the observed spiking activity projected onto a low-dimensional space with PCA, with accumulated lags over an interval prior to movement, C is the observation model matrix, and q_t is a Gaussian process with zero mean and covariance matrix Q.
We performed 10-fold cross-validation, in which we trained the parameters of the filter (A, W, C, Q) on a randomly selected 90% of the trials and tested the model on the remaining 10% of trials. Performance was assessed using the coefficient of determination (R²). To find the optimal lag, we tested the model at various lags and selected the one that yielded the best cross-validated performance (150 ms, Supplementary Figure 3).
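A minimal sketch of the fit and decode steps. The text does not detail the estimation procedure, so this follows the standard least-squares approach to fitting the model matrices; function names and signatures are illustrative.

```python
import numpy as np

def fit_kalman(X, Y):
    """Least-squares estimates of Kalman filter parameters.

    X : (T, d) kinematic states; Y : (T, p) neural observations.
    Returns A, W (state model) and C, Q (observation model).
    """
    X0, X1 = X[:-1], X[1:]
    A = np.linalg.lstsq(X0, X1, rcond=None)[0].T       # x_t ~ A x_{t-1}
    W = np.cov((X1 - X0 @ A.T).T)                      # state noise covariance
    C = np.linalg.lstsq(X, Y, rcond=None)[0].T         # y_t ~ C x_t
    Q = np.cov((Y - X @ C.T).T)                        # observation noise covariance
    return A, W, C, Q

def kalman_decode(Y, A, W, C, Q, x0, P0):
    """Run the filter forward to reconstruct states from neural observations."""
    x, P = x0.copy(), P0.copy()
    out = []
    for y in Y:
        # Predict step.
        x = A @ x
        P = A @ P @ A.T + W
        # Update step with the Kalman gain.
        K = P @ C.T @ np.linalg.inv(C @ P @ C.T + Q)
        x = x + K @ (y - C @ x)
        P = (np.eye(len(x)) - K @ C) @ P
        out.append(x.copy())
    return np.array(out)
```

In the cross-validation scheme described above, `fit_kalman` would be run on the training trials of each fold and `kalman_decode` on the held-out trials.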

RESULTS

Decoding kinematics averaged across trials
First, we assessed the degree to which the responses of neuronal populations in M1 and SC convey information about time-varying hand kinematics, pooled across multiple recording sessions (Figure 3, Supplementary Figure 4). We found that, from a population of randomly selected neurons (N=20), we could reconstruct time-varying joint angles accurately for most joints, obtaining a median performance (R²) of 0.62, 0.66, and 0.63 for M1, SC, and the two combined, respectively (Figure 3B). For two of the three monkeys, M1 decoders outperformed, on average, those based on SC (Figure 3C). However, for monkey 2, the reverse was true. This difference is likely due to the respective locations of the SC recordings: in monkeys 1 and 3, the majority of SC neurons were located in area 2, whereas in monkey 2, they were in area 3a (see Supplementary Table 1). Decoding was only marginally improved with more recently developed decoders, including non-linear ones (Supplementary Figure 5).

Comparing cortical areas
Next, we performed a more detailed analysis of how information about hand posture is distributed within M1 and SC (Figure 4). Indeed, the caudal region of M1 contains more corticomotoneuronal (CM) cells, which make monosynaptic connections with motoneurons, than does its rostral counterpart, and these CM cells are thought to be critical for highly skilled movements, particularly of the hand (Rathelot & Strick, 2009). Furthermore, while neurons in Brodmann's area 3a exhibit almost exclusively proprioceptive responses, those in area 2 exhibit mixed proprioceptive and cutaneous responses (S. S. Kim, Gomez-Ramirez, Thakur, & Hsiao, 2015), and the latter may interfere with kinematic decoding. Because decoding accuracy depends on the size of the neuronal sample, we measured performance as a function of sample size for all populations (Figure 4A). We did not find any systematic differences between caudal and rostral M1. However, decoding based on area 2 was systematically worse than that based on other cortical fields. In the one animal with a sufficient sample size, decoding from area 3a achieved better performance than decoding from M1. With the possible exception of area 2, then, the various cortical fields seem to contain similar amounts of information about hand kinematics.
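Performance as a function of sample size is commonly estimated with a neuron-dropping analysis: repeatedly decode from random subsets of increasing size and average the resulting scores. A generic sketch, under the assumption that some cross-validated decoder is available as a callable (the helper and its signature are our own, not taken from the text):

```python
import numpy as np

def neuron_dropping_curve(decode_fn, Y, X, sizes, n_repeats=10, seed=0):
    """Decoding performance as a function of neural sample size.

    decode_fn : callable (Y_subset, X) -> scalar performance score.
    Y : (T, N) neural features; X : (T, d) kinematics.
    For each requested size, performance is averaged over random subsets.
    """
    rng = np.random.default_rng(seed)
    curve = []
    for n in sizes:
        scores = [decode_fn(Y[:, rng.choice(Y.shape[1], n, replace=False)], X)
                  for _ in range(n_repeats)]
        curve.append(np.mean(scores))
    return np.array(curve)
```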

Decoding joint groups / principal components
Next, we investigated whether the M1 and SC subpopulations we sampled preferentially encoded movements of different portions of the limb. To this end, we divided the joints into 7 groups -proximal arm joints (elbow), wrist joints (pronation-supination, flexion, extension), and the five digits (separately) -and assessed the average decoding performance for each group. We did not find any systematic patterns, suggesting that the arrays impinged upon neural populations that spanned the hand and arm representation in each area (Supplementary Figure 6).
Joint kinematics of the hand have been shown to exhibit systematic correlational structure, with some joints tending to move together. A common way to reveal this structure is through principal components analysis, which yields a set of mutually orthogonal kinematic bases (Figure 5A). We examined the degree to which we could decode each of the principal components from the neuronal responses. When performance was measured using the coefficient of determination, we found that it dropped sharply for the lower-variance components, with only the first three components showing performance comparable to that for single joints (Figure 5B-C). However, the drop in performance for the low-variance components vanishes when model fit is expressed using relative root-mean-squared error (rRMSE), a metric that is less sensitive to signal variance.
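The kinematic principal components and the two performance metrics can be sketched as follows. The exact normalization used for rRMSE is not specified above; here it is illustrated as RMSE divided by the range of the true signal, one common convention.

```python
import numpy as np

def kinematic_pcs(angles):
    """Principal components of joint-angle trajectories (kinematic synergies).

    angles : (T, J) joint-angle array. Returns component time courses,
    the basis vectors, and the fraction of variance each explains.
    """
    centered = angles - angles.mean(axis=0)
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ Vt.T          # time course of each component
    var_explained = S**2 / np.sum(S**2)
    return scores, Vt, var_explained

def r2(y, yhat):
    """Coefficient of determination; penalizes errors relative to variance."""
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

def rrmse(y, yhat):
    """RMSE normalized by the range of the true signal, so low-variance
    components are not penalized simply for being small."""
    rmse = np.sqrt(np.mean((y - yhat) ** 2))
    return rmse / (y.max() - y.min())
```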

Decoding postures vs. movements
In all of the above analyses, we showed that time-varying postures could be directly decoded from neuronal activity. This approach stands in contrast to that adopted for proximal limb kinematics or for the application of proximal limb-related M1 activity to cursor control, which typically involves decoding joint or endpoint velocities from neuronal responses and then integrating these to obtain postures (Chase, Schwartz, & Kass, 2009; S.-P. Kim et al., 2008; Koyama et al., 2010; Taylor, Tillery, & Schwartz, 2002; Velliste et al., 2008). Indeed, even SC decoders for reaching movements appear to perform better when decoding limb velocity than limb posture (Weber et al., 2011). With this in mind, we wished to directly test whether neuronal responses in M1 and SC preferentially encode postures or movements during grasp. To this end, we reconstructed joint angular velocities from sensorimotor responses and compared these to our reconstructions of angular positions. For this analysis, we used a single lag (150 ms, determined to yield peak performance; Supplementary Figure 3), because multi-lag models allow for integration (from a velocity to a position signal) or differentiation (from a position to a velocity signal), making the distinction between postural and movement coding more difficult to discern. Using 20 units from each cortical subpopulation, we found that postures could be reconstructed significantly better than movements (p<0.01, paired t-tests for each area and monkey), in contrast to what has been observed for the proximal limb (Figure 6).

Single trial decoding
Finally, we wished to assess the degree to which we could decode single-trial kinematics from sensorimotor cortex and to verify that the results described above were not artifacts of pooling data across sessions. To this end, we analyzed data from one animal, from which a sufficient sample size had been achieved on a pair of single sessions (N1 = 44, N2 = 36 from M1 and N1 = 15, N2 = 9 from SC). Specifically, we decoded joint angles and velocities obtained on single trials from the concurrently recorded neuronal responses and compared the resulting performance to that of the average kinematic decoders for the same joints and velocities. We found that the decoding performance of pooled and single-trial kinematics was equivalent (p>0.05, Figure 7A,B), which may at first be surprising: one might expect pooling to degrade performance to the extent that the kinematics are not identical across sessions. Our results suggest that kinematics are similar within monkeys across sessions (Supplementary Figure 2), and that any deterioration of pooled-trial decoding due to differences in kinematics across trials is offset by the noisier neuronal signals used for single-trial decoding. We also verified that postural decoders significantly outperformed movement decoders even for single-trial responses (Figure 7C).

DISCUSSION

Decoding from M1
Our approach is similar to that described in Menz et al. (2015), in which 27 degrees of freedom of grasping kinematics were reconstructed from the responses of neurons in posterior parietal cortex and M1. Decoders built from the one area the two studies have in common (M1) yielded comparable performance (Supplementary Figure 4).
Anatomically, M1 can be divided into two regions: rostral and caudal M1. Neurons in the caudal part of M1 make direct connections with motoneurons in the spinal cord and might be particularly relevant for dexterous hand control, whereas neurons in the rostral region contact mainly spinal interneurons and only indirectly drive muscles (Rathelot & Strick, 2009). Despite these documented anatomical differences, we did not see any differences in decoding performance between the two regions in the monkeys for which both populations of M1 neurons were available. As grasp comprises highly correlated patterns of joint movements, this manual behavior may not require a direct line to muscles. Caudal and rostral M1 may be differentially engaged in non-prehensile dexterous movements that involve more individuated finger movements (Bortoff & Strick, 1993; Maier et al., 1997).

Decoding from SC
Previous attempts to decode kinematics from SC activity focused on proximal limb movements. Adding signals from SC significantly improved performance in macaques engaged in reaching beyond the performance achieved with M1 signals alone (Carmena et al., 2003;Lebedev et al., 2005). Decoding limb kinematics from electrocorticographic (ECoG) signals in SC achieves similar performance as with ECoG signals in M1 (Branco et al., 2017;Farrokhi & Erfanian, 2018).
In the present study, we decode, for the first time, hand kinematics from the spiking activity of neurons in areas 3a and 2 and find that both areas yield performance well above chance for all joints. Area 3a showed performance comparable to that of M1, consistent with earlier observations of single-unit responses (Goodman et al., 2019). Area 2, which lies downstream of area 3a, yielded significantly worse performance than M1, consistent with the observation that this cortical field also receives substantial cutaneous input, which may obscure the proprioceptive representations, even before contact. While movement has previously been shown to activate cutaneous neurons in the absence of contact (S. S. Kim et al., 2015; Rincon-Gonzalez, Warren, Meller, & Helms Tillery, 2011), our results suggest that these cutaneous responses do not support kinematic encoding of the hand.

Posture and movement decoding
Neurons in motor cortex and in proprioceptive areas of somatosensory cortex have been shown to preferentially encode velocity of the proximal limb (movement), rather than its position (posture) (Paninski, Fellows, Hatsopoulos, & Donoghue, 2004; Wang, Chan, Heldman, & Moran, 2007). Consistent with this finding, decoders of joint velocities generally outperform those of joint positions (S.-P. Kim et al., 2008). However, this preferential encoding and decoding of movement over posture has been tested exclusively for proximal limb joints. When we directly compared posture vs. movement decoding of the hand, we found that posture can be more faithfully decoded than movement (Figure 6) and that this postural preference is not an artefact of averaging (Figure 6C). For this analysis, we restricted the decoder to a single lag to avoid the confounding effects of linear integration or differentiation. Indeed, in a multiple-lag model, the difference in performance between posture and movement decoders becomes less pronounced (Supplementary Figure 7). This postural preference implies a difference between proximal and distal limb representations, which may be inherited from the different inertial and biomechanical properties of the arm and the hand and is well suited to support stereognosis (Goodman et al., 2019).

Decoding methods
The application of machine learning to kinematic/cursor decoding is now standard practice (Glaser et al., 2017). However, the extent to which recently developed decoding approaches robustly improve the performance of high-dimensional decoders (of hand kinematics, for example) has not been investigated. Here, we applied a variety of well-established linear and non-linear approaches to decoding hand movements (described in detail in Glaser et al., 2017) and found that non-linear methods generally outperform standard linear ones, as might be expected (Supplementary Figure 5). However, the improvement is typically minimal and may not justify the added computational complexity and potential for overfitting.

Closed-loop robotic limb control
A major application of kinematic decoders is to drive brain-machine interfaces aimed at restoring movement in patients with sensorimotor impairments (Bensmaia & Miller, 2014). Indeed, intended movements can be inferred from neural signals in sensorimotor cortex and converted into control commands for an external device, such as a robotic limb. While remarkable control has been achieved, with up to 10 degrees of freedom (Wodlinger et al., 2015), control of the hand remains relatively primitive, restricted to 4 degrees of freedom. Here, we show that up to 30 DoF of the upper limb can be simultaneously reconstructed using a fast and simple approach with a small number of neurons (~20).
The dexterity of robotic hands is severely limited by the absence of sensory feedback about hand posture. One approach to conveying proprioceptive feedback would be to stimulate proprioceptive neurons in SC (Armenta Salas et al., 2018; B. M. London et al., 2008; Tomlinson & Miller, 2016). Our results suggest that SC neurons, particularly in area 3a, carry a faithful representation of hand posture. However, the topographical organization of this representation has not yet been established. Indeed, the success of tactile feedback through ICMS has hinged on the somatotopic organization of cutaneous representations in SC. Whether the proprioceptive representation exhibits a spatial topography that can be exploited to convey artificial proprioceptive feedback remains to be elucidated.