Trajectory Learning by Therapists' Demonstrations for an Upper Limb Rehabilitation Exoskeleton

In this work, we propose a method for trajectory implementation based on the Learning by Demonstrations approach to address trajectory-planning issues in upper-limb rehabilitation exoskeletons. Currently applied path-planning methods use mathematical trajectories or Teach-and-Play approaches. The former do not offer patients human-like movements, which are crucial to induce correct motor relearning. Moreover, they often differ from therapists' expectations of how movements should be executed, reducing the acceptability and use of exoskeletons in hospitals. The latter, which use a single filtered trajectory demonstration, better meet therapists' expectations but lack consistency and optimization. In our approach, we employed Hidden Markov Models, not previously used in rehabilitation robotics, to study a set of demonstrations, and we optimized the results to respect physiological muscular activation patterns. Given a few recorded repetitions of a movement from the interaction of a therapist with an exoskeleton, our machine-learning-based algorithm returns a ready-to-use trajectory representing the therapist's intentions. We tested our method on a 4 degrees-of-freedom exoskeleton to record 5 exercises, interacting with 5 therapists. Comparing our trajectories with those obtained with literature methods, we see that our approach produces better kinematic and human-likeness results, and is preferred according to the global opinion expressed by the therapists.


I. INTRODUCTION
NOWADAYS the social burden represented by the effects of neuromuscular and neurodegenerative diseases is continuously growing [1]. In particular, motor impairments affecting the mobility of the upper limb, which is fundamental in the execution of multiple activities of daily living, can strongly influence patients' abilities, independence and social life [2]. In this context, exoskeletons for rehabilitation and assistance of the upper limb offer new advantages for the execution of classical therapy sessions. Their use guarantees the capacity of performing high amounts of practice, carrying out task-oriented, interactive and customizable exercises [3].
However, to assure the efficacy of the exercises proposed to the patients, robots must perform trajectories that accurately reproduce physiological human movements [4], [5]. The correctness of such movements enhances the relearning process towards the reconstruction of the correct neural paths and so the recovery of the lost motor functions [6]. This point leads to the necessity of managing the trajectory-planning issue, which is therefore important not just to ensure safety and comfort for the user and to operate in realistic environments, but also to guarantee the correct execution of rehabilitation movements. In addition, the appreciation of the proposed exercises by therapists is a further key issue in terms of the acceptability of the device.
Over the years, various path-planning methods have been proposed for rehabilitation exercises. Methods employing mathematical trajectories are theoretically simpler to implement, but they suffer from problems of redundancy, poor inter-joint coordination and low human likeness [7]. This makes the resulting trajectories less appreciated by physiotherapists and reduces the overall acceptability of exoskeletons, limiting their presence in daily clinical practice. Learning the movements to be executed directly from a human demonstrator [8] can be a solution to deal with the high complexity of physiological arm movements and obtain human-like trajectories. However, these methods for trajectory planning from a teacher's demonstration are still applied more in industrial and collaborative robotics than in rehabilitation. Recorded trajectories can be treated in different ways. In this letter, we distinguish between the approaches that record a single demonstration and simply present it, once filtered, to the users (which we define as Teach and Play methods) and those that employ Machine Learning algorithms to process multiple demonstrations of the task we want the robot to learn and extract their underlying human features (which we call Learning by Demonstration methods). Teach and Play methods are the most frequently applied but have reliability issues, since they also replay any recording errors. Results are highly dependent on the quality of the recording and require accuracy and concentration from the human teacher. Learning by Demonstration methods, on the other hand, guarantee consistency in the results but are still little used in rehabilitation robotics. Scientific literature on machine-learning-based methods for trajectory learning is much more extensive for industrial and collaborative robots.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
This work proposes a Learning by Demonstration path-planning method for rehabilitation exercises based on the Hidden Markov Models (HMM) machine learning algorithm, which is typically applied to collaborative robots. From a small set of demonstrations (5 repetitions) of a given exercise, made by a therapist, our algorithm extracts the "best trajectory", which should represent the therapist's real intentions regarding the execution of that specific task. Based on our tests, carried out on a 4 degrees-of-freedom (DOFs) exoskeleton with 5 therapists for different exercises, this method solves both the acceptability issues related to mathematical trajectories and the problems of consistency and kinematic accuracy of Teach and Play methods. Also, compared to other similar approaches, it has a low computational cost and its timing is compatible with the duration of a therapy session.
According to what has been stated so far and to our knowledge of state-of-the-art methods, our work brings the following contributions:
• It applies HMM, a well-known and powerful approach, to the field of upper-limb rehabilitation exoskeletons, for which it had not yet been employed. HMM has been employed in robotics for trajectory implementation on industrial collaborative robots [9] and exoskeletons [10], but not for rehabilitation purposes;
• It implements a joint-synchronization algorithm to respect the physiological patterns of muscular activation of the various movements recorded. It verifies that the natural order of activation of upper-limb joints is consistent with the recordings and corrects undesired modifications;
• It tests the algorithm on a real rehabilitation exoskeleton: it involves a relevant application and tests the robustness of the approach with respect to experimental errors.

A. Mathematical Planning Methods
Mathematical motion planning is the classical method that has been adopted over the years. Exercises are typically expressed in Cartesian space, but given the issues related to inverse kinematics and the kinematic redundancy of the majority of existing exoskeletons, it is easier to plan in joint space [7]. Trajectories are often polynomials of the 5th or 7th order, assuming the minimum jerk theory [11], [12], [13]. In any case, polynomial trajectories are often a simplification and cannot reproduce the complexity of some movements. They can be a good solution for simple tasks, but they can become inadequate for more complex rehabilitation exercises that require the execution of specific movements to regain precise abilities [7], [14].
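To make the fifth-order polynomial case concrete, the following is a minimal Python sketch of a rest-to-rest minimum-jerk joint trajectory. It uses the standard closed form 10τ³ − 15τ⁴ + 6τ⁵ for zero boundary velocity and acceleration. The function name and its arguments are illustrative assumptions, not taken from the cited works or from the AGREE software.

```python
def minimum_jerk(theta_start, theta_end, duration, n_samples):
    """Sample a rest-to-rest minimum-jerk (fifth-order polynomial) joint trajectory.

    Returns (times, angles). Velocity and acceleration are zero at both ends.
    All names here are hypothetical and used only for illustration.
    """
    if n_samples < 2:
        raise ValueError("n_samples must be at least 2")
    times, angles = [], []
    for k in range(n_samples):
        tau = k / (n_samples - 1)  # normalized time in [0, 1]
        shape = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
        times.append(tau * duration)
        angles.append(theta_start + (theta_end - theta_start) * shape)
    return times, angles
```

For example, `minimum_jerk(0.0, 90.0, 2.0, 201)` would describe a 90 degree joint excursion in 2 s. The profile is fully determined by its end points and duration, which is the simplification the paragraph above refers to.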

B. Learning From a Teacher
Nowadays robots can perform thousands of human tasks with high accuracy. The main difference from an action executed by a human being, however, lies in the level of complexity and variability that human motion planning can handle. Humans have higher perception and manipulation capabilities, and a higher degree of adaptation to new environments [4]. Recording a movement directly from a human subject is the most direct way to transfer this information to the robot. Teaching methods can be implemented through various Teleoperation approaches [15]. In this letter, we focus on the specific teaching approach called Kinesthetic teaching: the robot is set in a modality we can call "transparent" and a human user manipulates it in the execution of a specific movement. The robot compensates for its own weight and, if it is an exoskeleton, for the weight of the wearer's arm, without actively moving. Instead, it follows externally produced movements without opposing them, and it records the trajectory [15]. Movements to be recorded can be produced by an unimpaired subject wearing the robot [16], by the healthy limb (in the case of hemiparetic patients) [17] or by a therapist who manipulates the exoskeleton as if it were a human arm [18]. This last solution is probably the most compatible with a rehabilitation session, as it is consistent with what normally happens (the therapist manipulates the arm of the patient to move it, correct the patient's action and teach the patient what to do) and it can be employed whenever it is needed during therapy.
1) Teach and Play Approach (TeP): Basic trajectory learning systems require the recording of one single repetition of the task we want the robot to repeat. Recorded trajectories are then filtered or smoothed [19], [20], or analyzed to identify some via points [18]. However, recording a single repetition of the exercise leads to results that strictly depend on how well that repetition is performed. Errors, undesired pauses, changes in velocity profiles, and jerky movements are recorded and remain unnoticed. The final result has to be verified on the exoskeleton before being presented to the patients. If the trajectory is not satisfactory, the whole procedure must be repeated from the beginning: this is time-consuming and not compatible with the timing of a rehabilitation session.
2) Learning by Demonstrations Approach (LbD): Given the limitations of the Teach and Play approach, we can consider other trajectory-learning methods that work on multiple repetitions of the same exercise and apply Machine Learning algorithms to add consistency to the results. The purpose of such methods is to understand the rationale behind the execution of a certain movement. In rehabilitation, we suppose that expert therapists know which specific movements are needed or have to be enhanced to help the patient regain the lost motor abilities. Applying LbD methods to their demonstrations should ensure that the robot learns and reproduces the rehabilitative features of the movements they perform. Known approaches for path planning from a set of demonstrations include Dynamic Movement Primitives (DMPs), Hidden Markov Models (HMM), Gaussian Mixture Models, and Neural Networks [15]. DMPs, in particular, have recently been employed on rehabilitation exoskeletons for specific reaching and grasping tasks [8]. Despite the effectiveness and elegance of the method, DMPs are a deterministic approach whose results are highly dependent on the choice of hyperparameters [21]. The HMM approach can offer an easily implementable and quickly applicable alternative to DMPs [4], [9]. HMM is a statistical approach that studies the probabilities for a sequence of observable events, or states. It can be used to encode the motion of a robot and find the state path that has the highest probability from a set of trajectory demonstrations [10]. It has been applied to industrial robots [9] and humanoid robots [22]. However, to the best of our knowledge, the only work that applies HMM for trajectory learning to an exoskeleton is neither intended for rehabilitation purposes nor tested with patients or therapists [10].
The exploration of HMM for this specific purpose has the potential to overcome the limitations of other methods and to provide a ready-to-use system for trajectory learning during rehabilitation. Moreover, HMM works well for small datasets, which normally represent an issue for other Learning by Demonstration methods [23].

III. METHODS: LBD WITH HMM
As anticipated, in light of the advantages underlined in the previous paragraphs, we based our LbD algorithm for trajectory learning on a Hidden Markov Model approach developed using MATLAB R2022a. We implemented the algorithm starting from the approach described by Roveda et al. in [9] for industrial robots. Our approach is modular so, even if it was designed to be tested on AGREE, a prototype of a 4 degrees-of-freedom upper-limb exoskeleton by Politecnico di Milano [13], the code can be adapted to other exoskeletons with a different number of DOFs. AGREE has a loadcell-based impedance control system that implements multiple human-robot interaction modalities and ensures that the robot is always compliant with users' contributions while moving along predefined (ideal) trajectories [13]. Its first three joints actuate the shoulder movements; the fourth one is the joint for flexo-extension of the elbow.
Our methodology is schematically represented in Fig. 1 and follows the steps described below.

A. LbD Algorithm
Input data: The input data consist of five demonstrations of a specific movement made by a human teacher, moving between a start point and an end point. Each demonstration is a set of 4 trajectories in joint space, one for each joint involved (the number of trajectories depends on the DOFs of the robot). Sequences of joint states are different for each demonstration and represent the actual input to the model. Trajectories are vectors of angular positions, recorded in time. The i-th joint trajectory is indicated as $Q_{i,n} = [\theta_{i,n}(1), \dots, \theta_{i,n}(T_n)]$, where $n$ indicates the n-th of the 5 demonstrations of the trajectory and $T_n$ is the time length of that demonstration.
Time-scaling: For each joint, the five repetitions are time-scaled. Since they all come from different demonstrations, they have slightly different durations. We re-scale them to the mean duration.
Core algorithm: The core of the algorithm divides the trajectories into 32 subintervals [9]. To employ the HMM, we need to define a set of hidden states and observation symbols.
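The time-scaling step described above can be illustrated with a minimal Python sketch (Python is used here only for illustration; the actual implementation was developed in MATLAB). It assumes that re-scaling means resampling every demonstration, by linear interpolation, to the mean number of samples. The interpolation scheme and the function names are assumptions.

```python
def resample(trajectory, n_out):
    """Linearly interpolate a trajectory onto n_out equally spaced samples."""
    n_in = len(trajectory)
    if n_in < 2 or n_out < 2:
        raise ValueError("need at least two samples in and out")
    out = []
    for k in range(n_out):
        pos = k * (n_in - 1) / (n_out - 1)  # fractional index in the input
        lo = min(int(pos), n_in - 2)        # left neighbour, clamped at the end
        frac = pos - lo
        out.append(trajectory[lo] * (1 - frac) + trajectory[lo + 1] * frac)
    return out


def time_scale(demos):
    """Resample all demonstrations of one joint to their mean length."""
    mean_len = round(sum(len(d) for d in demos) / len(demos))
    return [resample(d, mean_len) for d in demos]
```

After this step all repetitions of a joint share the same number of samples, so they can be cut into the same subintervals and compared sample by sample.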
1) Creation of a Codebook: We start by applying the Linde-Buzo-Gray algorithm, a variant of k-means clustering, to quantize the trajectories of each joint and generate a codebook [24]. The number of clusters gives the number of observation symbols.
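As an illustration of the codebook step, the sketch below implements a scalar Linde-Buzo-Gray quantizer in plain Python: the codebook starts from the global mean, every centroid is split into two perturbed copies, and the result is refined with nearest-centroid (Lloyd) iterations. The codebook size, the perturbation `eps` and the iteration count are hypothetical values chosen for the example, since they are not reported here.

```python
def nearest(codebook, x):
    """Index of the codebook entry closest to x."""
    return min(range(len(codebook)), key=lambda j: abs(codebook[j] - x))


def lbg_codebook(samples, n_codes, eps=0.01, n_iter=20):
    """Scalar LBG: repeated splitting plus Lloyd refinement.

    n_codes is assumed to be a power of two (the codebook doubles at each split).
    """
    codebook = [sum(samples) / len(samples)]
    while len(codebook) < n_codes:
        # Split every centroid into two slightly perturbed copies.
        codebook = [c + d for c in codebook for d in (-eps, eps)]
        for _ in range(n_iter):
            clusters = [[] for _ in codebook]
            for x in samples:
                clusters[nearest(codebook, x)].append(x)
            # An empty cluster keeps its previous centroid.
            codebook = [sum(cl) / len(cl) if cl else c
                        for cl, c in zip(clusters, codebook)]
    return sorted(codebook)


def encode(trajectory, codebook):
    """Replace each angular position by the index of its nearest code."""
    return [nearest(codebook, x) for x in trajectory]
```

The output of `encode` is the sequence of observation symbols that a discrete HMM can consume, with one symbol index per codebook cluster.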
2) HMM: We proceed by implementing an HMM Bakis Left-Right algorithm [25], which is used to analyze the five coded demonstrations and select the most probable one. We study the frequency spectrum of the trajectory via the Short Time Fourier Transform (STFT) and identify the number ($M$) of states. The key points are defined by the number of columns of the STFT matrix, obtained using the MATLAB Signal Processing Toolbox. Our set of observations $Q^*_{i,n}$ is used to train the model, which evaluates the probability that each trajectory sub-interval $x$ is in a certain HMM state $s_a$ while the following sub-interval $x + 1$ is in another state $s_b$. The most consistently demonstrated trajectory is the one showing the maximum likelihood. For each joint, the preferred trajectory is $J_i = Q^*_{i,pref_i}$, with $1 \le i \le 4$, where $pref_i$ is the index of the preferred demonstration for the i-th joint. An example of the results of this step is shown in Fig. 2. As we can see, preferred trajectories for each joint do not necessarily belong to the same demonstration. Before applying our method, we experimentally verified that the composition of different trajectories produces a kinematic error small enough to guarantee reaching the targets of the various exercises (circles with a 6 cm diameter that we normally use for rehabilitation exercises).
Algorithm 1 (joints synchronization):
if $\Delta t_{demo} \cdot \Delta t_{pref} < 0$ then
  Shift $J^*_4$ in time
end if
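The selection of the preferred demonstration in step 2) can be sketched in Python as follows. This is a simplified illustration, not the implementation used in this work: the emission probabilities are estimated by cutting each coded sequence into equal segments instead of being trained, the transition matrix is hand-set, and the number of states is passed in rather than derived from the STFT. Only the left-right structure and the maximum-likelihood choice mirror the text; all function names are hypothetical.

```python
import math


def left_right_transitions(n_states, p_stay=0.5):
    """Bakis-type matrix: a state either persists or advances to the next one."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for s in range(n_states - 1):
        A[s][s] = p_stay
        A[s][s + 1] = 1.0 - p_stay
    A[-1][-1] = 1.0  # the last state is absorbing
    return A


def estimate_emissions(coded_demos, n_states, n_symbols, floor=1e-3):
    """Crude estimate (assumption): count symbols in n_states equal time segments."""
    counts = [[floor] * n_symbols for _ in range(n_states)]
    for demo in coded_demos:
        for t, symbol in enumerate(demo):
            state = min(t * n_states // len(demo), n_states - 1)
            counts[state][symbol] += 1.0
    return [[c / sum(row) for c in row] for row in counts]


def log_likelihood(obs, A, B):
    """Scaled forward algorithm; the model is assumed to start in state 0."""
    n = len(A)
    alpha = [B[0][obs[0]]] + [0.0] * (n - 1)
    log_l = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        alpha = [a / scale for a in alpha]
        log_l += math.log(scale)
    return log_l


def preferred_demo(coded_demos, A, B):
    """Index of the demonstration with maximum likelihood, plus all scores."""
    scores = [log_likelihood(d, A, B) for d in coded_demos]
    return max(range(len(scores)), key=scores.__getitem__), scores
```

Applied joint by joint, `preferred_demo` returns one index per joint, which is why the preferred trajectories of different joints may come from different demonstrations.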
Joints synchronization: Unlike in industrial robotics, in the rehabilitation field the activation order of the various joints must be verified to respect the physiological muscle activation patterns. The correct activation sequence normally proceeds from proximal to distal muscles, meaning that shoulder muscles start contracting before distal ones [26]. This sequence can nevertheless be influenced by the configuration of the arm and its interaction with the external environment. Once we obtain each joint's "preferred" trajectory, we compose them and verify their synchronization in time. Despite the time-scaling, we want to be sure that the activation order of the joints is not altered by the composition of the demonstrations. This part of our system works as indicated in Algorithm 1. We evaluate the activation order of the five demonstrations by comparing the time instants at which the first two joints (shoulder abdo-adduction and flexo-extension) and the fourth one (elbow flexo-extension) start moving. A movement is considered to start when the velocity exceeds a threshold $th = 5\%\,Vel_{max}$ [27]. We calculate, in terms of time samples, the average time difference, over the five repetitions, between elbow and shoulder activation. We then evaluate the activation order in the four preferred trajectories as well. If the sign of the time difference changes, we time-shift the trajectory of the fourth joint to obtain an activation-time difference equivalent to the previously evaluated mean. Our final trajectories maintain the shoulder-elbow inter-joint coordination of the human demonstrations, as we verified by comparing their Temporal Coordination index [28].
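The synchronization check can be sketched in Python as follows, under stated assumptions: each demonstration is reduced to one shoulder and one elbow trajectory, velocity is approximated by the sample-to-sample difference, and the shift pads the trajectory with its boundary value so that its length is preserved. These simplifications and all names are illustrative choices rather than the actual implementation.

```python
def onset_index(positions, threshold_ratio=0.05):
    """First sample at which |velocity| exceeds threshold_ratio of its peak."""
    speed = [abs(b - a) for a, b in zip(positions, positions[1:])]
    peak = max(speed)
    if peak == 0:
        raise ValueError("the joint never moves")
    for k, v in enumerate(speed):
        if v > threshold_ratio * peak:
            return k


def shift(trajectory, n_samples):
    """Delay (n > 0) or advance (n < 0) a trajectory, holding boundary values."""
    if n_samples > 0:
        return [trajectory[0]] * n_samples + trajectory[:len(trajectory) - n_samples]
    if n_samples < 0:
        return trajectory[-n_samples:] + [trajectory[-1]] * (-n_samples)
    return list(trajectory)


def synchronize(demos, preferred):
    """demos: list of (shoulder, elbow) pairs; preferred: one (shoulder, elbow) pair."""
    # Mean elbow-minus-shoulder onset difference over the demonstrations.
    dt_demo = sum(onset_index(e) - onset_index(s) for s, e in demos) / len(demos)
    shoulder, elbow = preferred
    dt_pref = onset_index(elbow) - onset_index(shoulder)
    if dt_demo * dt_pref < 0:  # activation order was inverted by the composition
        elbow = shift(elbow, round(dt_demo) - dt_pref)
    return shoulder, elbow
```

When the order is inverted, the elbow trajectory is shifted so that its onset lag matches the rounded mean lag of the demonstrations. Otherwise the trajectories are returned unchanged.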
Human-likeness: According to the principle that a minimum-jerk movement is human-like [29], [30], we concluded our processing with the application of a jerk-minimization algorithm (using the Data Processing and Visualization toolbox from MATLAB).

B. TeP Algorithm for Comparison
To enable a comparison with the approach seen in the literature, we also wrote a simple TeP algorithm that filters the single recorded trajectory with a fourth-order Butterworth filter with a 2 Hz cutoff frequency [31].
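For reference, a fourth-order Butterworth low-pass filter can be sketched in plain Python as a cascade of two second-order sections obtained through the bilinear transform. The sketch assumes the 1 kHz sampling rate used for the recordings and a single causal pass. Whether the filter in this work was applied in one direction or forward and backward (zero phase) is not stated, so that choice, like the function names, is an assumption.

```python
import math

# Quality factors of the two second-order sections of a 4th-order Butterworth filter.
BUTTERWORTH4_Q = (1 / (2 * math.cos(math.pi / 8)),
                  1 / (2 * math.cos(3 * math.pi / 8)))


def lowpass_coeffs(cutoff_hz, fs_hz, q):
    """Normalized low-pass biquad coefficients (b0, b1, b2, a1, a2)."""
    w0 = 2 * math.pi * cutoff_hz / fs_hz
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b1 = (1 - math.cos(w0)) / a0
    return b1 / 2, b1, b1 / 2, -2 * math.cos(w0) / a0, (1 - alpha) / a0


def biquad(signal, coeffs):
    """Direct-form I filtering, with the state preloaded to the first sample."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = signal[0]  # avoids a start-up transient from zero
    out = []
    for x in signal:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def butterworth4_lowpass(signal, cutoff_hz=2.0, fs_hz=1000.0):
    """Causal 4th-order Butterworth low-pass of a non-empty joint trajectory."""
    out = list(signal)
    for q in BUTTERWORTH4_Q:
        out = biquad(out, lowpass_coeffs(cutoff_hz, fs_hz, q))
    return out
```

A single causal pass delays the trajectory slightly. Running the same cascade forward and then backward would cancel the delay at the cost of doubling the effective filter order.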

C. Experimental Testing of the Proposed Methodology
We carried out experimental testing to verify the proposed method's efficacy and perceived utility. We involved 5 therapists (both physiotherapists and occupational therapists) from the clinical facility Casa di Cura Privata del Policlinico S.p.A in Milan (Italy). Each therapist was tested individually to prevent them from influencing one another. A healthy female user wore the AGREE exoskeleton while sitting on a chair for the entire duration of the tests (as represented in Fig. 3). She was asked to behave passively and avoid any voluntary movement. The experimental setup also included a table, located in front of the user, and a set of objects to interact with (such as bottles and boxes). All the subjects involved in the test signed informed consent and the testing was approved by the Ethical Committee of Politecnico di Milano.
The experiment was divided into two successive phases: (i) Trajectories recording and (ii) Trajectories re-proposition and evaluation. During phase (i) we asked each therapist to manipulate the robotic arm (worn by the healthy user) as if it were a human arm, to perform specific exercises. They moved it by holding the blue cover of the upper arm with one hand and the cover of the forearm with the other, as indicated in Fig. 3. They were asked to produce movements exactly as they would during a classic therapy session, and to repeat them five times per exercise. During this step, the exoskeleton was set in "transparent" mode [32]. While the therapists moved the robot, we recorded the trajectories at a frequency of 1 kHz. In this way, we obtained the trajectories as vectors of angular positions, registered by the encoders of the exoskeleton, one for each joint (incremental encoders with a resolution of 2048 counts per revolution). We recorded movements for five different exercises, including:
• Hand from a rest position located on the ipsilateral leg of the user to the contralateral shoulder;
• Circle drawing with the hand, starting from the rest position on the table and moving always at about the same height.
The first three exercises were already part of the exercises proposed by AGREE, and the trajectories were realized using fifth-order polynomials [13]. For each exercise, therapists repeated the movements they wanted the robot to learn five times. Two therapists decided to record two variants of the third exercise, with different locations of the end-point. At the end of the recording step, the trajectories were fed to the two algorithms we implemented (the TeP one and the LbD one). In this way, we obtained two different output trajectories for each of the five rehabilitation exercises.
During phase (ii), we made the exoskeleton perform the five exercises, one after the other. For each exercise, the robot executed the two possible trajectories we got from the two algorithms, in random order. The healthy user wore the robot passively on her arm, and the therapists were asked to observe the execution of the exercises, without being told which algorithm was used to create the movements they were observing. We asked the therapists to fill in a questionnaire expressing their opinion on the various trajectories the exoskeleton was performing for each task. For each exercise, we asked them to indicate their preferred trajectory. We also asked them to quantify their level of agreement with the sentences reported in Table I, with scores on a Likert scale from 1 (Total disagreement) to 5 (Total agreement). We collected some personal information regarding age, years of working experience in the rehabilitation field and predisposition to technology, and reported it in Table II.
TABLE II: Personal information about the therapists. The field "Attitude towards technology" reports the mean value of the scores (on a scale 1-5) they attributed to 4 sentences indicating their level of interest and confidence when dealing with new technologies and with technology in their everyday life.

D. Analysis of the Trajectories: Kinematics and Human-Likeness
Apart from the evaluation provided by the therapists about the movements obtained with the various methods, we decided to compare the trajectories obtained with the TeP and LbD algorithms in terms of kinematics and human likeness. We wanted to understand if our algorithm, with its trajectory composition from multiple demonstrations, guarantees that the user reaches the target points. We also wanted to verify that the movements the exoskeleton proposes to the users are effectively physiological and do favour the re-learning of correct motor paths. We evaluated, first for each joint and then in task space:
• The trajectory end-point reaching error;
• The normalized jerk score (NJS);
• The mean jerk over the execution of a trajectory, where $\theta(t)$ is the angular position occupied by the joint during the movement;
• The Spectral Arc Length measure (SAL), which estimates the smoothness through the arc length of the Fourier magnitude spectrum of the trajectory speed profile [33].
$V(\omega)$ represents the Fourier magnitude spectrum of velocity and $\hat{V}(\omega)$ is the spectrum normalized by its value at 0 Hz. SAL is obtained by integrating over a frequency range between 0 Hz and $\omega_c$, which can be set at 20 Hz, as Balasubramanian et al. did, to include all the possible frequencies of human movements. The evaluation of smoothness also reflects the need to verify that human-like movements produce a low energy consumption. We can, in fact, say that smooth movements are energetically economical [34].
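For reference, the smoothness metrics listed above are commonly written as follows. The SAL expression follows the definition by Balasubramanian et al. using the symbols introduced above. The jerk-based expressions are widely used forms and are an assumption, since the exact normalization adopted in this work is not reported here: $T$ denotes the movement duration and $A$ the movement amplitude, both introduced only for this illustration.

```latex
\begin{align}
\text{Mean jerk} &= \frac{1}{T}\int_{0}^{T}\left|\dddot{\theta}(t)\right|\,dt \\
\text{NJS} &= \sqrt{\frac{1}{2}\,\frac{T^{5}}{A^{2}}\int_{0}^{T}\dddot{\theta}(t)^{2}\,dt} \\
\text{SAL} &= -\int_{0}^{\omega_c}\sqrt{\left(\frac{1}{\omega_c}\right)^{2}
  + \left(\frac{d\hat{V}(\omega)}{d\omega}\right)^{2}}\,d\omega,
\qquad \hat{V}(\omega) = \frac{V(\omega)}{V(0)}
\end{align}
```

With these conventions, lower NJS and mean jerk values and higher (less negative) SAL values indicate smoother movements.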

A. Kinematics and Human-Likeness
This section reports the results of the analysis we performed on the final trajectories. As regards the kinematic analysis and the study of the reaching error, the two methods are equivalent. The error produced is in line with the standard negligible kinematic error produced by the operating parameters of AGREE (< 2-3 cm) and always guarantees reaching the targets [13]. Table III reports the statistics of the human-likeness analysis of the trajectories. The mean values indicated for NJS, Mean Jerk and SAL are evaluated, for each therapist, over all the exercises. From the graphs in Fig. 4 we can easily see that the LbD trajectories have globally lower values for both NJS and Mean Jerk and higher SAL values. Comparing the results collected for TeP and LbD trajectories with a Mann-Whitney U test, we find that such differences are statistically significant. This means that our method produces trajectories that have lower jerk values and higher smoothness compared with trajectories obtained through the classical filtering process. These global results are confirmed if we analyze the three parameters exercise by exercise. The LbD approach appears to be better in terms of human likeness. TeP trajectories are generated by humans too, but their human likeness cannot be taken for granted, as they are produced while moving an external object. It is interesting to notice that the results for TeP trajectories show greater variability, both considering exercises individually and collectively. Global quantitative differences in variability for every parameter are reported in Table III, in terms of interquartile ranges. The quality of TeP trajectories is variable because it strictly depends on how well the single recorded trajectory is performed. The filtering process, while quick and effective, does not guarantee the elimination of all errors, jerks or inaccuracies recorded by the therapist. The TeP approach gives optimal results only if the single trajectory recorded is accurate. In this sense, the LbD method gives more consistent results, avoids occasional mistakes and guarantees repeatability.
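The statistical comparison mentioned above relies on the Mann-Whitney U test. As an illustration only, a minimal Python version with a normal approximation (without tie or continuity correction, which is an assumption of this sketch) can be written as follows. It is not the statistical software used for the analysis, and no data from this study are included.

```python
import math


def mann_whitney_u(x, y):
    """Return (U, z, p): U statistic of x versus y, z-score and two-sided p-value.

    Uses the large-sample normal approximation without tie or continuity
    correction, so p is only indicative for very small samples.
    """
    n1, n2 = len(x), len(y)
    # U counts the pairs in which the x value exceeds the y value (ties count 0.5).
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, z, p
```

Called with the per-trajectory values of one metric (for example NJS) for the two methods, a small `p` would indicate that the two distributions differ.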

B. Therapists' Evaluation
Collecting the results of therapists' blind evaluations (shown in Fig. 5), we can see that the trajectories obtained with the LbD approach were preferred 19 out of 27 times (70.4% of the total preference expressions). From an analysis of the answers that therapists gave to the first three questions of Table I, repeated for every trajectory obtained, we can see that the mean scores given to the two methods are similar. Even if differences are not significant (as we verified through the statistical analysis with a Mann-Whitney U test), LbD trajectories are always evaluated as slightly better than or at least equivalent to the TeP ones (see Table IV). These results support the acceptability of our method from the therapists' perspective.
We analyzed the results by investigating possible correlations between the scores the therapists gave to the exercises and the personal information we collected from them (Table II). However, factors such as age, duration of the total work experience in rehabilitation and attitude towards technology do not seem to influence their evaluation.
We also asked the therapists whether they would accept repeating the same movement at the beginning of a therapy session to let the robot learn new trajectories, and whether our method can be considered for application in everyday clinical practice (see Questions 4 and 5 in Table I). They all agreed with the sentences stating that the repetition of the same movement five times can be compatible with the timing of therapy sessions (global score 4.7/5).

V. DISCUSSIONS AND CONCLUSIONS
The overall analysis shows that our LbD approach produces human-like trajectories, with higher consistency and better results than those obtained by single-trajectory filtering TeP methods (such as those presented in [19] and [20]), based on the study of smoothness and jerk. Learning directly from a human solves the limitations of mathematical trajectories pointed out in [14]. We are able to reproduce any kind of movement, even complex ones, required by therapists. Five repetitions make the results consistent, as we can see from the different variability levels that are noticeable in Fig. 2. Our system is well accepted by therapists, who think the movements produced by our method are quite representative of their intention and suitable for the execution of rehabilitation exercises. The differences in the evaluation are of course more evident if the single demonstration processed with the TeP approach is one in which the movement is executed with low precision. We have to consider that the probability that this happens is high if the therapists need to respect the strict timing available for their sessions. The time required for the recording of the trajectories and for the algorithm to process the data (≈ 10 s) is less than two minutes. This makes the whole method easily applicable at any time during a session, overcoming the limitations connected to the application of other machine-learning methods, as suggested in [4]. We believe that one of the strengths of our method is the high ratio between its compatibility with the requirements of a therapy session and the effort needed for its implementation. Recording five repetitions does not increase the effort that is required of the therapist. Instead, repeating the same movement guarantees an improvement in the quality of the recordings. On the contrary, having to record a single repetition demands a greater effort of concentration to be sure that that specific trajectory is well executed.
With this work, we have proposed a ready-to-use system for trajectory planning that yields effective and physiological rehabilitation movements, without the need for preliminary, time-consuming checks on the results. We have applied the concept of learning from a human teacher, which solves the human-likeness issues of analogous methods employing geometrical trajectories, and we have introduced HMM, which is mostly used in industrial robotics, for consistency and reliability of the results. The trajectories we obtain are customized and fit therapists' requirements. We have seen that our method is easy and fast to apply, and it would be interesting to expand the study. We could compare the results with those obtainable with other machine-learning-based methods, such as DMPs, to quantify computation and application times. The study could also be expanded by studying in further detail the inter-joint activation sequences of upper-limb movements. We tested our method with five therapists. Results seem promising, but the reliability of our conclusions would benefit from the involvement of more therapists. It could be interesting to apply our method to other exoskeletons and compare the results.