A Hierarchical Bayesian Model for Cyber-Human Assessment of Movement in Upper Extremity Stroke Rehabilitation

The evidence-based quantification of the relation between changes in movement quality and functionality can help clinicians structure and adapt therapy more effectively. In this paper, clinicians rated task, segment, and composite movement feature performance for 478 videos of stroke survivors executing upper extremity therapy tasks. We used the clinician ratings to develop a Hierarchical Bayesian Model (HBM) with task, segment, and composite layers for computing the statistical relation of movement quality changes to function. The model was enhanced through a detailed correlation graph ($\Delta_{\textit{HBM}}$) that links computationally extracted kinematics with clinician-rated composite features for different task-segment combinations. Using the weights and correlation graphs, we then derive reverse cascading probabilities of the proposed HBM from kinematics to composite features, segments, and tasks. In a test involving 98 cases where clinician ratings differed, the HBM resolved 95% of these discrepancies. The model effectively aligned kinematic data with specific task-segment combinations in over 90% of cases. Once the HBM is expanded and refined with additional data, it can be used for the automated calculation of statistical relations between changes in kinematics and the performance of functional tasks, and for generating therapy assessment recommendations for clinicians. While our work primarily focuses on the upper extremities of stroke survivors, the HBM can be adapted to many other neurorehabilitation contexts.


I. INTRODUCTION
ESTABLISHING the detailed relationship between changes in function and the underlying changes in movement quality remains a significant challenge in neurorehabilitation [1], [2]. During rehabilitation exercises and assessments, clinicians can attend to only a limited number of impaired movement elements. Consequently, their ability to connect their expert observations to standardized norms or values [3] is limited. Depending on their training and experience, clinicians focus on different movement elements when assessing patients, whether in direct interaction or when observing videos of patient performance [4], [5]. This subjectivity further impedes the establishment of a normative quantitative framework for the integrated assessment of movement quality and function [6], [7].
The evidence-based quantification of the relation between changes in movement quality and function can assist clinicians in structuring and adapting therapy more effectively [1]. It can lead to standardized assessment and adaptation across clinics, resulting in large-scale, coherent data sets for automated assessment. Automated assessment would, in turn, leave clinicians more time to focus on treatment and allow for remotely supervised therapy at home [8]. Evidence-based quantification can be achieved by integrating detailed tracking of kinematics into validated, expert-driven clinical measures [2], [9]. Although high-end sensing technologies can provide detailed kinematic tracking, these technologies are cumbersome even in the clinic and are not yet feasible for the home. Tracking movement through marker-based capture or intricate exoskeletons is costly, complex, and obtrusive [10], [11], [12]. The homes of many stroke survivors are also often constrained and cannot easily accommodate the large-form technology typically used in hospital or clinic-based rehabilitation systems [13]. Even small systems, if intrusive or perceived negatively, tend to be rejected by stroke survivors and/or their care partners [14].
Therefore, kinematic data must be captured through low-cost, unobtrusive means, such as video or IMU-based capture, that respect privacy and do not interfere with daily functions [15]. Low-cost video or IMU capture has limitations [16], [17], [18] that result in low-fidelity data; nonetheless, these data capture rehabilitation movement as it is performed in the real world. The resulting data are also variable, as the movement of stroke survivors varies significantly across individuals [18]. The analysis framework therefore needs to leverage domain expertise to counteract the computational challenges of working with low-fidelity, variable data.
Our team proposes a Hierarchical Bayesian Model (HBM) that integrates clinical expertise with kinematic tracking and analysis to compute the relationship of movement quality to function in rehabilitation using low-fidelity, variable data. Hierarchical models are used to study many areas of complex human activity, such as language [19], music [20], logic and computation [21], sports [22], and motor learning in general [23]. This work has inspired the development of generalizable models of human intelligence [24], [25] and human learning [26]. Tenenbaum et al. [27] suggest that human learning in complex situations with noisy data can be modeled as an HBM. Building on this prior work, our proposed HBM captures clinician knowledge through the rating of videos of the movement of stroke survivors: expert clinicians rate, in a hierarchical and standardized manner, the performance of each task, the segments of each task, and the key movement features of each segment. We then extend the task, segment, and composite feature hierarchy with kinematics and show how this four-layer model can help establish evidence-based, quantifiable relations between movement changes and function, and how these relations can be used for the automated assessment of movement. We focus this article on the application of the model to upper extremity rehabilitation of stroke survivors. However, the model can be adapted to many other neurorehabilitation contexts.

II. BACKGROUND
Technological systems that aim to assist clinicians in delivering rehabilitation may not be adopted if they are incompatible with clinicians' approaches or introduce steep learning curves [28]. Busy clinicians have limited time to learn, troubleshoot, and maintain complex new systems. Over the past few years, our team has used participatory design processes [29] and custom-made interactive video rating tools [30] to help expert clinicians reflect on and reveal their movement assessment processes and the related internalized (tacit) rating schemas, so that we can base our cyber-human assessment models on observable expert schemas. Our work reveals that clinicians use a hierarchical probabilistic process to deal with the uncertainty and complexity of therapy assessment.
Clinicians characterize overall impairment as mild, moderate, or severe. They also use the combined categories mild/moderate and moderate/severe, allowing for a total of five categories. This categorization is based on validated clinical tests that use measurements of the components of the physical apparatus (e.g., the Fugl-Meyer test [31]), observations of timed tasks in the clinic (e.g., the Action Research Arm Test (ARAT) [32]), and questionnaires on Activities of Daily Living (ADL) (e.g., the Motor Activity Log (MAL) [33]). The highest scores on these tests (e.g., 57 on the ARAT) are associated with full recovery of function [34]. To increase the impact of therapy on daily functionality, clinicians use sets of generalizable therapy tasks that map well to ADLs. If performance on these tasks improves during therapy in the clinic, performance of ADLs should also improve, indicating that the impairment level is decreasing and functionality is being recovered. There are well-established and validated procedures for clinician rating of therapy tasks (e.g., the task ratings in the ARAT or WMFT) [35]. We therefore consider the standardized rating of therapy tasks to be a well-observable component of how clinicians assess functionality, and we place the rated tasks at the highest level of our HBM. Validated clinical measures (e.g., WMFT, ARAT) provide well-established paradigms for rating task performance from 0 to 3, where 0 denotes that the task was not attempted and 3 indicates close to unimpaired performance. To allow different clinicians to use this rating approach easily, we have worked with expert clinicians over the past five years [30] to establish language that maps the performance of a task to a standardized rating between 0 and 3 in an explicit and replicable manner (details in the supplementary document).
To manage the complexity of real-time movement impairment observations across different therapy tasks, clinicians tend to divide tasks into a small set of generalizable segments that can be combined along different paths to generate the therapy tasks. Even though most clinicians segment movement intuitively for observation and assessment, the segment vocabulary is not standardized. We again worked with expert clinicians to create a standardized segment vocabulary that can produce many of the upper extremity therapy tasks that map to ADLs [30]. The segments are Initiation + Progression (IP), Termination (T), Manipulate & Transport (MTR), and Place, Release and Return (PR). For example, the "Close a lid" task can be codified as follows: the subject reaches out toward a doorknob-shaped object with the impaired side (IP) and successfully grasps it with a sphere grip (T); they then bring the object over and place it on top of the cone-shaped object (MTR); finally, they release the object and return the hand to the rest position (PR). For computational purposes, the segment vocabulary and the combination paths that compile the tasks can be represented as a simple state machine (see Fig. 2).
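Because Fig. 2 is not reproduced here, the following Python sketch encodes only the IP, T, MTR, PR path from the "Close a lid" example. The transition table `SEGMENT_TRANSITIONS` and the function `is_valid_task_path` are hypothetical names; the other combination paths in Fig. 2 would have to be added to the table before the sketch reflects the full vocabulary.

```python
# Hypothetical transition table: only the IP -> T -> MTR -> PR path from the
# "Close a lid" example is encoded; further paths from Fig. 2 would go here.
SEGMENT_TRANSITIONS = {
    "START": {"IP"},
    "IP": {"T"},
    "T": {"MTR"},
    "MTR": {"PR"},
    "PR": {"END"},
}


def is_valid_task_path(segments):
    """Return True if the segment sequence is accepted by the state machine."""
    state = "START"
    for seg in list(segments) + ["END"]:
        if seg not in SEGMENT_TRANSITIONS.get(state, set()):
            return False
        state = seg
    return True


print(is_valid_task_path(["IP", "T", "MTR", "PR"]))  # True
print(is_valid_task_path(["IP", "MTR", "PR"]))       # False: T is skipped
```

Representing the vocabulary as an explicit transition table keeps task definitions declarative: adding a new therapy task only requires checking that its segment sequence is a path through the table.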
We place the segments on the second layer of our HBM. The clinicians proposed that segments should be rated on the same scale as tasks (0-3) to create rating consistency across the task and segment layers. To assess segments in real time, clinicians substantially limit the movement features observed for each type of segment. They achieve this by drawing on their own experience to probabilistically filter out features irrelevant to a segment (e.g., digit positioning is likely not relevant to movement initiation) and to form composite observations of the relevant individual features (e.g., quick impressions of shoulder and torso compensation during movement initiation). This process is not standardized, as the filtering and compositing are based on individual experience and training. We again collaborated with the clinicians to define a consensus-limited set of composite movement features that are important for assessing the performance of each segment in our model [30]. The resulting rubric identifies three to five key features to assess for each segment. The detailed rating rubric is given in the supplementary document.
We also worked with the clinicians to establish operational definitions of the terms used to evaluate the composite movement features: shoulder elevation, flexion, and abduction (SEAFR); trunk stabilization, flexion, sway, and rotation (TS); range of motion for the elbow (ROME); forearm pronation/supination (FPS); wrist position aligned to task (WPAT); hand aperture (HA); digit positioning and orientation (DP); smoothness and accuracy in limb trajectory (SAT); final placement of the object (FPO); and digit movement during release (DMR). The full operational definitions for all terms used to assess composite features are given in the supplementary document. Since each feature is a composite observation of multiple components (e.g., assessment of appropriate shoulder movement combines elevation, flexion, and abduction), we term these movement features "composite features." The clinicians agreed that the most they can accomplish in real time is a coarse assessment of composite features, describing each feature as impaired or not impaired without quantifying this categorization in detail. For this reason, all composite features are rated as Impaired/Unimpaired (I/U). We place composite features in the third layer of our hierarchy (Fig. 1). Since clinicians can only provide coarse assessments of composite features, their impressions need to be connected to quantitative analysis of the related kinematics to achieve a detailed and replicable evaluation of movement. We therefore place detailed kinematics of movement during therapy at the fourth layer of our HBM (Fig. 1). As discussed earlier, kinematics should be captured through low-cost, low-intrusion technology (e.g., video cameras or unobtrusive wearable sensors). Through prior work, we have established a set of 20 kinematic features that can be used to quantify detailed changes in upper extremity movement during stroke rehabilitation [36]. The details of extracting the kinematics from the video data are described in the supplementary document. We mark with a * the kinematics that can only be estimated when using a single video camera.
We can now see how the full hierarchy of the HBM can begin to reveal the integrative relations between function and movement quality (Fig. 1). The patient's functionality is captured through the rating of tasks that relate to ADLs. When certain types of tasks (e.g., reaching for and grasping objects of different types and sizes and transporting them successfully to the top of a shelf) consistently receive 3s, functionality relating to the ADLs associated with these types of tasks has been recovered. When all types of tasks consistently receive 3s, full functionality has been recovered. The performance of the therapy tasks depends on the performance of different series of standardized segments. The clinicians chose to use the same rating approach across tasks and segments, indicating that the performance of the segments is closely related to the performance of the tasks (and therefore to the assessment of functionality). At the same time, the segment rating rubric indicates that the performance of each segment depends on the quality of a small number of composite movement features. The performance of the segments can thus be seen as bridging functionality and movement quality. Although clinicians can denote impaired or unimpaired performance, they cannot quantify the level of movement impairment accurately and in a standardized manner. However, this quantification can be achieved by connecting the clinicians' I/U ratings to the quality of sets of kinematic features that can be extracted from low-fidelity signals. The hierarchy shows that any observed impairment in a patient's functionality (task and segment layers) can be related to numerous potential combinations of composite and kinematic features. These combinations are complex and cannot be fully observed in real time by a clinician.
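To make the four layers concrete, one clinician's rating of one video can be held in a small record type. This is a minimal sketch under stated assumptions: the class and field names are hypothetical, kinematic features are stored as a free-form name-to-value mapping because the list of 20 features is given only in the supplementary document, and the per-segment rubric (three to five features per segment) is not enforced, so any of the ten composite features is accepted for any segment.

```python
from dataclasses import dataclass, field

SEGMENTS = ("IP", "T", "MTR", "PR")
COMPOSITE_FEATURES = ("SEAFR", "TS", "ROME", "FPS", "WPAT",
                      "HA", "DP", "SAT", "FPO", "DMR")


@dataclass
class SegmentRating:
    """Layers 2-4 for one segment: 0-3 rating, I/U composite flags, kinematics."""
    segment: str
    rating: int
    impaired: set = field(default_factory=set)      # composite features rated Impaired
    kinematics: dict = field(default_factory=dict)  # hypothetical name -> value map

    def __post_init__(self):
        if self.segment not in SEGMENTS:
            raise ValueError(f"unknown segment {self.segment!r}")
        if not 0 <= self.rating <= 3:
            raise ValueError("segment rating must be in 0..3")
        unknown = set(self.impaired) - set(COMPOSITE_FEATURES)
        if unknown:
            raise ValueError(f"unknown composite features {sorted(unknown)}")

    def is_unimpaired(self, feature):
        """A composite feature not marked Impaired counts as Unimpaired (I/U scale)."""
        return feature not in self.impaired


@dataclass
class RatedVideo:
    """Layer 1 (task rating) plus the rated segments of one video by one clinician."""
    task: str
    task_rating: int  # 0 = not attempted ... 3 = close to unimpaired
    clinician: str
    segments: list = field(default_factory=list)

    def __post_init__(self):
        if not 0 <= self.task_rating <= 3:
            raise ValueError("task rating must be in 0..3")


video = RatedVideo("Close a lid", 2, "clinician_A", [
    SegmentRating("IP", 3),
    SegmentRating("MTR", 2, {"TS", "DP", "SAT"}),
])
print(video.segments[1].is_unimpaired("HA"))  # True
```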
Revealing the relations of function (captured primarily by the higher levels of the hierarchy) to movement quality (captured primarily by the lower levels) requires computational modeling and analysis of the relations between the layers. Since clinicians employ hierarchical probabilistic processes in therapy assessment, the standardized rating of the top three layers by expert clinicians can be used to formulate an HBM that integrates the clinician rating process with computationally extracted kinematics and captures the relations among all four layers. During our collaborative research with clinicians, we noted that interactive review of video recordings of patient movement helps clinicians assess the relations of tasks to segments and composite features [30]. We concluded that further advancement of the model required collecting video recordings of patient movement and having clinicians rate those recordings in a structured manner.

A. Participants and Data
To test the generalizability of the HBM, we collected videos in two different settings:
• Data Set 1: upper extremity tasks performed by chronic stroke survivors receiving outpatient therapy at Emory Hospital;
• Data Set 2: upper extremity tasks performed by acute stroke survivors completing the ARAT upper extremity assessment at the Shirley Ryan Ability (SRA) Lab.
All captured tasks were seated tasks performed with the impaired upper limb. Participants were aged 65 ± 15 years. All participants passed the required cognitive assessments and demonstrated the ability to follow the clinicians' instructions to perform the tasks. The studies were reviewed and approved by the IRBs of the two clinics (Emory University IRB #97319 and Shirley Ryan Ability (SRA) Lab IRB #STU00212137). Across the two sets, we captured the movement of patients with mild and moderate impairment. Since the two clinics use different tests to measure impairment (FM at Emory and ARAT at the SRA Lab), we used FM-ARAT score correlations defined in the literature [34], [37], [38] to establish levels of impairment: mild (FM 48-52, ARAT 43-54); moderate (FM 31-47, …). Detailed participant information for both data sets is presented in Table I.
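The score bands quoted above can be applied mechanically. The sketch below uses a hypothetical `impairment_level` helper to classify an FM or ARAT score; because the moderate ARAT band is not given in this text, it is deliberately omitted, and such scores return `None` rather than a guessed level.

```python
# Bands as quoted in the text. The moderate ARAT band is not given here,
# so ARAT scores outside 43-54 return None rather than a guessed level.
BANDS = {
    "FM": {"mild": (48, 52), "moderate": (31, 47)},
    "ARAT": {"mild": (43, 54)},
}


def impairment_level(score, scale):
    """Return 'mild' or 'moderate' for an FM or ARAT score, or None if no band matches."""
    for level, (low, high) in BANDS[scale].items():
        if low <= score <= high:
            return level
    return None


print(impairment_level(50, "FM"))    # mild
print(impairment_level(35, "FM"))    # moderate
print(impairment_level(45, "ARAT"))  # mild
print(impairment_level(20, "FM"))    # None
```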
For Data Set 1, we collected videos of nine stroke survivors performing 10 upper extremity training tasks regularly used in therapy. Each participant attempted each task four times, and three participants returned for repeat sessions to explore within-patient variability. This resulted in 287 usable video recordings across 15 sessions. The movements were recorded with a single low-cost camera positioned approximately three feet from the impaired side. No detailed instructions were given to the clinicians for setting up the camera, which led to varying viewpoints and a noisy data set requiring additional pre-processing (as discussed in [18]). For Data Set 2, we collected videos of 13 stroke survivors performing the first 15 tasks of the ARAT, yielding 191 usable video recordings across 13 sessions. The setup involved three fixed cameras capturing ipsilateral, contralateral, and traverse viewpoints, as shown in [39]. To ensure compatibility between the two data sets and to assess the feasibility of training the HBM with low-fidelity data, we used only the ipsilateral-view videos from both data sets for HBM training. Future work will involve further training with stereo-capture data (see Discussion and Future Work).
To rate the collected videos, we developed a custom rating interface called the Video Application Tool (VAT). The VAT allows clinicians to view and rate each task and then view each segment of the task as a standalone video, so that each segment is rated independently. While rating each segment, the clinicians also indicate which of the segment's composite movement features show impairment. More details on the development, structure, and use of the VAT for the rating process can be found in [30]. The 287 patient videos of Data Set 1 were rated by three clinicians. The 191 videos of Data Set 2 were rated by four other clinicians (different from those who rated Data Set 1). Each video was scored by two different clinicians. All rating clinicians had 10 or more years of experience in stroke treatment. We combined the 287 and 191 videos of Data Sets 1 and 2 (a total of 478 rated videos comprising 10 tasks from Data Set 1 and 15 tasks from Data Set 2) for the expression of the HBM.

B. Mathematical Formulation of HBM
For the formulation of the four-layer HBM, we combine the clinician ratings resulting from the study with computational analysis of the kinematics of the patient movements.We exploit the expert knowledge captured through the rating rubric and clinician ratings to express the layer relationships as conditional probabilities.The conditions are imposed on the prior level of the hierarchy to get a posterior of the immediate lower level.The mathematical notations used to calculate the probabilities between each layer are given in Table II.
By expressing the relationships among the task, segment, and composite layers rated by clinicians, we can define a joint posterior probability revealing which changes in movement quality at the composite feature level ($CMI_y$ to $CM_y$) have the highest probability of affecting function (i.e., influencing the rating of the segment, $S_x$, and of the task, $T_i$), as well as the $(CMI_y, S_x, T_i, r)$ sets in which this effect is most observable. We can then explore the relationship of the most impactful changes in movement features ($CMI_y$ to $CM_y$) to the kinematic data captured through low-cost, low-intrusion infrastructure (e.g., a video camera and/or wearable sensors). To explore this relation, we first replace the composite feature layer with the kinematics layer, thus deriving conditional probabilities that relate task and segment performance to kinematics. We next use a novel matrix structure that combines the conditional probabilities to compute the relationships between impactful changes in composite features and correlated changes in kinematic features. This makes it possible to use computational analysis of kinematics to automatically detect movement quality issues that may be affecting task performance and overall functionality, and it sets the stage for interpretable automated assessment of rehabilitation movement.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

TABLE II THE MATHEMATICAL NOTATIONS AND THEIR DEFINITIONS USED IN THE FORMULATION OF HBM
C. Expressing HBM Layers Using Clinician Ratings
1) Task-Segment Relations: To investigate meaningful relations between task (i) and segment (x) execution, we need to calculate conditional probabilities for all (i, r) and (x, r) pairs for any given clinician rating (r). All frequencies are calculated from the clinician ratings of the videos. In all equations, we use the standard conditional probability formula, which for task and segment ratings gives
$$P(S_{x,r} \mid T_{i,r}) = \frac{P(S_{x,r} \cap T_{i,r})}{P(T_{i,r})}, \quad P(T_{i,r} \mid S_{x,r}) = \frac{P(T_{i,r} \cap S_{x,r})}{P(S_{x,r})} \tag{1}$$
Segments that have a higher probability of receiving a 2 when the task receives a 3 (the (x, r = 2), (i, r = 3) pairs) denote segment execution in which movement impairment does not significantly influence function. Segments that have a higher probability of receiving a 2 when the task receives a 2 (the (x, r = 2), (i, r = 2) pairs) denote the segments we need to focus on in terms of movement changes that affect function. Using (1), we calculate:
$$P(T_{i,r \ge 2} \mid S_{x,r \le 1}), \quad P(S_{x,r \ge 2} \mid T_{i,r \le 1}), \quad P(T_{i,r \le 1} \mid S_{x,r \le 1}) \tag{2}$$
A high probability of a task receiving r ≥ 2 when a segment receives r ≤ 1 likely denotes significant compensation during that type of segment, in a way that negates the full use of the affected limb. For example, the "Close a lid" task is fully executed primarily by transporting the doorknob-shaped object on top of the cone-shaped object during the MTR segment. A high probability of a segment receiving r ≥ 2 when a task receives r ≤ 1 denotes that this type of segment is not the one causing the challenge with this task and directs us to look for the challenge in other segments. A high probability of a task receiving r ≤ 1 when a segment receives r ≤ 1 denotes significant movement impairment across that type of segment and other segments, resulting in incomplete task execution. Therefore, the relation of each segment to a task is not fully observable.
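The task-segment conditional probabilities above reduce to counting rated segment executions. The Python sketch below assumes each record is a hypothetical `(task, segment, task_rating, segment_rating)` tuple taken from one clinician's rating of one video; the function name and the toy ratings are invented for illustration and are not study data.

```python
import math


def conditional_prob(records, task, segment, event, given):
    """Estimate P(event | given) for one (task, segment) pair from rating counts.

    records: (task, segment, task_rating, segment_rating) tuples.
    event, given: predicates on (task_rating, segment_rating).
    Returns NaN when the conditioning event never occurs.
    """
    n_given = n_joint = 0
    for t, s, tr, sr in records:
        if (t, s) != (task, segment) or not given(tr, sr):
            continue
        n_given += 1
        n_joint += bool(event(tr, sr))
    return n_joint / n_given if n_given else math.nan


# Invented toy ratings for the MTR segment of "Close a lid".
records = [
    ("Close a lid", "MTR", 2, 1),
    ("Close a lid", "MTR", 3, 1),
    ("Close a lid", "MTR", 1, 1),
    ("Close a lid", "MTR", 2, 2),
]
# P(T_{i,r>=2} | S_{x,r<=1}): task completed despite a low segment rating.
p_compensation = conditional_prob(records, "Close a lid", "MTR",
                                  event=lambda tr, sr: tr >= 2,
                                  given=lambda tr, sr: sr <= 1)
# P(S_{x,r>=2} | T_{i,r<=1}): segment fine although the task is not.
p_other_segment = conditional_prob(records, "Close a lid", "MTR",
                                   event=lambda tr, sr: sr >= 2,
                                   given=lambda tr, sr: tr <= 1)
print(round(p_compensation, 3), p_other_segment)  # 0.667 0.0
```

Passing the event and the condition as predicates lets the same counting routine cover every rating combination discussed in the text, including the reverse conditionals.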
2) Segment-Composite Feature Relation: The relationship between the segment layer ($S_x$) and the composite movement feature layer ($CM_y$) indicates which composite features contribute to the segment's ratings. This probability space is divided into three categories of completion-impairment relations.
a) Completed segment execution with movement quality impairment, $P(CMI_y \mid S_{x,r=2})$: We calculate this by:
$$P(CMI_y \mid S_{x,r=2}) = \frac{f(CMI_y \cap S_{x,r=2})}{f(S_{x,r=2})} \tag{3}$$
where f(·) denotes the frequency (count) of the event in the rated videos. Here, the frequency of a particular impaired composite feature, $CMI_y$, is the total number of times that feature y was noted by the clinician as impaired when assessing the execution of a type of segment. To express this as a ratio of the total number of times the feature was available to be noted as impaired, we can rewrite the equation as:
$$P(CMI_y \mid S_{x,r=2}) = \frac{f(CMI_y \cap S_{x,r=2})}{f(CMI_y \cap S_{x,r=2}) + f(CM_y \cap S_{x,r=2})} \tag{4}$$
b) Incomplete impaired execution, $P(CMI_y \mid S_{x,r=1})$: We can similarly calculate the probability that incomplete execution of a segment is related to the impairment of a specific composite feature, using the frequency of each incident.
c) Complete unimpaired execution, $P(CM_y \mid S_{x,r=3})$: We can again use (3) to calculate the contributions of composite features to the complete and unimpaired execution of a segment. In this case, the frequency of a particular unimpaired composite feature, $CM_y$, is the total number of times that feature y was available for observation but was not noted as impaired by the clinician.
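The ratio described in a) (times a feature was noted as impaired over times it was available for noting) can be sketched the same way. In this hypothetical helper, each observation lists the composite features the rubric makes available for the segment (`available`) and the subset the clinician marked as impaired; the feature set and counts below are illustrative assumptions, not the study rubric.

```python
def composite_probs(observations, segment, rating):
    """Per composite feature y, estimate P(CMI_y | S_x,r) and P(CM_y | S_x,r).

    observations: (segment, segment_rating, available, impaired) tuples, where
    `available` is the set of composite features the rubric lists for that
    segment and `impaired` is the subset the clinician marked as Impaired.
    """
    n_available, n_impaired = {}, {}
    for seg, r, available, impaired in observations:
        if seg != segment or r != rating:
            continue
        for y in available:
            n_available[y] = n_available.get(y, 0) + 1
            n_impaired[y] = n_impaired.get(y, 0) + (y in impaired)
    p_impaired = {y: n_impaired[y] / n_available[y] for y in n_available}
    # Available but not marked impaired counts as unimpaired (CM_y).
    p_unimpaired = {y: 1.0 - p for y, p in p_impaired.items()}
    return p_impaired, p_unimpaired


# Toy MTR observations; the feature set is a hypothetical rubric entry.
mtr = {"TS", "DP", "SAT"}
observations = [
    ("MTR", 2, mtr, {"DP"}),
    ("MTR", 2, mtr, {"DP", "TS"}),
    ("MTR", 3, mtr, set()),
]
p_cmi_2, _ = composite_probs(observations, "MTR", 2)  # case a)
_, p_cm_3 = composite_probs(observations, "MTR", 3)   # case c)
print(sorted(p_cmi_2.items()))  # [('DP', 1.0), ('SAT', 0.0), ('TS', 0.5)]
print(sorted(p_cm_3.items()))   # [('DP', 1.0), ('SAT', 1.0), ('TS', 1.0)]
```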

D. Increasing Granularity of HBM Movement Quality Layers Through Computational Experimentation
We calculate the difference between the unimpaired and impaired conditional executions for each composite feature using $\delta P_{cf} = \left| P(CM_y \mid S_{x,r=3}) - P(CMI_y \mid S_{x,r=2}) \right|$, which estimates the feature's impact in differentiating between unimpaired and impaired segment executions. By adding a kinematics layer below the composite feature layer and connecting it to the composite features, we increase the granularity of the movement quality analysis, facilitating a clearer understanding of the relationship between movement quality and functionality. We have updated the list of 20 kinematics (supplementary document) from our prior work [2], [18] that are critical to analyzing the relation of movement quality to functionality in upper extremity rehabilitation after stroke. However, there is no existing direct mapping between these kinematic features and the composite movement features that clinicians use to assess movement quality and performance in standardized segments. To establish these important connections, we first repeat the delta calculation described for composite features for each of the 20 computationally extracted kinematics. We consider all probability sub-spaces in which segment ratings differ, and for each rating we compute the conditional feature mean, F, using:
$$F(k_f \mid S_{x,r}) = \frac{1}{n} \sum_{j=1}^{n} \mu_{k_f}^{(j)} \tag{5}$$
Here, $\mu_{k_f}^{(j)}$ is the sample mean of the kinematic feature $k_f$ in the j-th observation of segment $S_{x,r}$. We calculate the mean per observation and then average the µ's over the n occurrences of the case $S_{x,r}$. The F notation distinguishes the conditional feature mean calculated using (5) from the conditional probability, P. In this way, we calculate the conditional segment-kinematic feature relations for all cases. Analogously to $\delta P_{cf}$, we calculate $\delta P_{kf} = \left| F(k_f \mid S_{x,r=3}) - F(k_f \mid S_{x,r=2}) \right|$. A large $\delta P_{kf}$ signifies a kinematic feature that changes significantly between impaired and unimpaired execution of a segment. This method enables us to correlate segment performance with variations in kinematic patterns, identifying specific alterations in kinematic features that might notably affect the execution of a particular segment type. Using both pipelines (kinematic and composite), we can formulate this δP to identify the kinematic and composite components that affect segment execution under different task and segment rating combinations. Since the composite movement feature and kinematic feature deltas share the task and segment layers as a common conditional reference, we can multiply them to obtain a $(k_f, y)$ matrix that defines the relationship between composite features and kinematics:
$$H_{BM}(k_f, y) = \delta P_{kf}(k_f) \cdot \delta P_{cf}(y)$$

IV. RESULTS
This section is divided into three subsections: i) presentation of the HBM formulation results, ii) use of the HBM results to reduce uncertainty in clinician scores, and iii) use of the HBM results to produce automated assessment recommendations for clinicians.

A. Expression of Four-Layer HBM
The resulting HBM establishes a conditional connection between the task, segment, and composite layers of clinicians' ratings and the kinematics layer. For instance, Figs. 3 and 4 depict $\delta P_{cf}$ and $\delta P_{kf}$, respectively, showing the conditional probabilities of the two HBM pipelines when both the task and the segment are rated 2. Fig. 3 shows the relationship between the 9 composite movement features and the 4 segments, while Fig. 4 illustrates the relationship between the 20 kinematic features and the 4 segments. The color bar indicates normalized δP, with brighter colors indicating a higher probability that the movement quality feature (a composite feature or a kinematic feature) influences the performance (and thus the rating) of the segment and task. For instance, in Fig. 3, the composite feature "DP" has a high probability of reducing the MTR segment's rating from 3 to 2. Similarly, "Object velocity" has a high probability of affecting the performance of the MTR segment, as evident from Fig. 4. To profile this effect comprehensively, we calculated the absolute difference from the unimpaired performance mean for both pipelines. We can then calculate the $H_{BM}(k_f, y)$ relations of the 20 kinematic features with the key composite features of each of the four segments for all Task/Segment rating conditions. In Fig. 5, we show two task/segment rating conditions for the MTR segment: Fig. 5 (A) shows $H_{BM}$ given $S_{x,r=1}$ and $T_{i,r=1}$, and Fig. 5 (B) shows $H_{BM}$ given $S_{x,r=2}$ and $T_{i,r=2}$. The brightness of each box denotes the distance of the mean from unimpaired performance ($S_{x,r=3}$ and $T_{i,r=3}$) for that particular kinematic within the MTR segment. Thus, in the TS (trunk stabilization) column, more kinematic features lie further from their T3S3 mean values in the T1S1 case (Fig. 5 (A)) than in the T2S2 case. All composite feature columns differ between the T1S1 and T2S2 cases because the magnitudes of the distances differ. We can thus establish distinct composite-to-kinematics column profiles for each task/segment rating condition for each segment.

B. Using the HBM to Reduce Uncertainty
Having calculated the probabilities of the full four-layer HBM (from tasks to kinematics), we can use the HBM to disambiguate clinician ratings, as the following example illustrates. Fig. 6 shows a case in which, for the same video of a patient performing a specific MTR segment, one clinician marks the "TS", "DP", and "SAT" features as impaired and another does not. Both clinicians rate the overall task as a 2, but the clinician who marks the three composite features as impaired rates the MTR segment as a 2, whereas the clinician who does not mark any composite features as impaired rates the segment as a 3. Based on the task-segment rating combination, this becomes an ambiguous case between T2S3 and T2S2. We disambiguate the rating with the following steps:
• The first step is to calculate the mean, $\mu^{ts}_y$, of the $H_{BM}$ kinematics for each of the composite features for both the T2S3 and T2S2 cases. For example, from Fig. 5 (B), the means for the "TS", "DP", and "SAT" features can be calculated by averaging the kinematics column for each feature.
• In the second step, we calculate the mean of the patient's kinematics, $\mu_p$, for the data in question.
• The last step is to calculate the absolute difference between the two means, d ts y = abs(µ ts y − µ p ).If the majority of the d ts y are closer (small difference) to the same t −s combination then that t −s combination would be the selected task-rating combination.For the example shown in Fig. 6, for all composite features, d ts y is smaller for the T2S2 combination, thereby resolving the ambiguity.Across the two datasets, we identified 98 instances where a task was rated 2 by two clinicians, but a segment Fig. 6.Resolving ambiguity between two cases using the absolute difference of patient kinematic mean and the conditional kinematic mean from the HBM.The resolved combination is T2S2 Since the difference is closer to combination.
of that task was 2 by one clinician and 3 by the other clinician.Utilizing the HBM probabilities and the aforementioned steps, we resolved 95% of these cases.Specifically, out of these 98 cases, 67 were confirmed as 2s, and 26 were confirmed as 3s.However, the other 5% cases resulted in d ts y values where the majority towards a specific t −s combination was not achieved.
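The three disambiguation steps can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the authors' code: we assume the $\Delta_{HBM}$ columns are available as lists (one value per kinematic feature) keyed by t-s combination and composite feature, that the patient's kinematics are already expressed on the same normalized scale, and that "majority" means a strict majority of composite features. All names are hypothetical.

```python
from collections import Counter
from statistics import mean


def disambiguate(hbm_columns, patient_kinematics):
    """Pick the t-s combination favored by a strict majority of composite features.

    hbm_columns: {combo: {composite_feature: [Delta_HBM value per kinematic]}}
    patient_kinematics: patient's kinematic values on the same normalized scale
    Returns the winning combo (e.g. "T2S2") or None when no majority emerges.
    """
    mu_p = mean(patient_kinematics)                      # step 2: patient mean
    features = list(next(iter(hbm_columns.values())))    # composite features y
    votes = Counter()
    for y in features:
        # steps 1 and 3: mu_ts_y per combination, then d_ts_y = |mu_ts_y - mu_p|
        d = {combo: abs(mean(cols[y]) - mu_p) for combo, cols in hbm_columns.items()}
        votes[min(d, key=d.get)] += 1                    # vote for the closest combo
    combo, count = votes.most_common(1)[0]
    return combo if count > len(features) / 2 else None
```

Under this reading, the Fig. 6 example resolves to T2S2 because every composite feature votes for it, and the five unresolved cases correspond to the `None` branch.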

C. Using the HBM to Produce Automated Assessment Recommendations for Clinicians
Because the HBM can remove the ambiguity between clinicians' ratings, the four-layer HBM also holds the potential to generate automated assessment recommendations for clinicians. We build on the detailed description of the results given above to describe the steps for calculating automated ratings. The first step is to identify the most impactful kinematics for each composite feature from the $\Delta_{HBM}$ probabilities for all t-s combinations and to calculate the means $\mu^{ts}_{y}$ of these kinematics. For example, from Fig. 5(A) and (B), we need to determine which cluster(s) of kinematics have the most significant effect in moving the segment and task score from T1S1 to T2S2. To identify a meaningful list of kinematics, we first need to place appropriate weights on the kinematics before averaging them. While weights could be produced in multiple ways, we adopt an approach that directly leverages the HBM connections of movement features to kinematics. Specifically, we use an unsupervised clustering algorithm to group the kinematics into three clusters based on the correlation probability $\Delta_{HBM}$ (Fig. 5) of each kinematic element with each composite feature. The cluster centroids are then used as weights for the kinematic elements. For example, in Fig. 7 we show the three clusters of kinematics formed for the "smoothness and accuracy of trajectory" composite feature using our weight generation approach. We denote the weights as $w_{k_f,y,t-s}$ for each kinematic element ($k_f$), composite feature ($y$), and task-segment combination ($t-s$). This step is carried out for the correlation of each of the 20 kinematic features with each of the 9 composite features, as shown in (6), where $\Delta_{HBM} \in \mathbb{R}^{N \times M}$. Using the weights, we calculate the most impactful kinematics for each composite movement feature $y$ using (7).
Fig. 7. Visual representation of the three clusters formed for "smoothness and accuracy of wrist movement". The y-axis shows the normalized HBM probabilities. The colored lines represent the cluster centroids, which can be used as weights in the automatic assessment pipeline.
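The paper does not name the clustering algorithm, so the sketch below substitutes a one-dimensional k-means (Lloyd's algorithm) with three clusters as an assumption. For one composite feature and one t-s combination, it clusters the $\Delta_{HBM}$ probabilities of the kinematic features and gives each kinematic the centroid of its cluster as its weight $w_{k_f,y,t-s}$. Treating the highest-centroid cluster as "most impactful" is our guess at the role of (7), not the published equation; all names are hypothetical.

```python
def assign(values, centroids):
    """Index of the nearest centroid for each value."""
    return [min(range(len(centroids)), key=lambda c: abs(v - centroids[c])) for v in values]


def kmeans_1d(values, k=3, max_iter=100):
    """Lloyd's k-means on scalars with deterministic, evenly spread initial centroids."""
    if k < 2 or len(values) < k:
        raise ValueError("need k >= 2 and at least k values")
    s = sorted(values)
    centroids = [s[round(i * (len(s) - 1) / (k - 1))] for i in range(k)]
    for _ in range(max_iter):
        labels = assign(values, centroids)
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # keep an empty cluster's centroid where it was
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return assign(values, centroids), centroids


def centroid_weights(hbm_column, k=3):
    """Weight w_{k_f,y,t-s} for each kinematic = centroid of its cluster."""
    labels, centroids = kmeans_1d(hbm_column, k)
    return [centroids[lab] for lab in labels]


def most_impactful(hbm_column, k=3):
    """Hypothetical selection rule: indices of kinematics in the highest-centroid cluster."""
    weights = centroid_weights(hbm_column, k)
    top = max(weights)
    return [i for i, w in enumerate(weights) if w == top]
```

In practice this would be run once per column of $\Delta_{HBM}$ (each composite feature) for each t-s combination, yielding one weight vector per column.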
In the second step, we calculate the means of the impactful kinematics, $\mu_p$ and $\mu^{ts}_{y}$, from the patient data and the HBM probabilities, respectively, for the segment and task instance to be rated automatically. The last step is to calculate the absolute difference between the two means, $d^{ts}_{y} = |\mu^{ts}_{y} - \mu_p|$. If the majority of the $d^{ts}_{y}$ values are closest to (have the smallest difference from) a specific t-s combination, that t-s combination is selected as the task-segment rating combination.
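Steps two and three of the automated rating can be sketched as a weighted variant of the disambiguation procedure. This is a hedged illustration, not the published method: we assume that $\mu^{ts}_{y}$ and $\mu_p$ are weighted means over the same kinematics using the cluster-centroid weights $w_{k_f,y,t-s}$, that the patient's kinematics share the $\Delta_{HBM}$ scale, and that a strict majority of composite features is required. The unresolved case returns `None` rather than guessing at the threshold rule, which the paper leaves to a clinician study. All names are hypothetical.

```python
from collections import Counter


def weighted_mean(values, weights):
    """Weighted mean; assumes the weights do not sum to zero."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def auto_rate(hbm_columns, weights, patient_kinematics):
    """Suggest a t-s combination from weighted kinematic means, or None without a majority.

    hbm_columns[combo][y]: Delta_HBM column (one value per kinematic) for composite feature y
    weights[combo][y]:     cluster-centroid weights w_{k_f,y,t-s}, same kinematic order
    patient_kinematics:    patient's normalized kinematic values, same kinematic order
    """
    votes = Counter()
    features = list(next(iter(hbm_columns.values())))
    for y in features:
        d = {}
        for combo, cols in hbm_columns.items():
            w = weights[combo][y]
            mu_ts_y = weighted_mean(cols[y], w)           # HBM side
            mu_p = weighted_mean(patient_kinematics, w)   # patient side, same weights
            d[combo] = abs(mu_ts_y - mu_p)
        votes[min(d, key=d.get)] += 1
    combo, count = votes.most_common(1)[0]
    return combo if count > len(features) / 2 else None
```

Zero-weighted kinematics simply drop out of both means, so the clustering step controls which kinematics drive each composite feature's vote.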
As discussed in Section B of the Results (disambiguation of clinician ratings), in over 90% of the cases the mean distance of the kinematics is closest to a specific t-s combination. There is thus significant promise in using the HBM to calculate ratings automatically. However, as also noted in Section B, there are cases where the majority of the kinematic means do not tend toward a specific t-s combination. In those cases, the condition $n_{y,t-s} > T^{I}_{x,i}$ has to be satisfied, where $n_{y,t-s}$ denotes the number of composite features $y$ for a particular task-segment combination $t-s$. The threshold $T^{I}_{x,i}$ (for marking movement features as impaired and for reducing the segment score, respectively) will differ for each task-segment combination $(x, i)$. To determine this threshold, we intend to conduct the study with clinicians described in the future work.

V. DISCUSSION AND FUTURE WORK
The four-layer HBM facilitates a statistical quantification of the relationship between function and movement quality and provides novel quantitative insights into this important relationship.The magnitude of a specific movement quality impairment can be calculated reliably through the analysis of relevant kinematics.However, the calculation of the effect of any movement quality impairment on the recovery of functionality requires the consideration of the statistical relations of the type and magnitude of the movement impairment being observed with a) the type of task/segment being performed and b) the state of other elements of movement quality during that segment/task performance.Furthermore, the potential effect of a movement quality impairment on the recovery of functionality should be understood as a probability rather than an absolute value to account for differences between patients.
Once fully automated, the four-layer HBM can be used by researchers and clinicians to calculate these conditional probabilities. If a clinician or researcher observes a movement impairment in a patient (including different kinds of compensation), they can record a video of the patient's movements and annotate the type of task being performed and the movement impairment being observed. The computational HBM presented in this paper can then be used to extract the kinematics and kinematic clusters related to the task/segment combination being performed and the movement impairment being observed. The system can then calculate whether the observed movement impairment will not affect functional recovery (the movement would receive a score of 3 at both the task and segment levels), will have a minimal effect (3/2 task/segment score), will have some effect without impeding the performance of the task segment (2/2 score), or will have a significant effect (2/1 or 1/1 score). Where the impairment would produce a rating of 2 or below, the clinician may consider addressing it through appropriate exercises, since the noted impairment has a high probability of interfering with full functional recovery. Conversely, video recordings of patients' unsupervised performance of specific tasks can be used to automatically calculate, for each patient, the types of movement quality impairments that have a high probability of producing lower task/segment scores and potentially interfering with full functional recovery. Clinicians can again use this information to structure effective therapy that targets the improvements in movement quality that can have a meaningful effect on functional recovery. Researchers can then analyze post-therapy videos to quantify how much the therapy improved movement quality in ways that affect functionality.
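The interpretation of task/segment scores described above amounts to a small lookup, sketched below. The effect labels paraphrase the text; flagging any score of 2 or below for clinician attention is our reading of "a rating of 2 or below", and the names are hypothetical.

```python
# (task score, segment score) -> expected effect of the observed impairment on functional recovery
EFFECT = {
    (3, 3): "no effect",
    (3, 2): "minimal effect",
    (2, 2): "some effect, segment still performed",
    (2, 1): "significant effect",
    (1, 1): "significant effect",
}


def interpret(task_score, segment_score):
    """Return the effect label and whether the impairment may warrant targeted exercises."""
    effect = EFFECT.get((task_score, segment_score), "combination not described in the text")
    needs_attention = min(task_score, segment_score) <= 2
    return effect, needs_attention
```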
It is important to note that the statistical approach of the HBM is a key reason for its robustness. The mean, variance, and distance from the unimpaired mean of all kinematic features are calculated across all patients participating in the study (mild and moderate, chronic and acute). The relation of kinematics to clinicians' ratings is likewise calculated across all patients and the ratings of the six rating clinicians. The relations of kinematics to ratings are calculated through weighted clusters of kinematics rather than through one-to-one relations between individual kinematic features and ratings. This captures the interrelations of the different kinematic features with particular task/segment ratings. Therefore, we can associate different profiles of the 20 key kinematic features with particular task/segment score combinations with high confidence (> 90%). Since the HBM relies on detailed kinematics of patient movement and compares these kinematics to those of functional performances by patients (performances rated 3), it is applicable only to the analysis of the movement of moderately and mildly impaired patients. These patients can partially or fully perform many types of tasks and have the potential to reach functional performance for most of them.
The datasets used so far for the expression of the HBM are limited. This constrains the detailed expression of the HBM and its immediate and effective use for automated assessment. We used only 478 rated videos for training, and only one camera view (ipsilateral to the impaired limb), which limits the detailed calculation of some kinematic features. The constraints of the datasets can be seen in the kinematics for the IP column of Fig. 4 and in the hand-related rows (last five rows) of Fig. 5: the IP column in Fig. 4 has too many kinematic elements with a high probability of influencing the performance of the IP segment, and the hand rows of Fig. 5 all have low probabilities. Larger datasets, captured in a manner that allows 3-D calculation of complex kinematics (such as torso rotation and hand aperture), will produce better-distributed probabilities. We are in the process of collecting more data through a 3-camera setup (ipsilateral to the impaired limb, contralateral, and transverse).
However, as discussed in more detail in the introduction, there is a trade-off between single- and multi-camera data capture. Our mono-camera video capture for this publication required neither special preparation of the patient nor a dedicated space with a special setup. It only required the clinician to place a camera on the ipsilateral side and to start and stop recording. We were thus able to capture the patient's movement during actual therapy sessions with minimal interference, and the capture was well accepted by patients and clinicians. Our ongoing capture with 3 calibrated cameras requires a fixed camera setup in a dedicated room and requires the clinician to perform a short calibration before recording. Since the facility being used integrates research with clinical services, this more cumbersome capture is accepted there. However, it may not be accepted in a standard rehabilitation facility with limited space and resources. We plan to explore whether combining single-camera capture with low-cost wearables may provide a better solution for capturing 3-D kinematics during therapy with minimal interference.
In the future, we want to use the larger, multi-camera data to enhance the expression of the HBM and conduct a more in-depth analysis of its components. We will then use the expanded HBM to generate automated composite, segment, and task ratings for all captured tasks. We will present the automated ratings along with the videos to a group of clinicians and ask them to confirm or edit the ratings and to annotate their edits. We will use the clinician responses to calculate the weights of the different kinematic clusters, allowing more accurate automated ratings and an even more detailed expression of the HBM. We expect the improved HBM to also perform better in disambiguating clinician ratings. At that point, we will be able to provide robust automated assessment recommendations for upper extremity tasks that are interpretable, since they will connect types and levels of movement impairment to assessments of function. We can also begin to share the patient kinematics, the clusters of kinematics and composites per task-segment combination, the cluster weights, and the HBM probabilities per task-segment combination. Other researchers in the field can then replicate our results, tune the HBM expression and cluster weights with new data, and expand the HBM to other neurorehabilitation contexts.

VI. CONCLUSION
The proposed HBM can provide robust, interpretable recommendations that can standardize automated assessment by identifying impactful clusters of movement quality. Automated assessment in the clinic can free up clinicians' time to focus on delivering therapy. Clinicians can use automated summaries of therapy task performance at home, along with quantitative identification of relations between movement changes and function, for remote decision-making in structuring therapy and providing remote feedback to the patient. The quantification of movement quality changes and their effect on functionality can support clinicians' decision processes for structuring therapy. Finally, automated assessment recommendations can also be used to train clinicians in standardized assessment.

Fig. 1. Layers of the Hierarchical Bayesian Model (HBM). Higher layers relate more to function and lower layers relate more to movement quality.

Fig. 2. State-machine representation of the movement segments and their possible sequences in upper extremity rehabilitation tasks.

Fig. 3. Delta probabilities of composite features (y-axis) influencing the performance of impaired segments (x-axis); the brightest intersections have the highest probabilities.

Fig. 4. Delta probabilities of kinematic features (y-axis) influencing the performance of impaired segments (x-axis); the brightest intersections have the highest probabilities.

Fig. 5. Normalized $\Delta_{HBM}$ probabilities for the MTR segment given different task-segment rating combinations: (A) task and segment both rated 1; (B) task and segment both rated 2.

TABLE I: PATIENT INFORMATION FOR THE TWO DATASETS