1 Introduction

The automated measurement of movement qualities revealing expressive intentions, emotions, and non-verbal social signals (e.g., leadership and entrainment) is of paramount importance in many applications (Argyle [1], Bieńkiewicz et al. [7], Camurri et al. [9], Karg et al. [18], Meeren et al. [23]). An important role in understanding human movement is played by the so-called perceived Origin of Movement (OoM), i.e., the part of the body perceived by an external observer as the joint from which movement originates (Kolykhalova et al. [19], Matthiopoulou et al. [21, 22]). In cognitive/motor rehabilitation, the detection and tracking of the (perceived) OoM can help support a patient in learning how to correctly perform a specific movement (e.g., how to get up safely from a chair), reducing the risk of incurring injuries. Moreover, the diagnosis of the origin of a reaching movement is very useful for an individualized rehabilitation of a person with a stroke (Bakhti et al. [4, 5]). In dance and music teaching, the awareness and discovery of the OoM can contribute to increased effectiveness of expressivity and repeatability of a technical gesture. In sport and entertainment, it can enhance performance.

Research on the OoM is grounded in movement science and biomechanics, particularly in the literature related to the so-called Leading Joint Hypothesis (LJH) on limb motion, according to which “there is one leading joint that creates a dynamic foundation for the motion of the entire limb” (Dounskaia [11]). The LJH is grounded in the way the central nervous system exploits the biomechanical properties of the limbs for movement organization. The automated detection of the OoM was recently investigated by Kolykhalova et al. [19], who proposed an algorithm, inspired by the LJH, based on a suitably defined skeletal representation of the human body as a graph. The central idea of that algorithm consists of clustering the graph according to the similarity in the values assumed by a suitable movement-related feature (e.g., speed) on its vertices (which are suitably selected joints of the human body). As the specific clustering technique, spectral clustering (Shi and Malik [24]) is used. The clusters so found are then exploited to construct a cooperative game model on an auxiliary graph, having the same vertex set as the original graph, and edges connecting vertices at the boundary between any two different clusters in the original graph. Then, the Shapley value (a measure of the importance of players in a suitable class of cooperative games, see Maschler et al. [20]) is used to find the most relevant vertex, deemed to be the OoM. In the specific case, the Shapley value coincides with weighted degree centrality (Deng and Papadimitriou [10]) on the auxiliary graph. It is worth noting that both the LJH and the algorithm developed by Kolykhalova et al. [19], based on unsupervised machine learning, appear to be closely connected to the following concept already expressed by Aristotle [2]: “the origin of movement […] remains at rest when the lower part of a limb is moved; for example, the elbow joint, when the forearm is moved, and the shoulder, when the whole arm; the knee when the tibia is moved, and the hip when the whole leg.” In other words, it looks quite natural to search for the OoM within a subset of joints that connect clusters with different motor behaviour, i.e., joints belonging to the boundary between any two such clusters.

In the algorithm developed by Kolykhalova et al. [19], the clustering step is performed frame-by-frame, imposing no relationships on the clusters found in successive frames. So, in case one wanted to label (or “colour”) the clusters in order to visualize their evolution with respect to time, any permutation of the labels in each frame would be admissible (i.e., it would not change the Shapley value). This would make the resulting visualization difficult to interpret. In this work, we improve the visualization of the clusters generated by the algorithm in successive frames, by colouring them by means of the resolution of a sequence of minimum cost bipartite matching subproblems. In each subproblem, the labels of the clusters found in one frame are connected in a suitably “smooth” way to the ones of the clusters found in the successive frame, by maximizing the summation of the overlaps of the sets of vertices in common with any two clusters that are coloured in the same way (or equivalently, in order to reformulate this optimization subproblem as a cost minimization subproblem, by minimizing the opposite of such a summation plus a constant). The method is inspired by the curve colouring problem investigated, for a different application to metamaterial analysis, by Bacigalupo et al. [3]. In that problem, a finite set of curves is observed at each time instant, and one has to attribute each observed point to a specific curve, using a different “colour” for each curve, in such a way as to reconstruct the curves in the smoothest possible way. Finally, based on a real-world dataset, we show that the proposed modification of the output visualization of the algorithm developed by Kolykhalova et al. [19] provides, as expected, a better visualization of the clusters than its original version.

The article is structured as follows. Section 2 summarizes the algorithm developed by Kolykhalova et al. [19] for the automated detection of the perceived origin of full-body human movement. Section 3 describes the proposed cluster colouring method, aimed at improving the visualization of the output of that algorithm. Section 4 compares the cluster visualizations obtained, respectively, by the original algorithm and by its proposed modification. Section 5 concludes the work with a discussion, delineating its possible developments.

2 An Algorithm for the Automated Detection of the Perceived Origin of Movement

In this section, we briefly describe the algorithm for the automated detection of the (perceived) OoM, which was developed by Kolykhalova et al. [19]. Its main steps are reported in Fig. 1 and are summarized in the following paragraphs. The reader is referred to that reference and to Matthiopoulou et al. [22] for a more detailed presentation of the algorithm and for a discussion on its implementation details. In the following section, focus is given to the output of Step ii) of the algorithm, for which an improved visualization is proposed in the present work.

Fig. 1. Main steps of the algorithm developed by Kolykhalova et al. [19] for the automated detection of the perceived OoM.

  1. i)

    A weighted undirected graph \(G=(V,E,w)\) is built, with the aim of modelling the human body through its suitable skeletal representation. Here, \(V\) denotes the vertex set of \(G\), \(E\) denotes its edge set, whereas \(w\) represents a weight function defined on \(E\), constructed based on data acquired through Motion Capture (MoCap) techniques. The vertices of \(G\) form a subset of the set of all body joints. Its edges are further classified into physical/non-physical edges. For each frame, every edge is labelled with a non-negative weight. This is proportional to the current similarity of the values assumed by a given movement-related feature (e.g., speed) at each of the two vertices associated with such an edge. In the case of a non-physical edge, the constant of proportionality is chosen to be much smaller than the one used to define the weight of a physical edge, since the former edge models a more temporary movement-related similarity, originating from the specific movement performed.

  2. ii)

    For each frame, the weighted undirected graph \(G\) is clustered by applying spectral clustering to the set of weights assigned to its edges. The number of clusters is optimized automatically. Labels are assigned automatically to the clusters (but not optimized), as a by-product of the specific spectral clustering algorithm used (Shi and Malik [24]).

  3. iii)

    For each frame, a suitable weighted auxiliary graph \({G}^{aux}=(V,{E}^{aux},{w}^{aux})\) is built. Its vertices are the same as the ones of the original graph \(G\). In contrast, its edge set \({E}^{aux}\) is the subset of the physical edges of \(G\) that connect vertices belonging to different clusters of \(G\). Each edge in \({G}^{aux}\) is labelled with a weight that is proportional to the dissimilarity (rather than the similarity) of the values assumed by the given movement-related feature on its two associated vertices.

  4. iv)

    For each frame, a cooperative Transferable Utility (TU) game is constructed, based on the weighted auxiliary graph \({G}^{aux}\). The players of this game are the vertices of \(G\) (or, which is the same, of \({G}^{aux}\)). The value \(c\left( {V^{\prime}} \right)\) of any coalition \(V^{\prime} \subseteq V\) is defined as the summation of all the weights (in the weighted auxiliary graph \({G}^{aux}\)) associated with the physical edges belonging to the subgraph of \({G}^{aux}\) that is induced by \(V^{\prime}\).

  5. v)

    For each frame, the Shapley value for the cooperative TU game built in Step iv) is evaluated. For each player, it represents the average marginal contribution of that player when joining a randomly formed coalition. Hence, the Shapley value is used to rank joints according to their “importance” or “centrality” in the weighted auxiliary graph \({G}^{aux}\), where the “most important/most central” joint in a frame is the one with the largest Shapley value.

  6. vi)

    Finally, a filtering step is performed, keeping only the vertices automatically detected as being the “most important/most central” ones for a given number of successive frames.
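
Steps iii) to v) above can be illustrated, for a single frame, by the following minimal Python sketch. It is not the implementation of Kolykhalova et al. [19]: the function names, the scalar feature, and the use of the absolute feature difference as the dissimilarity weight (with unit constant of proportionality) are assumptions made only for illustration. The sketch compares the Shapley value computed from its definition, by averaging marginal contributions over all orderings of the players, with half the weighted degree in the auxiliary graph, which is the closed form for games whose coalition value is the sum of the edge weights of the induced subgraph.

```python
from itertools import permutations
from math import factorial


def auxiliary_edges(physical_edges, labels, feature):
    """Keep the physical edges whose endpoints lie in different clusters.

    Assumption: the dissimilarity weight is the absolute feature difference.
    """
    return {
        (u, v): abs(feature[u] - feature[v])
        for (u, v) in physical_edges
        if labels[u] != labels[v]
    }


def coalition_value(coalition, aux):
    """Sum of the weights of the auxiliary edges induced by the coalition."""
    return sum(w for (u, v), w in aux.items() if u in coalition and v in coalition)


def shapley_by_degree(vertices, aux):
    """Closed form: each edge weight is split equally between its endpoints."""
    phi = {v: 0.0 for v in vertices}
    for (u, v), w in aux.items():
        phi[u] += w / 2
        phi[v] += w / 2
    return phi


def shapley_by_definition(vertices, aux):
    """Average marginal contribution over all orderings (small graphs only)."""
    phi = {v: 0.0 for v in vertices}
    for order in permutations(vertices):
        coalition = set()
        for v in order:
            before = coalition_value(coalition, aux)
            coalition.add(v)
            phi[v] += coalition_value(coalition, aux) - before
    n_orders = factorial(len(vertices))
    return {v: total / n_orders for v, total in phi.items()}


# Hypothetical chain of five joints split into three clusters A, B, C.
vertices = [0, 1, 2, 3, 4]
physical_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
labels = ["A", "A", "B", "C", "C"]
feature = [0.0, 0.5, 1.5, 3.5, 3.5]

aux = auxiliary_edges(physical_edges, labels, feature)
phi = shapley_by_degree(vertices, aux)
candidate_oom = max(phi, key=phi.get)  # joint with the largest Shapley value
```

In this toy example only the edges (1, 2) and (2, 3) cross a cluster boundary, so joint 2, which lies on both boundaries, obtains the largest Shapley value. The enumeration over all orderings is exponential in the number of joints and is included only as a check of the closed form.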

3 Proposed Cluster Colouring Method

The output of Step ii) of the algorithm summarized in Sect. 2 is a set of clusters (subsets of vertices of the graph \(G\)). Visualizing such clusters (by attributing a “colour” to each of them) can be useful to better understand the output of that algorithm, since the joint deemed to be the OoM (the one with the largest Shapley value) belongs to the boundary between two such clusters. However, in the original algorithm, no colour is assigned explicitly to each such cluster. If colours were assigned based, e.g., on the order of the clusters produced by the specific spectral clustering algorithm used, it may happen that, when moving from one frame to the successive one, the clusters did not change, but their colours were permuted, making the visualization difficult. For instance, in this case a visual inspection would likely fail to detect a possible relationship between a change of the OoM and a simultaneous change in the composition of the clusters.

In this section, we propose a method to colour the clusters generated by the algorithm of Sect. 2 in a “smooth” way, avoiding situations such as the one described above. For simplicity, we assume that the number of clusters does not change between two consecutive frames, say, respectively, at times \(t\) and \(t+1\). Taking inspiration from the curve colouring problem considered by Bacigalupo et al. [3], starting from the colours assigned to the clusters at time \(t\), we attribute colours to the clusters at time \(t+1\) by solving a minimum cost bipartite matching subproblem, or assignment subproblem (Burkard et al. [8]). In other words, first we construct a complete weighted bipartite graph \({G}^{bipartite}=(R\,\cup\,B,{E}^{bipartite},{w}^{bipartite})\), where \(R\) is a set of “red” vertices, \(B\) is a set of “blue” vertices, the two cardinalities \(\left|R\right|\) and \(|B|\) are the same, \({E}^{bipartite}\) is the Cartesian product \(R\times B\), and \({w}^{bipartite}: R\times B\to {\mathbb{R}}\) is a cost function. We recall that a bipartite matching \({M}^{bipartite}\subseteq {E}^{bipartite}\) is a subset of edges such that every vertex in \(R\,\cup\,B\) is incident to at most one edge in \({M}^{bipartite}\). Moreover, the matching is perfect if every vertex in \(R\,\cup\,B\) is incident to exactly one edge in \({M}^{bipartite}\). The cost of the matching has the expression \(C={\sum }_{\left(r,b\right)\in {M}^{bipartite}}{w}^{bipartite}(r,b)\). The objective of the minimum cost bipartite matching problem is to find a perfect matching in the complete weighted bipartite graph \({G}^{bipartite}\) having minimum cost \(C\). Various efficient algorithms exist to solve such an optimization problem, e.g., the Hungarian method (Burkard et al. [8]). For a small cardinality \(m:=\left|R\right|=|B|\) (e.g., \(m=4\) or \(m=5\)), the problem can be easily solved even by the brute-force method, since its number of admissible solutions is \(m!\).

In our specific case, we choose \(R\) as the set of \(m\) clusters obtained for the graph \(G\) at time \(t\) by means of Step ii) of the algorithm described in Sect. 2, \(B\) as the set of clusters obtained for the graph \(G\) at time \(t+1\) by means of the same step, whereas \({w}^{bipartite}\left(r,b\right):=\left|V\right|-|r\,\cap\,b|\). In other words, the larger the overlap between two clusters \(r\) and \(b\) at times \(t\) and \(t+1\), respectively (in terms of the number of common vertices), the smaller the cost of the weighted edge \(\left(r,b\right)\) in \({G}^{bipartite}\). In the particular case in which the two sets of clusters are the same, the optimal bipartite matching preserves the colours of the clusters when moving from time \(t\) to time \(t+1\), thus preventing the occurrence of the undesired situation illustrated at the beginning of this section.
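
With this choice of the cost, a perfect matching has cost \(C=m\left|V\right|-{\sum }_{\left(r,b\right)\in {M}^{bipartite}}|r\,\cap\,b|\), so minimizing \(C\) is equivalent to maximizing the total overlap between clusters that receive the same colour. The colouring step for one pair of frames can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function name `match_colours`, the representation of clusters as Python sets of vertex indices, and the colour names are hypothetical, and the brute-force search over the \(m!\) perfect matchings stands in for the Hungarian method, which would be preferable for larger \(m\).

```python
from itertools import permutations


def match_colours(prev_clusters, prev_colours, next_clusters, n_vertices):
    """Colour the clusters at time t+1 from those at time t.

    prev_clusters, next_clusters: lists of sets of vertex indices.
    prev_colours: one colour per cluster in prev_clusters.
    n_vertices: |V|, the number of vertices of the graph G.
    """
    m = len(prev_clusters)
    if len(next_clusters) != m:
        raise ValueError("the number of clusters must not change between frames")

    # cost[i][j] = |V| - |r_i ∩ b_j|: a large overlap gives a small cost.
    cost = [[n_vertices - len(r & b) for b in next_clusters] for r in prev_clusters]

    # Brute force: p[i] is the index of the cluster at t+1 matched to cluster i at t.
    best = min(
        permutations(range(m)),
        key=lambda p: sum(cost[i][p[i]] for i in range(m)),
    )

    # Each cluster at t+1 inherits the colour of its matched cluster at t.
    next_colours = [None] * m
    for i, j in enumerate(best):
        next_colours[j] = prev_colours[i]
    return next_colours


# Hypothetical example with 8 vertices and 3 clusters, listed in a different
# order (and with vertex 2 changing cluster) in the second frame.
prev_clusters = [{0, 1, 2}, {3, 4}, {5, 6, 7}]
prev_colours = ["red", "green", "blue"]
next_clusters = [{5, 6, 7}, {0, 1}, {2, 3, 4}]
next_colours = match_colours(prev_clusters, prev_colours, next_clusters, 8)
```

In this example the cluster \(\{5,6,7\}\) keeps its colour although the clustering routine lists it first in the second frame, and the two remaining clusters keep the colours of the clusters with which they share the most vertices. Applying the function to each pair of successive frames, with the output colours of one call used as `prev_colours` in the next, propagates the colouring along the whole sequence.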

4 Results

In this section, we compare the output visualization of the algorithm developed by Kolykhalova et al. [19] and the one obtained by its proposed modification, by considering an illustrative example. The dataset we used was recorded with 13 infrared cameras in March 2016 in the framework of the H2020-ICT-2015 EU Project WhoLoDance. The subjects were two professional dancers, equipped with 64 infrared reflective markers, 5 accelerometers, and 1 microphone, performing contemporary dance movements without music accompaniment, as the latter could have affected the way the dancers performed the movements. The 64 markers’ trajectories were tracked by the Qualisys Track Manager (QTM) software and manually interpolated with the same software when markers went missing due to visual occlusion of the minimal set of cameras needed for their tracking. In addition to the video recordings, there were also manual expert annotations regarding which joint was evaluated to be the perceived OoM. Starting from the full marker set, we constructed a smaller set made of 20 joints by means of the reduction of sets of multiple markers into individual joints. The position of each joint was determined by averaging the positions of the markers belonging to a suitable subset associated with such joint, according to the map reported in Fig. 2. For instance, the positions of the 5 markers on the head in the original full-body skeletal structure were used to determine the position of the joint numbered as 20 (head) in the reduced skeletal structure. Then, in order to find the clusters, the algorithm of Sect. 2 was applied based on this reduced skeletal structure. Specifically, the angular momentum of each joint with respect to the center of mass of the body was selected as the movement-related feature used by that algorithm (see Matthiopoulou et al. [22] for a description of this feature and of the specific measure of similarity adopted).
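
The reduction from markers to joints described above amounts to a per-joint average of marker positions within each frame. A minimal Python sketch is given below. The marker names, the joint names, and the function `reduce_to_joints` are hypothetical and do not reproduce the actual map of Fig. 2, which is only partially described in the text.

```python
def reduce_to_joints(marker_positions, joint_to_markers):
    """Average the 3D positions of the markers associated with each joint.

    marker_positions: dict mapping a marker name to an (x, y, z) tuple.
    joint_to_markers: dict mapping a joint name to a list of marker names.
    """
    joints = {}
    for joint, names in joint_to_markers.items():
        points = [marker_positions[name] for name in names]
        # zip(*points) groups the x, y and z coordinates of all the markers.
        joints[joint] = tuple(sum(axis) / len(points) for axis in zip(*points))
    return joints


# Hypothetical frame with two head markers and one marker on the right knee.
frame = {
    "head_left": (0.0, 0.0, 2.0),
    "head_right": (2.0, 0.0, 2.0),
    "knee_right": (1.0, 0.5, 0.5),
}
marker_map = {
    "head": ["head_left", "head_right"],
    "right_knee": ["knee_right"],
}
joint_positions = reduce_to_joints(frame, marker_map)
```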

Fig. 2. Mapping from the original full-body skeletal structure (a) to the reduced one (b). Each marker (joint) in the second subfigure corresponds to a group of markers in the first subfigure. Note: “left” and “right” in the second subfigure refer to the subject’s viewpoint.

We considered an example in which the dancer started from a standing position with the right leg raised off the ground and shifted slightly to the left. From this position, the dancer began to rotate the right leg counterclockwise, almost as if attempting a pirouette, which also compelled the torso to rotate. As seen in Fig. 3, we investigated how the cluster colouring changed between two successive frames, first using the algorithm developed by Kolykhalova et al. [19], then using the proposed modification of its output visualization. In order to enhance the visualization, the two frames shown were not consecutive, i.e., there were other frames interposed between them. Moreover, for a fair comparison, the two cluster colourings were initialized in the same way. In the first case, we can clearly observe that some joints initially included in a cluster later belonged to different clusters. Indeed, most cluster colours were swapped when moving from the first frame to the second frame. In the second frame, only the head and the shoulders maintained the same colour as in the first frame, while the sets of joints composing the red, blue and green clusters changed completely with respect to the first frame. In contrast, when using the proposed modification, the green and blue clusters remained unchanged compared to the previous frame, and both the head and the shoulders retained their previous colours. Hence, this simple example shows how the proposed modification can enhance the visualization of the clusters over time, by avoiding seemingly random switching of their colours.

Fig. 3. Cluster colourings obtained for two successive frames, respectively, by: the algorithm developed by Kolykhalova et al. [19] (first row); its proposed modification (second row). Subfigures (a) and (c) represent the (same) cluster colouring obtained in the first frame, as an initialization step; subfigure (b) represents the cluster colouring obtained in the second frame by the first algorithm; subfigure (d) represents the cluster colouring obtained in the second frame by the proposed modification.

5 Discussion

In this work, the algorithm for the automated detection of the perceived Origin of full-body human Movement (OoM), proposed by Kolykhalova et al. [19], has been further developed, by improving the visualization of its output through a suitable cluster colouring method. It is worth noting that, unlike in the similar curve colouring problem considered by Bacigalupo et al. [3], no further improvement could be obtained by reformulating the cluster colouring problem as a multi-stage optimization problem, in which each frame (stage) is associated with the cost of the bipartite matching between the set of clusters obtained in that frame and the one obtained in the successive frame. This problem could be solved, in principle, by dynamic programming (Bertsekas [6]). However, since the optimization subproblems per stage (i.e., the minimum cost bipartite matching subproblems) are actually decoupled (apart from a permutation of the set of labels), the cluster colouring obtained by solving such a multi-stage optimization problem would be identical (again, apart from a permutation of the set of labels) to the one obtained by the method proposed in this work.

The proposed cluster colouring method could be used in conjunction with different movement-related features (such as the ones considered by Matthiopoulou et al. [21, 22] in a further alternative extension of the algorithm proposed by Kolykhalova et al. [19] and in its applications) in order to visualize which feature is the best at capturing the OoM. Moreover, the improved visualization could help to identify in which situations (e.g., for which gestures) the algorithm itself performs well or, vice versa, fails to correctly identify the OoM.

It is worth remarking that a limitation of the proposed cluster colouring method is that it can be applied only when the number of clusters does not change with time. This could limit its application to movements characterized by a constant spatial scale. A possible way to overcome this issue could be to use a hierarchical version of spectral clustering, matching the numbers of clusters in any two successive frames. Moreover, instead of performing (spectral) clustering frame-by-frame and then solving a sequence of cluster colouring problems, one could apply clustering directly to a single “large” graph that represents a set of successive frames, then obtain the clusters per frame simply by “sectioning” the clusters so obtained (Fukumoto et al. [13]). As a successive step, one could make an arbitrary selection for the colours attributed to the clusters of the “large” graph, then make the clusters per frame inherit the colours from the corresponding clusters in the “large” graph. As a by-product of this procedure, the number of clusters per frame would also be chosen automatically (depending on the number of clusters active in each frame). However, this alternative approach could significantly slow down the clustering process, preventing the automated detection of the perceived OoM in real time.

Finally, it is worth mentioning that other improvements are still possible for the algorithm proposed by Kolykhalova et al. [19] for the automated detection of the OoM. For instance, following the framework of learning with constraints/boundary conditions (Gnecco et al. [14,15,16]), one could include biomechanical constraints in that algorithm, which could modify the sets of clusters taken as inputs by the cluster colouring method proposed in the present work. Moreover, suitable dimensionality reduction techniques (see, e.g., Fantoni et al. [12] and Gnecco and Sanguineti [17]) could be applied when moving from a skeletal structure characterized by a large number of markers to a reduced skeletal structure (see Sect. 4), used as input for the algorithm for the automated detection of the OoM.