HMM-based model for dance motions with pose representation

This paper presents a model for human dance motions based on the hidden Markov model (HMM). A whole dance is defined as a sequence of gestures drawn from a finite set of distinct gestures. Dance gestures are cast as discrete hidden states, and a phrase of the dance as a sequence of gestures. To map the skeleton motion data to a smaller set of features, an angular skeleton representation of the human pose is also designed, which makes recognition robust to noisy input from a 3D sensor. A dance pose is defined by this angular skeleton representation, which can be quantized according to the range of movement for use in a discrete hidden Markov model.


I. INTRODUCTION
Motion analysis and classification are of high interest in a variety of major areas, including robotics, computer animation, and psychology, as well as the film and computer game industries. Dancing is a quintessential form of human body motion with aesthetic value. In a number of emerging applications, the understanding of human dance motion plays a key role. The applications related to human dance motion range from the production of natural gaits for bipedal humanoid robots and indirect augmented reality during dance performances to image segmentation and computer animation. Interaction using human dance motion has also been studied as an alternative form of human-computer interface in a number of studies.
Real-world processes generally produce observable outputs which can be characterized as signals. Broadly, one can dichotomize the types of signal models into the class of deterministic models and the class of statistical (or stochastic) models [1]. In this paper, human dance motions are modeled by a stochastic model because the dance can be well characterized as a parametric random process, and the parameters of the stochastic process can be determined in a precise manner.
A dance can be defined as a sequence of gestures drawn from a finite set of distinct gestures. A gesture signal has two characteristic aspects: spatio-temporal variability and segmentation ambiguity [12]. Spatio-temporal variability is the fact that the same gesture varies dynamically in size, shape and duration, between different gesturers or even for the same gesturer. The segmentation ambiguity problem concerns how to determine when a gesture starts and when it ends in a continuous signal trajectory. Major approaches for analyzing spatial and temporal patterns include Dynamic Time Warping (DTW), Neural Networks (NNs), and the Hidden Markov Model (HMM) [12]. In this study, an HMM-based approach is chosen to model the dance gesture, because it can be applied to analyzing time series with spatio-temporal variabilities and can handle undefined patterns [12]. HMM-based dance gesture modelling makes it possible to build practical systems that can learn, predict, and classify dance gestures in order to analyze the whole dance.
Dance movements that involve a lot of body articulation result in a very large input dimension for the processing system. The input dance motion signal, such as skeleton joint trajectories captured by a 3D sensor, is very likely to suffer from discontinuities, noise, or unstable parameters [5]. In order to analyze the skeleton joint trajectories from a 3D sensor, it is necessary to build a representation that reduces the signal entropy and the dimension of the data. The representation must also deal with changes in the dancer's position and orientation relative to the 3D sensor.

A. Related Works
Dance choreography has been captured using various formalization approaches, e.g., Laban notation, which was initiated in the early 20th century. Aristidou, A. Chrysanthou and their colleagues propose a method that can automatically extract motion qualities from dance performances in terms of Laban Movement Analysis (LMA) [14].
Amy Laviers modelled the motion patterns of ballet as a series of event-driven poses that take the form of a finite automaton [2]. To describe a system involving two legs without violating the laws of physics or the rules of ballet, the model uses the Cartesian composition. Amy Laviers also built an automatic generator of ballet phrases using Linear Temporal Logic and Computation Tree Logic as rich motion specification languages for robots' movements [3]. Yaya Heryadi [4] built a syntactical model and classifier for performance evaluation of Bali traditional dance, adapting the skeleton feature descriptor of Michalis Raptis [5]. A dance pose is represented by the spherical coordinate parameters θ, φ of several skeleton joints, which are grouped into the torso frame, first-degree joints, and second-degree joints.
F. Ofli, C. Canton-Ferrer and their colleagues analyze the relations between music and body movements. Their body motion synthesis system takes an audio signal as input and produces a sequence of body motion features that are correlated with the input audio. The synthesis is based on the HMM-based audio-body motion correlation model derived from the multimodal analysis [13].

B. Markov Model
Fig. 1. A Markov chain with 5 states and selected state transitions
Consider a system which may be described at any time as being in one of a set of $N$ distinct states, $S_1, S_2, \ldots, S_N$, as illustrated in Fig. 1.
At regularly spaced discrete times $t = 1, 2, \ldots$, the system undergoes a change of state (possibly back to the same state) according to a set of probabilities associated with the state.
We denote the actual state at time $t$ as $q_t$. A full probabilistic description of the above system would, in general, require specification of the current state as well as all the predecessor states. For the special case of a discrete, first-order Markov chain, this probabilistic description is truncated to just the current and the predecessor state, i.e.,
$$P(q_t = S_j \mid q_{t-1} = S_i, q_{t-2} = S_k, \ldots) = P(q_t = S_j \mid q_{t-1} = S_i) \qquad (1)$$
Furthermore, we only consider those processes in which the right-hand side of (1) is independent of time, thereby leading to the set of state transition probabilities $a_{ij}$.
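The first-order property in (1) means that a state sequence can be generated, and its probability evaluated, from the transition probabilities alone. The following Python sketch illustrates this under assumed values: the 3-state matrix `A`, the helper names, and the random seed are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical 3-state transition matrix (illustrative values only).
# Row i holds a_ij = P(q_t = S_j | q_{t-1} = S_i).
A = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
]

def is_row_stochastic(matrix, tol=1e-9):
    """Check that every entry is non-negative and every row sums to 1."""
    return all(
        all(a >= 0.0 for a in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )

def sample_chain(matrix, start, length, rng):
    """Sample state indices; each step depends only on the previous state."""
    states = [start]
    for _ in range(length - 1):
        row = matrix[states[-1]]
        states.append(rng.choices(range(len(row)), weights=row)[0])
    return states

def sequence_probability(matrix, states):
    """P(q_2, ..., q_T | q_1) as a product of transition probabilities."""
    prob = 1.0
    for prev, cur in zip(states, states[1:]):
        prob *= matrix[prev][cur]
    return prob

rng = random.Random(0)  # fixed seed so the sampled path is reproducible
path = sample_chain(A, start=0, length=10, rng=rng)
```

Because the transition probabilities are independent of time, the same matrix row is reused at every step of `sample_chain`.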

C. Hidden Markov Model
A hidden Markov model is a Markov model in which the observation is a probabilistic function of the state. The resulting model is a doubly embedded stochastic process with an underlying stochastic process that is not observable (it is hidden), but can only be observed through another set of stochastic processes that produce the sequence of observations [1]. A formal characterization of an HMM is as follows:
• $S = \{S_1, S_2, \ldots, S_N\}$ -- A set of $N$ states. The state at time $t$ is denoted by $q_t$.
• $V = \{v_1, v_2, \ldots, v_M\}$ -- A set of $M$ distinct observation symbols. The observation at time $t$ is denoted by the variable $O_t$. The observation symbols correspond to the physical output of the system being modeled.
• $A = \{a_{ij}\}$ -- An $N \times N$ matrix for the state transition probability distribution, where $a_{ij}$ is the probability of making a transition from state $S_i$ to $S_j$: $a_{ij} = P(q_t = S_j \mid q_{t-1} = S_i)$, $1 \le i, j \le N$.
• $B = \{b_j(k)\}$ -- An $N \times M$ matrix for the observation symbol probability distributions, where $b_j(k)$ is the probability of observing symbol $v_k$ in state $S_j$: $b_j(k) = P(O_t = v_k \mid q_t = S_j)$.
• $\pi = \{\pi_i\}$ -- The initial state distribution, where $\pi_i$ is the probability that the state $S_i$ is the initial state: $\pi_i = P(q_1 = S_i)$.
The probabilistic parameters $A$, $B$, and $\pi$ must satisfy the following stochastic constraints:
• $\sum_{j=1}^{N} a_{ij} = 1$ for all $i$, and $a_{ij} \ge 0$.
• $\sum_{k=1}^{M} b_j(k) = 1$ for all $j$, and $b_j(k) \ge 0$.
• $\sum_{i=1}^{N} \pi_i = 1$, and $\pi_i \ge 0$.
A compact notation $\lambda = (A, B, \pi)$ is used, which includes only the probabilistic parameters.
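To make the roles of the transition matrix, the observation matrix, and the initial distribution concrete, the sketch below evaluates the likelihood of an observation sequence with the standard forward recursion. The text above does not spell out this recursion, so it is shown here only as an illustration; the 2-state, 3-symbol parameter values and the function name `forward_likelihood` are hypothetical.

```python
def forward_likelihood(A, B, pi, observations):
    """Return P(O | lambda) for a discrete HMM using the forward recursion.

    A[i][j]  : probability of moving from state i to state j
    B[j][k]  : probability of observing symbol k in state j
    pi[i]    : probability of starting in state i
    observations : list of symbol indices
    """
    n_states = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(O_1)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n_states)]
    # Induction: alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(O_t)
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][obs]
            for j in range(n_states)
        ]
    # Termination: sum over the states at the final time step
    return sum(alpha)

# Hypothetical parameters (illustrative values only).
A = [[0.7, 0.3], [0.0, 1.0]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
pi = [1.0, 0.0]

likelihood = forward_likelihood(A, B, pi, [0, 2])
```

Since every row of the two matrices and the initial distribution sum to one, the likelihoods of all observation sequences of a fixed length also sum to one, which is a convenient sanity check for an implementation.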
The left-right model is shown in Fig. 2. It is well suited to modelling order-constrained time series whose properties change sequentially over time [12]. Since the left-right model has no backward path, the state index either increases or stays the same as time increases. In other words, the state proceeds from left to right or stays where it was. On the other hand, every state in the ergodic, or fully connected, model can reach every other state in a single transition.
The hierarchy of pose, gesture, phrase and dance is illustrated in Fig. 3. The whole dance is modelled as the tuple $(Q, I, \Sigma, O, \delta, \omega, q_0, F)$, where:
• $Q$ -- the finite nonempty set of hidden states. The states correspond to gestures. Their segmentation is determined by the dance expert.
• $I$ -- the finite nonempty set of inputs.
• $\Sigma$ -- the vocabulary of all possible discrete poses of the dance.
• $O$ -- the finite nonempty set of outputs, where $O = \{o_1, o_2, \ldots, o_n\}$, $o_i \in \Sigma^*$, $i = 1, 2, \ldots, n$. $\Sigma^*$ is the Kleene closure of $\Sigma$, the set consisting of concatenations of arbitrarily many elements of $\Sigma$ (poses). An output corresponds to a gesture trajectory, or its features.
• $\delta$ -- the state transition function $\delta: Q \times I \to Q$. A state transition corresponds to a gesture transition. For $q \in Q$ and input strings $a, b$, it satisfies $\delta(q, ab) = \delta(\delta(q, a), b)$ and $\delta(q, \varepsilon) = q$, where $\varepsilon$ is the empty transition.
• $\omega$ -- the output map.
• $q_0$ -- the initial state, $q_0 \in Q$. The initial state corresponds to the initial pose or initial gesture of all phrases of Likok Pulo.
• $F$ -- the set of final (or accepting) states, $F \subseteq Q$. Final states correspond to the end of the phrase.
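The automaton view of a phrase can be sketched in a few lines of Python. The gesture names (`g0`, `g1`, `g2`), the input symbols (`a`, `b`), and the transition table below are hypothetical placeholders rather than the Likok Pulo model; the sketch only illustrates how the transition function extends from single inputs to input strings, with the empty input leaving the state unchanged.

```python
# Hypothetical phrase model: states are gestures, inputs are cue symbols.
TRANSITIONS = {
    ("g0", "a"): "g1",
    ("g1", "a"): "g1",  # the same gesture may be repeated
    ("g1", "b"): "g2",
}
INITIAL_STATE = "g0"
FINAL_STATES = {"g2"}

def delta(state, inputs):
    """Extended transition function over a string of input symbols.

    An empty input returns the state unchanged, and processing "ab"
    equals processing "a" and then "b" from the resulting state.
    """
    for symbol in inputs:
        key = (state, symbol)
        if key not in TRANSITIONS:
            return None  # undefined transition: the input is rejected
        state = TRANSITIONS[key]
    return state

def accepts(inputs):
    """A phrase is complete when the inputs lead to a final state."""
    return delta(INITIAL_STATE, inputs) in FINAL_STATES
```

For instance, `accepts("aab")` holds for this table because the repeated input `a` keeps the automaton in `g1` until `b` moves it to the final gesture.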
Taking the Likok Pulo dance as a case study [15], the model for each phrase is illustrated in Fig. 4 through Fig. 9.

B. Modelling The Dance Gesture in HMM
The left-right HMM with two degrees is used because it is well suited to modelling order-constrained time series whose properties change sequentially over time. As illustrated in Fig. 2, the hidden states $S_1, S_2, \ldots, S_N$ correspond to the poses. The observation symbols $v_1, v_2, \ldots, v_M$ correspond to the physical output of the system, i.e., the discrete pose vector (explained in Section IV). Matrix $A$ corresponds to the transition distribution between the gestures. Matrix $B$ corresponds to the observation symbol probability distribution of the discrete pose vector. The distribution $\pi$ corresponds to the initial gesture distribution.
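As an illustration of the left-right constraint, the sketch below builds a transition matrix in which a state may stay where it is or advance, but never move backward. It assumes that "two degrees" means the state index can advance by at most two per step; this reading, the function name `left_right_transitions`, and the uniform probabilities (placeholders such as might be used before training) are assumptions, not details given in the paper.

```python
def left_right_transitions(n_states, max_jump=2):
    """Build a uniform left-right transition matrix.

    State i may stay or advance by up to max_jump states;
    backward transitions are given probability 0.
    """
    matrix = []
    for i in range(n_states):
        last = min(i + max_jump, n_states - 1)
        reachable = range(i, last + 1)
        row = [0.0] * n_states
        for j in reachable:
            row[j] = 1.0 / len(reachable)
        matrix.append(row)
    return matrix

A = left_right_transitions(5, max_jump=2)
# A left-right model starts in its first state.
pi = [1.0] + [0.0] * 4
```

The resulting matrix is upper triangular with a band of width three, and its last state can only transition to itself.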

IV. HUMAN POSE REPRESENTATION
The human pose representation must satisfy the following objectives [5]: (a) a robust coordinate system based on the orientation of the human body, so that the skeleton representation does not depend on the position of the Kinect sensor; (b) continuity and stability of the signal; (c) a reduced signal dimension that still maintains the character of the motion.
There are several methods to reduce the redundant dimensions of the skeleton joint trajectories, such as

A. Torso PCA Frame
The joints of the human torso, defined by the 7 red skeletal nodes in Fig. 11, rarely exhibit strong independent motion with large angles. Due to the strong noise in the depth sensing system, individual torso points, in particular the shoulders and hips, may exhibit unrealistic motion that should be limited. Therefore, the torso is treated as a rigid body, and the 3D orthonormal basis it provides is used as the reference frame for the remaining joints.
Its principal components are as follows:
• $\mathbf{u}$, the vector pointing from the upper torso to the lower torso. In most dances, the dancer's torso will never be upside-down relative to the sensor.
• $\mathbf{r}$, the vector pointing from the right side of the body to the left side of the body.
• $\mathbf{t}$, the cross product of the two principal components, $\mathbf{t} = \mathbf{u} \times \mathbf{r}$.
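One way to obtain such a torso frame is to run PCA on the torso joint positions, fix the sign of each axis using joints whose relative placement is known, and take the cross product for the third axis. The sketch below does this with power iteration in plain Python. The joint coordinates, the choice of which joints serve as `top`, `bottom`, `right`, and `left`, and all function names are hypothetical, and the sketch assumes the torso points are not degenerate (for example, not all identical).

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(a):
    norm = math.sqrt(dot(a, a))
    return [x / norm for x in a]

def dominant_eigenvector(matrix, iterations=200):
    """Power iteration on a symmetric 3x3 matrix."""
    v = normalize([1.0, 0.9, 0.8])  # arbitrary start, not aligned with an axis
    for _ in range(iterations):
        v = normalize([dot(row, v) for row in matrix])
    return v

def torso_frame(torso, top, bottom, right, left):
    """Return the top-down axis, the right-to-left axis, and their cross product."""
    n = len(torso)
    mean = [sum(p[k] for p in torso) / n for k in range(3)]
    centered = [[p[k] - mean[k] for k in range(3)] for p in torso]
    # 3x3 scatter matrix of the centred torso joints
    scatter = [[sum(c[i] * c[j] for c in centered) for j in range(3)]
               for i in range(3)]
    u = dominant_eigenvector(scatter)
    # Deflate the first component, then extract the second one
    lam = dot(u, [dot(row, u) for row in scatter])
    deflated = [[scatter[i][j] - lam * u[i] * u[j] for j in range(3)]
                for i in range(3)]
    r = dominant_eigenvector(deflated)
    # Resolve the sign ambiguity of PCA using known joint positions
    if dot(u, [b - a for a, b in zip(top, bottom)]) < 0:
        u = [-x for x in u]
    if dot(r, [b - a for a, b in zip(right, left)]) < 0:
        r = [-x for x in r]
    return u, r, cross(u, r)

# Hypothetical torso joints (metres): neck, right shoulder, left shoulder,
# spine, hip centre, right hip, left hip.
torso = [
    [0.0, 1.5, 2.0], [-0.2, 1.4, 2.0], [0.2, 1.4, 2.0], [0.0, 1.2, 2.0],
    [0.0, 1.0, 2.0], [-0.1, 0.95, 2.0], [0.1, 0.95, 2.0],
]
u, r, t = torso_frame(torso, top=torso[0], bottom=torso[4],
                      right=torso[1], left=torso[2])
```

Fitting the axes to all torso joints at once, rather than to any single pair, is what limits the influence of noise on individual shoulder or hip points.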

B. First-Degree Joints
The first-degree joints of the human skeleton (right and left elbows, right and left knees, and head) are defined by the yellow skeletal nodes in Fig. 10. These joints are represented relative to the adjacent joint in the torso, in a coordinate system derived from the torso PCA frame, as illustrated in Fig. 11 (a). For example, the torso PCA frame is translated to the right shoulder, and a spherical coordinate system is constructed such that its origin is the right shoulder, its azimuth axis is the right-to-left torso axis $\mathbf{r}$, and its zenith axis is the top-down torso axis $\mathbf{u}$. Then the position of the right elbow is described by

Fig. 2. Graphical model of left-right discrete hidden Markov model

Fig. 3. Hierarchy of the dance

III. MODELLING THE WHOLE DANCE AND THE DANCE GESTURE

A. Modelling The Whole Dance
In this study, the following terminology is used:
• Pose - Static configuration of the human body, without any movement.
• Gesture - Dynamic movement of the human body, which is a sequence of poses.
• Phrase - Fragment of choreography which consists of a sequence of gestures. The same gestures may be repeated.
• Dance - The whole choreography of a dance from start to end, which consists of a sequence of phrases.

In Fig. 4, Fig. 5, Fig. 6, Fig. 7, Fig. 8, and Fig. 9, initial states are indicated by bold circles and final states by double circles. The Likok Pulo dance actually has 6 to 8 phrases. If the model is implemented in a practical system, the input I can correspond to the music rhythm or to a pre-defined input. The output O can correspond to, e.g., indirect augmented reality during a dance performance, or automatic generation of the dance for a dancing humanoid robot.