ISCA Archive Interspeech 2012

Estimation of talker's head orientation based on discrimination of the shape of cross-power spectrum phase coefficients

Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki

This paper presents a method for estimating a talker's head orientation using a 2-channel microphone pair. Recent research has proposed approaches based on a network of microphone arrays, in which the talker's head orientation is estimated from the sound amplitude or from the peak value of the CSP (Cross-power Spectrum Phase) coefficients obtained from each microphone array. However, such microphone array network systems require many microphone arrays to be set along the walls of a given room so that the sub-microphone arrays surround the user. In this paper, we focus on the shape of the CSP coefficients, which is affected by reverberation and therefore depends on the talker's position and head orientation. In our proposed method, we use not only the peak value but also the other values of the CSP coefficients as a feature vector, and the talker's position and head orientation are estimated by discriminating this CSP vector. The effectiveness of the method has been confirmed by talker localization and head orientation estimation experiments performed in a real environment.

Index Terms: microphone array, talker localization, head orientation estimation, acoustic transfer function, CSP coefficients
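
As an illustration of the feature described in the abstract, the following minimal sketch computes CSP coefficients for one frame of a 2-channel recording and keeps a window of lags around zero as a feature vector, rather than only the peak. It is not the authors' implementation: the function name `csp_coefficients`, the `max_lag` parameter, the small constant added before normalization, the sign convention for the lag, and the use of NumPy are all assumptions made for this example. The abstract does not specify the classifier, so the discrimination step is left out.

```python
import numpy as np


def csp_coefficients(x1, x2, max_lag):
    """Return CSP coefficients for lags -max_lag..+max_lag (hypothetical helper).

    x1, x2  : 1-D arrays holding the same frame from the two microphones.
    max_lag : assumed half-width (in samples) of the lag window to keep.
    """
    # Zero-pad so the circular correlation does not wrap around.
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)

    # Cross-power spectrum; with this (assumed) convention a positive lag
    # means that x2 is delayed relative to x1.
    cross = X2 * np.conj(X1)

    # Discard the magnitude and keep only the phase. The small constant is
    # an assumption that guards against division by zero.
    cross /= np.abs(cross) + 1e-12

    # Back to the lag domain: these are the CSP coefficients.
    csp = np.fft.irfft(cross, n=n)

    # Reorder so that index 0 is lag -max_lag and index max_lag is lag 0.
    return np.concatenate((csp[-max_lag:], csp[:max_lag + 1]))
```

Calling `csp_coefficients(x1, x2, 20)` returns a 41-element vector. In the spirit of the abstract, the whole vector, not just its maximum, would be passed to a classifier trained on vectors recorded for each talker position and head orientation.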


doi: 10.21437/Interspeech.2012-403

Cite as: Takashima, R., Takiguchi, T., Ariki, Y. (2012) Estimation of talker's head orientation based on discrimination of the shape of cross-power spectrum phase coefficients. Proc. Interspeech 2012, 1844-1847, doi: 10.21437/Interspeech.2012-403

@inproceedings{takashima12_interspeech,
  author={Ryoichi Takashima and Tetsuya Takiguchi and Yasuo Ariki},
  title={{Estimation of talker's head orientation based on discrimination of the shape of cross-power spectrum phase coefficients}},
  year=2012,
  booktitle={Proc. Interspeech 2012},
  pages={1844--1847},
  doi={10.21437/Interspeech.2012-403}
}