ISCA Archive Interspeech 2011

Improvements in speaker characterization using spectral subband energy based on harmonic plus noise model

Yanhua Long, Zhi-Jie Yan, Frank K. Soong, Li-Rong Dai, Wu Guo

We previously proposed the Spectral Subband Energy Ratio (SSER) as a speaker feature for speaker verification [1]. The SSER features were derived from two distinct components of speech, the harmonic part and the noise part, which were decomposed from the original speech by the Harmonic plus Noise Model (HNM). In this paper, we report several recent improvements to this approach. First, we examine the two speech components in detail and achieve surprisingly better performance by extracting only the separate Spectral Subband Energy features from each component. Second, we propose a soft unvoiced/voiced (U/V) decision method that preserves more speech data during HNM analysis and feature extraction. Greatly improved experimental results demonstrate the effectiveness of this soft U/V decision. Finally, we report a preliminary attempt to move feature extraction from the linear frequency domain to the mel-frequency domain.

[1] Long, Y., Yan, Z.-J., Soong, F. K., Dai, L. and Guo, W., “Speaker Characterization Using Spectral Subband Energy Ratio Based on Harmonic Plus Noise Model”, in Proc. ICASSP, 2011.


doi: 10.21437/Interspeech.2011-133

Cite as: Long, Y., Yan, Z.-J., Soong, F.K., Dai, L.-R., Guo, W. (2011) Improvements in speaker characterization using spectral subband energy based on harmonic plus noise model. Proc. Interspeech 2011, 373-376, doi: 10.21437/Interspeech.2011-133

@inproceedings{long11_interspeech,
  author={Yanhua Long and Zhi-Jie Yan and Frank K. Soong and Li-Rong Dai and Wu Guo},
  title={{Improvements in speaker characterization using spectral subband energy based on harmonic plus noise model}},
  year=2011,
  booktitle={Proc. Interspeech 2011},
  pages={373--376},
  doi={10.21437/Interspeech.2011-133}
}