Abstract
This paper examines the validity of feature extraction in the articulatory domain for speech recognition. First, a method is described for estimating articulatory movements from speech waves on the basis of a speech production model. Second, the validity of the estimated articulatory parameters for speaker adaptation is tested. Vowel recognition experiments with unspecified speakers show that adapting the model using the estimated mean vocal tract length is effective in normalizing speaker differences. Third, the effectiveness of the approach for continuous speech recognition is considered. Motor commands that move the articulatory organs are estimated by taking articulatory dynamics into account, and continuous vowels are recognized using the estimated commands. It is found that a considerable part of the coarticulation effects can be removed by this command estimation. Finally, some characteristics of phonemes are investigated in the articulatory domain. It is found that the characteristics of a phoneme can be represented by particular parameters according to its manner of articulation.
T.Kobayashi is now with Hosei University.
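As a rough illustration of why an estimate of vocal tract length can help normalize speaker differences, the sketch below uses the textbook uniform-tube approximation, in which the resonances of a tube closed at one end are F_n = (2n - 1)c / (4L). This is not the articulatory model or estimation method of the chapter; the function names, the speed-of-sound value, and the 17.5 cm reference length are assumptions chosen for illustration only.

```python
# Illustrative only: uniform-tube approximation, not the chapter's articulatory model.
SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed round value for warm, humid air


def uniform_tube_formants(length_cm, count=3):
    """Resonances (Hz) of a uniform tube closed at one end: F_n = (2n - 1) c / (4 L)."""
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * length_cm)
            for n in range(1, count + 1)]


def normalize_formants(formants_hz, speaker_length_cm, reference_length_cm=17.5):
    """Rescale formants as if they came from a tract of the (assumed) reference length."""
    # Formants scale inversely with tract length, so multiply by L_speaker / L_reference.
    scale = speaker_length_cm / reference_length_cm
    return [f * scale for f in formants_hz]


# A hypothetical speaker with a 14 cm tract; the length is taken as already estimated.
short_tract = uniform_tube_formants(14.0)
normalized = normalize_formants(short_tract, speaker_length_cm=14.0)
print([round(f) for f in short_tract])
print([round(f) for f in normalized])
```

Under these assumptions, the shorter tract produces formants near 625, 1875, and 3125 Hz, and rescaling by the length ratio maps them onto the reference speaker's 500, 1500, and 2500 Hz. Real vowels are not uniform tubes, so this shows only the scaling idea, not a recognition-ready procedure.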
© 1988 Kluwer Academic Publishers
Cite this chapter
Shirai, K., Kobayashi, T. (1988). Speech Production Model and Automatic Recognition. In: Carvallo, M.E. (eds) Nature, Cognition and System I. Theory and Decision Library, vol 2. Springer, Dordrecht. https://doi.org/10.1007/978-94-009-2991-3_1
DOI: https://doi.org/10.1007/978-94-009-2991-3_1
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-7844-3
Online ISBN: 978-94-009-2991-3
eBook Packages: Springer Book Archive