Speech perception as a multilevel processing system

Journal of Psycholinguistic Research

Abstract

This report describes an experiment on how variability in voice characteristics affects a listener's ability to process sentences, and it outlines a possible model for the processing of continuous speech. Several speakers each recorded the same set of sentences, and the recordings were used to form two experimental tapes. On one tape, sentences were arranged randomly, so that the listener could predict neither the content of a sentence nor the voice of the speaker who had produced it. On the second tape, sentences were arranged in blocks by speaker, so that there was uncertainty as to content but not as to voice. Four groups of subjects listened to these tapes, with each tape presented under a low and a high level of background noise. In interpreting the data, emphasis was placed on the idea that the speech signal contains a variety of characteristics or features and that these features are processed by three interacting subsystems. One is a prosodic system, responsible for segmenting continuous speech into sentences, phrases, and words. It attempts to establish a context within which a second system, responsible for processing words and syllables, can operate. However, both of these systems are speaker-dependent, in that they rely on features that must be adjusted on the basis of an assessment of voice characteristics. The model therefore includes a third system, responsible for that assessment. In effect, this third subsystem “trains” the two primary systems so as to normalize for differences in individual voice characteristics. We had predicted that performance would be uniformly more accurate in the blocked condition than in the unblocked condition, since one type of uncertainty would have been eliminated. The results confirm this hypothesis only in part, since there is an interaction with variations in background noise. However, a plausible explanation is offered for these findings and for a variety of related results.
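
The tape construction and the four listening conditions described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names, the placeholder speaker and sentence labels, and the choice to shuffle sentence order within each speaker block are all assumptions made for illustration.

```python
import random
from itertools import product


def build_tapes(speakers, sentences, seed=0):
    """Return (random_tape, blocked_tape) as lists of (speaker, sentence) pairs.

    Hypothetical helper: the abstract only states that one tape was random
    and the other was blocked by speaker.
    """
    rng = random.Random(seed)

    # Every speaker recorded the same set of sentences.
    recordings = [(sp, s) for sp in speakers for s in sentences]

    # Random tape: neither voice nor content is predictable.
    random_tape = recordings[:]
    rng.shuffle(random_tape)

    # Blocked tape: the voice is constant within a block. Shuffling the
    # sentences inside each block (an assumption) keeps content uncertain.
    blocked_tape = []
    for sp in speakers:
        block = [(sp, s) for s in sentences]
        rng.shuffle(block)
        blocked_tape.extend(block)

    return random_tape, blocked_tape


def listening_conditions():
    """Two tapes crossed with two noise levels give four subject groups."""
    return list(product(("random", "blocked"), ("low_noise", "high_noise")))
```

Calling `build_tapes(["A", "B", "C"], ["s1", "s2", "s3", "s4"])` yields two orderings of the same twelve recordings, and `listening_conditions()` lists the four tape-by-noise combinations, one per group of subjects.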

Cite this article

Krulee, G.K., Tondo, D.K. & Wightman, F.L. Speech perception as a multilevel processing system. J Psycholinguist Res 12, 531–554 (1983). https://doi.org/10.1007/BF01067960

  • DOI: https://doi.org/10.1007/BF01067960
