Measuring intelligibility in spontaneous speech using syllables perceived as understood

https://doi.org/10.1016/j.jcomdis.2021.106108

Highlights

  • Syllables Perceived as Understood (SPU) is a new method to assess intelligibility.

  • SPU can be applied to spontaneous speech without requiring a master transcript.

  • SPU shows promising results concerning validity and reliability.

  • SPU is a time- and labour-efficient method to assess intelligibility in research.

  • SPU is a useful alternative to informal estimates in clinical assessment.

Abstract

Purpose

Intelligibility, the ability to convey a message by speech, is one of the most important variables in speech-language pathology. The assessment of intelligibility is a challenge, especially when it comes to spontaneous speech. The aim of the study was to investigate the validity and reliability of a method for the assessment of intelligibility, syllables perceived as understood (SPU), a method that is more time-efficient than previous transcription-based methods, as it does not require a master transcript for reference.

Method

A group of 20 adult listeners transcribed stimuli consisting of spontaneous speech from 16 children (14 with speech sound disorder and two with typical speech and language development, age 4:4 to 8:1, M = 6:0). Intelligibility was calculated from these orthographic transcripts as a) the proportion of syllables perceived as understood (SPU) and b) the proportion of syllables correctly understood (SCU), with reference to a master transcript. Validity was examined through the correlation and the difference between these two measures. Reliability was analysed in terms of inter-listener reliability, using intra-class correlation.

Results

The correlation between SPU and SCU (the gold standard intelligibility score) was strong and statistically significant, with SPU being consistently higher than SCU. Inter-listener reliability for SPU was low to moderate for single measures of intra-class correlation, whereas it was high for average measures of intra-class correlation.

Conclusions

The method based on SPU might be used for the assessment of intelligibility if the median from several listeners is used, or when comparing results from the same listener over time. The SPU method might therefore be a valuable tool in clinical and research contexts, as a more valid option than rating scales and a more time-efficient method than the gold standard SCU method. However, it should be noted that the reliability of SPU is not as high as that of SCU.

Introduction

Successful communication between human beings depends on an interplay between the speaker's intention and the listener's interpretation of the spoken utterance. This interplay is challenged when the spoken signal is impaired, for example because of dysarthria in adults or speech sound disorders in children (Hodge & Gotzke, 2007; Hustad, Oakes & Allison, 2015; McLeod, Crowe & Shahaeian, 2015). It is therefore important that speech assessment and intervention include this particular aspect, namely intelligibility. As Miller (2013) has stated, helping patients to obtain intelligible speech and training listeners to understand that speech are central aims of speech-language therapy.

An often-used definition of intelligibility that focuses on the speech signal is “the degree to which the acoustic signal […] is understood by a listener” (Yorkston, Strand & Kennedy, 1996, p. 55). This is applicable when the focus is on improving the speech signal, such as in articulatory training or when evaluating cleft palate speech after reconstructive surgery. However, communication includes much more than just the acoustic signal, for example gestures, miming and other nonverbal cues, which can in fact compensate for an impaired speech signal (Miller, 2013). When dealing with these factors, intelligibility can be defined as “the degree to which the speaker's intended message is recovered by the listener” (Kent, Weismer, Kent & Rosenbek, 1989, p. 483), hence not focusing only on the speech signal. A factor that influences listeners’ chances of recovering the speaker's intended message is the amount of contextual information available, for example, whether the listener knows the overall topic or the situation in which the utterance is produced. The term comprehensibility has been used to capture these dimensions and is defined as “the extent to which a listener understands utterances produced by a speaker in a communication context” (Barefoot, Bochner, Johnson & vom Eigen, 1993, p. 32). Comprehensibility therefore includes signal-independent information such as syntax, semantics and contextual information (Yorkston et al., 1996). Further, factors tied to the listener also contribute to intelligibility, such as familiarity with disordered speech or other specific speech traits (Flipsen, 1995; Hustad & Cahill, 2003; Hustad, Dardis & Kramper, 2011; McHenry, 2011; Pennington & Miller, 2007).
Studies have also found that intelligibility is influenced by linguistic proficiency (Lagerberg, Lam, Olsson, Abelin & Strömbergsson, 2019), utterance length and complexity (Allison & Hustad, 2014; Barreto & Ortiz, 2020), as well as familiarity with the speaker (Hustad & Cahill, 2003; Tjaden & Liss, 1995). Moreover, the speech material must also be considered when assessing intelligibility. Although single words and shorter utterances may be less demanding of the articulatory system and hence cause less disturbance to the speech signal, such materials provide less contextual information for listeners to aid their comprehension (Gordon-Brannan & Hodson, 2000; Hustad, 2007; Johannisson, Lohmander & Persson, 2014; Lillvik, Allemark, Karlström & Hartelius, 1999). To sum up, many different perspectives should be taken into account when assessing intelligibility.

When assessing intelligibility, the listener task can vary; options include rating scales, multiple-choice forms and transcription. The selection of listener task often involves a trade-off between practical and methodological concerns. For example, rating scales are easy to handle and time-efficient but have low reliability and validity (Kent, 1994, 1996; Schiavetti, 1992; Whitehill, 2002). The issue of using rating scales in the assessment of intelligibility has been discussed for at least the last four decades (Miller, 2013; Yorkston & Beukelman, 1978). The problem of reliability and validity lies in the fact that intelligibility is a prothetic variable (i.e., a continuum that adds or subtracts from the previous level, e.g., loudness) rather than a metathetic one (i.e., a continuum that changes in quality, e.g., pitch). A prothetic variable should not be assessed using equal-appearing scales, since listeners are not able to partition this type of variable into equal intervals (Schiavetti, 1992; Whitehill, 2002). Schiavetti (1992) further states that since it is possible to obtain a measure at ratio level, which is higher than the scaling level (e.g., rating scales), this is what should be used. Visual analogue scales and estimates of percent understood are scaling measures where analysis at ratio level is utilised. These methods have been applied in several studies measuring intelligibility. For example, Hustad (2006) used percent estimates of intelligibility in dysarthric speech secondary to cerebral palsy. A further example is Tjaden, Sussman and Wilding (2014), who applied visual analogue scales when assessing intelligibility in speech from people with multiple sclerosis and Parkinson's disease.
The authors of these studies discuss issues with the validity of the scaling methods applied, including that listeners seem to have problems separating intelligibility from other speech characteristics (Tjaden et al., 2014); Hustad concludes that, as a clinical measure, orthographic transcription might be more consistent than percent estimates (Hustad, 2006). Hence, one problem with rating scales is the difficulty listeners seem to have in separating intelligibility from other speech variables (Miller, 2013; Whitehill, 2002). Although variables at the activity and participation level, such as comprehensibility (Yorkston et al., 1996), acceptability (Strömbergsson, Edlund, McAllister & Lagerberg, 2020) and ease of listening (EOL) (Landa et al., 2014), are closely related to intelligibility, it is important to keep them apart. One way to address validity and reliability is to measure intelligibility as the proportion of words or syllables correctly understood by the listener (Hustad et al., 2015; Kent et al., 1994; Klein & Flint, 2006; Miller, 2013), often based on pre-determined lists of single words or a specific text that is read aloud or repeated after a model by the speaker. However, although these methods have the advantage that the target words are known, they do not necessarily represent the speaker's ability to communicate efficiently in daily situations. For this reason, continuous speech produced spontaneously would be preferred. However, if the speech is severely impaired, it might not be possible to know the intended target words (Kwiatkowski & Shriberg, 1992). To address this problem when measuring intelligibility in spontaneous speech, a method using a master transcript has been developed; this method can be considered a gold standard (Gordon-Brannan & Hodson, 2000; Kwiatkowski & Shriberg, 1992).
A master transcript is produced by the researcher who made the recording, together with the caregivers, as soon as possible after the recording session, in order to capture as much as possible of what was said. Transcripts produced by other listeners can then be compared to this master transcript, allowing calculation of the percentage of words or syllables correctly understood (Kwiatkowski & Shriberg, 1992). The drawback of this method is that producing the master transcript is very time-consuming and labour-intensive, and, even after consulting with caregivers, the speaker's intended message may still not be clear (Lagerberg, Åsberg, Hartelius & Persson, 2014). Hence, a method that is easier to use would be of value, for clinical use and in research, provided that it is reliable and valid.

An additional challenge when dealing with unintelligible speech is that it might be impossible to identify word boundaries and, hence, to know how many words an utterance contains (Flipsen, 2006). Consequently, the percentage of words correctly understood might be misleading. To address this challenge, Flipsen (2006) has suggested estimating the number of words from the number of syllables, using a syllables-per-word index. Another way to avoid this problem might be to count syllables instead of words in both intelligible and unintelligible parts of an utterance (Lagerberg et al., 2014). Throughout this manuscript, we will refer to this measure as syllables correctly understood (SCU).
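Once syllables have been counted in the listener transcript and the master transcript, the SCU calculation reduces to a simple proportion. The sketch below is a minimal illustration in Python; the function name and the syllable counts are hypothetical, not taken from the study.

```python
def scu(correct_syllables: int, master_syllables: int) -> float:
    """Syllables correctly understood (SCU): the proportion of syllables
    in the master transcript that the listener transcribed correctly."""
    if master_syllables <= 0:
        raise ValueError("master transcript must contain at least one syllable")
    if not 0 <= correct_syllables <= master_syllables:
        raise ValueError("correct count must lie between 0 and the master count")
    return correct_syllables / master_syllables

# Example: 148 of the 200 syllables in the master transcript
# were correctly understood by the listener.
print(scu(148, 200))  # 0.74
```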

The Weiss intelligibility test (Weiss, 1982) includes assessment of intelligibility in isolated words and spontaneous speech. By adding the scores from these two speech tasks and dividing the sum by two, an overall intelligibility score is obtained. The calculation of intelligibility in spontaneous speech in this particular test is made without using a master transcript. The child talks about one of six pictures that he/she chooses, and about 200 words of spontaneous speech are audio-recorded. The clinician is instructed to give as little verbal input as possible. The listener uses a grid in which each box represents a word, marks “√” for each word that was understood, and leaves the box empty for words that are not understood. The listener can be “the clinician or any other listener” (p. 9) and guessing is not allowed. Scoring is done either directly while listening to the audio recording or after first making an orthographic transcription. The overall intelligibility score is the mean of the score from the spontaneous speech task (percentage of words understood) and the percentage of single words understood in a picture-naming task (Weiss, 1982). This method is also discussed in Kent et al. (1994); however, the investigation of inter- and intra-listener reliability that Weiss made was not comprehensive. Starting from this method, Lagerberg et al. (2014) suggested an intelligibility score based on the number of syllables perceived as understood by the listener. This method has been evaluated for validity against the percentage of consonants correct (PCC) and found to have high validity and reliability in terms of inter-listener reliability (ICC single measures = 0.71 and ICC average measures = 0.91) and intra-listener reliability (r = 0.94, p < .01) (Lagerberg et al., 2014). The study included speech samples from 10 children with speech sound disorders (SSD) and 10 children with typical development of speech and language.
The 20 listeners were mostly SLP students, but two were recent graduates. In this validation, the PCC scores were based on the speakers’ articulation of single words, whereas the intelligibility scores were based on the speakers’ continuous speech samples. Hence, it would be desirable to re-assess the validation in two respects: a) by relating the suggested intelligibility score to the ‘gold standard’ measure of intelligibility, that is, the proportion of syllables correctly understood (SCU), and b) by relating the suggested intelligibility score to a measure of speech accuracy in the same speech sample.
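The conceptual contrast between the two scores can be sketched with per-syllable listener judgements. In the hypothetical data below (invented for illustration, not from the study), SPU needs only the listener's own sense of having understood each syllable, while SCU additionally checks each syllable against a master transcript; because some "understood" syllables do not match the master, SPU comes out higher than SCU.

```python
# Hypothetical per-syllable listener data (NOT from the study):
# 'perceived_understood' marks syllables the listener felt they understood;
# 'matches_master' marks syllables that also match the master transcript.
perceived_understood = [True, True, True, False, True, True, False, True, True, True]
matches_master       = [True, True, False, False, True, True, False, True, False, True]

n = len(perceived_understood)
spu = sum(perceived_understood) / n   # no master transcript needed
scu = sum(matches_master) / n         # gold standard, requires master transcript
print(f"SPU = {spu:.2f}, SCU = {scu:.2f}")  # SPU = 0.80, SCU = 0.60
```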

In the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders, DSM-5 (American Psychiatric Association, 2013), speech sound disorders (SSDs) are described as conditions where speech is impaired to the degree that intelligibility is reduced. As such, reduced intelligibility is a functional consequence of impaired speech in SSD. In the terminology of the International Classification of Functioning, Disability and Health, Children and Youth version, ICF-CY (WHO, 2007), SSD, by reducing intelligibility, causes a limitation in activity (McCormack, McLeod, McAllister & Harrison, 2010; Pennington et al., 2013). Children with SSD have also been found to have limitations in participation and can be subject to negative attitudes from the environment and communication partners (McCormack et al., 2010). This further underlines the value of addressing intelligibility in both assessment and intervention for children with SSD, as has been highlighted in several studies in recent years (Hustad et al., 2015; Landa et al., 2014; Lousada, Jesus, Hall & Joffe, 2014).

Validity can be investigated by comparing results from a new assessment method with results from an already well-established assessment method, that is, congruent validity (Streiner & Norman, 2008). Many studies have validated measures of intelligibility with reference to articulation measures (McLeod et al., 2015; Morris, Wilcox & Schooling, 1995; Zajac, Plante, Lloyd & Haley, 2011), or have been based on comparisons between different speech tasks, for example, comparing results from single words with results from spontaneous speech (Hodge & Gotzke, 2007; Lagerberg et al., 2014). In the present study, we aim for the closest possible match to the method we seek to investigate and therefore use spontaneous speech as the common basis. Furthermore, we used a method that is considered the gold standard for assessing intelligibility. A second type of validity is convergent validity, where results are compared to measurements of a variable that is related to the target variable (Streiner & Norman, 2008).

To conclude, intelligibility is central to the care of children with SSDs and should be assessed regularly and reliably, both in a clinical context and in research. This calls for reliable and valid assessment methods that require reasonable time and effort. Therefore, the aim of the present study is to evaluate the validity and reliability of the method proposed by Lagerberg et al. (2014), based on the number of syllables perceived as understood (SPU).

The research questions are:

  1. Does the SPU (syllables perceived as understood) method exhibit congruent validity, as measured against the gold standard measure of intelligibility, that is, the proportion of syllables correctly understood (SCU)?

  2. Does the SPU method exhibit convergent validity, as measured against speech accuracy in the same speech sample?

  3. Is the SPU-based intelligibility measure reliable in terms of inter-listener reliability?

Section snippets

Method

The project is part of a larger project, "Real-time assessments of intelligibility in children's connected speech", which has been approved by the Regional Ethical Review Board in Stockholm (No. 2016/1628–31/1).

Congruent validity of assessment by syllables perceived as understood (SPU)

The correlation between the SPU-based intelligibility score and the gold standard SCU-based intelligibility score was strong and statistically significant, r = 0.84, p < .001, as calculated by a Spearman correlation, confirming the congruent validity of the SPU-based intelligibility score.
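The shape of this analysis can be reproduced in outline with scipy: a Spearman rank correlation between per-speaker SPU and SCU scores, plus a paired Wilcoxon test for the difference between the two. The scores below are invented placeholders, not the study's data.

```python
from scipy.stats import spearmanr, wilcoxon

# Invented per-speaker scores (NOT the study's data); SPU runs above SCU.
spu_scores = [0.85, 0.62, 0.91, 0.74, 0.55, 0.80, 0.68, 0.95]
scu_scores = [0.74, 0.51, 0.83, 0.60, 0.47, 0.71, 0.55, 0.88]

rho, p = spearmanr(spu_scores, scu_scores)        # congruent validity
stat, p_diff = wilcoxon(spu_scores, scu_scores)   # paired difference test
print(f"Spearman rho = {rho:.2f} (p = {p:.3g}); difference p = {p_diff:.3g}")
```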

Despite the strong correlation between the SPU-based intelligibility score and the SCU-based intelligibility score, there was a significant difference between the two (asymptotic Z = −3.517, p < .001), such

Discussion

The aim of the present study was to investigate a method for the assessment of intelligibility that is relatively time- and labour-efficient while at the same time meeting requirements of reliability and validity, for use both in clinical work and in research. The proposed method of Syllables Perceived as Understood (SPU), where the creation of a master transcript is not necessary, proved to have high validity in terms of construct validity, and reliability was poor to excellent as analysed by

CRediT authorship contribution statement

Tove B. Lagerberg: Conceptualization, Methodology, Writing - original draft, Data curation, Formal analysis, Writing - review & editing. Katarina Holm: Investigation, Writing - review & editing. Anita McAllister: Writing - review & editing. Sofia Strömbergsson: Conceptualization, Methodology, Data curation, Formal analysis, Supervision, Funding acquisition, Writing - review & editing.

Declaration of Competing Interest

The authors report no declarations of interest.

Acknowledgements

The authors wish to thank Emil Brynte, MSc, SLP for valuable contributions to the data collection.

References (51)

  • Flipsen, P. (1995). Speaker–listener familiarity: Parents as listeners of delayed speech intelligibility. Journal of Communication Disorders.

  • Koo, T.K., et al. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine.

  • Yorkston, K.M., et al. (1978). A comparison of techniques for measuring intelligibility of dysarthric speech. Journal of Communication Disorders.

  • Allison, K.M., et al. (2014). Impact of sentence length and phonetic complexity on intelligibility of 5-year-old children with cerebral palsy. International Journal of Speech-Language Pathology.

  • American Psychiatric Association (2013). Diagnostic and statistical manual of mental disorders.

  • Barefoot, S.M., et al. (1993). Rating deaf speakers’ comprehensibility: An exploratory investigation. American Journal of Speech-Language Pathology.

  • Barreto, S.D.S., et al. (2020). Speech intelligibility in dysarthrias: Influence of utterance length. Folia Phoniatrica et Logopaedica.

  • Flipsen, P. (2006). Measuring the intelligibility of conversational speech in children. Clinical Linguistics & Phonetics.

  • Gordon-Brannan, M., et al. (2000). Intelligibility/severity measurements of prekindergarten children’s speech. American Journal of Speech-Language Pathology.

  • Gurevich, N., et al. (2017). Speech-language pathologists’ use of intelligibility measures in adults with dysarthria. American Journal of Speech-Language Pathology.

  • Hodge, M., et al. (2007). Preliminary results of an intelligibility measure for English-speaking children with cleft palate. Cleft Palate-Craniofacial Journal.

  • Hustad, K.C. (2006). Estimating the intelligibility of speakers with dysarthria. Folia Phoniatrica et Logopaedica.

  • Hustad, K.C. (2007). Effects of speech stimuli and dysarthria severity on intelligibility scores and listener confidence ratings for speakers with cerebral palsy. Folia Phoniatrica et Logopaedica.

  • Hustad, K.C., et al. (2003). Effects of presentation mode and repeated familiarization on intelligibility of dysarthric speech. American Journal of Speech-Language Pathology.

  • Hustad, K.C., et al. (2011). Use of listening strategies for the speech of individuals with dysarthria and cerebral palsy. Augmentative and Alternative Communication.

  • Hustad, K.C., et al. (2015). Variability and diagnostic accuracy of speech intelligibility scores in children. Journal of Speech, Language, and Hearing Research.

  • Johannisson, T.B., et al. (2014). Assessing intelligibility by single words, sentences and spontaneous speech: A methodological study of the speech production of 10-year-olds. Logopedics Phoniatrics Vocology.

  • Kent, R.D. (1996). Hearing and believing: Some limits to the auditory–perceptual assessment of speech and voice disorders. American Journal of Speech-Language Pathology.

  • Kent, R.D., et al. (1994). The intelligibility of children’s speech: A review of evaluation procedures. American Journal of Speech-Language Pathology.

  • Kent, R.D., et al. (1989). Toward phonetic intelligibility testing in dysarthria. Journal of Speech and Hearing Disorders.

  • Klein, E.S., et al. (2006). Measurement of intelligibility in disordered speech. Language, Speech, and Hearing Services in Schools.

  • Kwiatkowski, J., et al. (1992). Intelligibility assessment in developmental phonological disorders: Accuracy of caregiver gloss. Journal of Speech and Hearing Research.

  • Lagerberg, T.B., et al. (2015). Swedish Test of Intelligibility for Children (STI-CH): Validity and reliability of a computer-mediated single-word intelligibility test for children. Clinical Linguistics & Phonetics.

  • Lagerberg, T.B., et al. (2015). Effect of number of repetitions on listener transcriptions in assessment of speech intelligibility in children. International Journal of Language & Communication Disorders.

  • Lagerberg, T.B., et al. (2014). Assessment of intelligibility using children’s spontaneous speech: Methodological aspects. International Journal of Language & Communication Disorders.