Measuring intelligibility in spontaneous speech using syllables perceived as understood
Introduction
Successful communication between human beings depends on an interplay between the speaker's intention and the listener's interpretation of the spoken utterance. This interplay is challenged when the spoken signal is impaired, for example because of dysarthria in adults or speech sound disorders in children (Hodge & Gotzke, 2007; Hustad, Oakes & Allison, 2015; McLeod, Crowe & Shahaeian, 2015). It is therefore important that speech assessment and intervention address this particular aspect, namely intelligibility. As Miller (2013) states, helping patients to achieve intelligible speech and training listeners to understand that speech are central aims of speech-language therapy.
An often-used definition of intelligibility that focuses on the speech signal is "the degree to which the acoustic signal […] is understood by a listener" (Yorkston, Strand & Kennedy, 1996, p. 55). This is applicable when the focus is on improving the speech signal, such as in articulatory training or when evaluating cleft palate speech after reconstructive surgery. However, communication includes much more than just the acoustic signal, for example gestures, miming and other nonverbal cues, which can in fact compensate for an impaired speech signal (Miller, 2013). When these factors are taken into account, intelligibility can be defined as "the degree to which the speaker's intended message is recovered by the listener" (Kent, Weismer, Kent & Rosenbek, 1989, p. 483), hence not focusing only on the speech signal. A factor that influences listeners' chances of recovering the speaker's intended message is the amount of contextual information available, for example whether the listener knows the overall topic or the situation in which the utterance is produced. The term comprehensibility has been used to capture these dimensions and is defined as "the extent to which a listener understands utterances produced by a speaker in a communication context" (Barefoot, Bochner, Johnson & vom Eigen, 1993, p. 32). Comprehensibility therefore includes signal-independent information such as syntax, semantics and contextual information (Yorkston et al., 1996). Further, factors tied to the listener also contribute to intelligibility, such as familiarity with disordered speech or other specific speech traits (Flipsen, 1995; Hustad & Cahill, 2003; Hustad, Dardis & Kramper, 2011; McHenry, 2011; Pennington & Miller, 2007).
Studies have also found that intelligibility is influenced by linguistic proficiency (Lagerberg, Lam, Olsson, Abelin & Strömbergsson, 2019), utterance length and complexity (Allison & Hustad, 2014; Barreto & Ortiz, 2020) as well as familiarity with the speaker (Hustad & Cahill, 2003; Tjaden & Liss, 1995). Moreover, we also need to consider the speech material when assessing intelligibility. Although single words and shorter utterances may be less demanding of the articulatory system, and hence cause less disturbance to the speech signal, such materials provide less contextual information to aid listeners' comprehension (Gordon-Brannan & Hodson, 2000; Hustad, 2007; Johannisson, Lohmander & Persson, 2014; Lillvik, Allemark, Karlström & Hartelius, 1999). To sum up, many different perspectives should be taken into account when assessing intelligibility.
When assessing intelligibility, listener tasks vary, including rating scales, multiple-choice forms and transcription. The selection of listener task often involves a tradeoff between practical and methodological concerns. For example, rating scales are easy to handle and time-efficient but have low reliability and validity (Kent, 1994, 1996; Schiavetti, 1992; Whitehill, 2002). The issue of using rating scales in the assessment of intelligibility has been discussed for at least the last four decades (Miller, 2013; Yorkston & Beukelman, 1978). The problem of reliability and validity resides in the fact that the variable intelligibility is prothetic (i.e., a continuum that adds to or subtracts from the previous level, e.g., loudness) rather than metathetic (i.e., a continuum that changes in quality, e.g., pitch). A prothetic variable should not be assessed using equal-appearing interval scales, since listeners are not able to partition these types of variables into equal intervals (Schiavetti, 1992; Whitehill, 2002). Schiavetti (1992) further states that since it is possible to obtain a measure at ratio level, which is higher than the scaling level (e.g., rating scales), this is what should be used. Visual analogue scales and estimates of percent understood are scaling measures where analysis at ratio level is utilised. These methods have been applied in several studies measuring intelligibility. For example, Hustad (2006) used percent estimates of intelligibility in dysarthric speech secondary to cerebral palsy, and Tjaden, Sussman and Wilding (2014) applied visual analogue scales when assessing intelligibility in speech from people with multiple sclerosis and Parkinson's disease.
The authors of these studies discuss issues with the validity of the scaling methods applied, including that listeners seem to have problems separating intelligibility from other speech characteristics (Tjaden et al., 2014); Hustad concludes that, as a clinical measure, orthographic transcription might be more consistent than percent estimates (Hustad, 2006). Hence, one problem with rating scales is the difficulty listeners seem to have in separating intelligibility from other speech variables (Miller, 2013; Whitehill, 2002). Variables at the activity and participation level, such as comprehensibility (Yorkston et al., 1996), acceptability (Strömbergsson, Edlund, McAllister & Lagerberg, 2020) and ease of listening (Landa et al., 2014), are closely related to intelligibility but important to keep apart from it. One way to address validity and reliability is to measure intelligibility as the proportion of words or syllables correctly understood by the listener (Hustad et al., 2015; Kent et al., 1994; Klein & Flint, 2006; Miller, 2013), often based on pre-determined lists of single words or a specific text that is read aloud or repeated after a model by the speaker. Although these methods offer the advantage of knowing the target words, they do not necessarily represent the speaker's ability to communicate efficiently in daily situations. For this reason, continuous speech produced spontaneously would be preferred. However, if the speech is severely impaired, it might not be possible to know the intended target words (Kwiatkowski & Shriberg, 1992). To address this problem when measuring intelligibility in spontaneous speech, a method using a master transcript has been developed; this method can be considered a gold standard (Gordon-Brannan & Hodson, 2000; Kwiatkowski & Shriberg, 1992).
A master transcript is produced by the researcher who made the recording, together with the caregivers, as soon as possible after the recording session, in order to capture as much as possible of what was said. Transcripts produced by other listeners can then be compared to this master transcript, allowing calculation of the percentage of words or syllables correctly understood (Kwiatkowski & Shriberg, 1992). The drawback of this method is that producing the master transcript is very time-consuming and labour-intensive, and – even after consulting with caregivers – the speaker's intended message may still not be clear (Lagerberg, Åsberg, Hartelius & Persson, 2014). Hence, a method that is easier to use would be of value, for clinical use and in research, provided that it is reliable and valid.
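As a rough illustration of the comparison step, a listener transcript can be scored against the master transcript as follows. This is a minimal sketch under simplifying assumptions, not the scoring procedure used in the studies cited: the function names are ours, the two transcripts are assumed to be word-aligned, and syllables are estimated with a naive vowel-group heuristic on English spelling rather than counted phonetically.

```python
import re

def count_syllables(word: str) -> int:
    """Naive syllable estimate: one syllable per vowel group in the spelling.
    (A real study would count syllables phonetically; this is only a stand-in.)"""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def scu_percent(master: list[str], listener: list[str]) -> float:
    """Percent of syllables correctly understood (SCU): syllables belonging to
    words the listener transcribed identically to the master transcript,
    divided by all syllables in the master transcript. Assumes the two
    transcripts are already word-aligned."""
    total = sum(count_syllables(w) for w in master)
    correct = sum(count_syllables(m) for m, l in zip(master, listener)
                  if m.lower() == l.lower())
    return 100.0 * correct / total

master = ["the", "rabbit", "jumped", "over", "the", "fence"]
listener = ["the", "racket", "jumped", "over", "a", "fence"]
print(round(scu_percent(master, listener), 1))  # → 70.0
```

In practice, transcripts of disordered speech rarely align word for word, which is exactly why the syllable-based measures discussed below were proposed.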
An additional challenge when dealing with unintelligible speech is that it might be impossible to identify word boundaries and, hence, to know how many words an utterance contains (Flipsen, 2006). Consequently, the percentage of words correctly understood might be misleading. To address this challenge, Flipsen (2006) suggested estimating the number of words from the number of syllables, using a syllables-per-word index. Another way to avoid this problem might be to count syllables instead of words in both intelligible and unintelligible parts of an utterance (Lagerberg et al., 2014). Throughout this manuscript, we will refer to this measure as syllables correctly understood (SCU).
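Flipsen's estimation step amounts to simple arithmetic, sketched here with illustrative numbers and function names that are our own, not values from the original study: the syllables-per-word index is derived from the intelligible portion of a sample and then used to estimate the total word count.

```python
def syllables_per_word_index(intelligible_words: int, intelligible_syllables: int) -> float:
    """Mean number of syllables per word in the intelligible part of the sample."""
    return intelligible_syllables / intelligible_words

def estimated_word_count(total_syllables: int, spw_index: float) -> float:
    """Estimate how many words the whole sample contains (including
    unintelligible stretches, where word boundaries cannot be heard)."""
    return total_syllables / spw_index

# Hypothetical sample: 80 intelligible words carrying 100 syllables,
# and 150 syllables in the sample as a whole.
spw = syllables_per_word_index(80, 100)              # 1.25 syllables per word
words_total = estimated_word_count(150, spw)         # 120.0 estimated words
percent_words_understood = 100.0 * 80 / words_total  # about 66.7 percent
```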
The Weiss intelligibility test (Weiss, 1982) includes assessment of intelligibility in isolated words and spontaneous speech. By adding the scores from these two speech tasks and dividing the sum by two, an overall intelligibility score is obtained. In this particular test, intelligibility in spontaneous speech is calculated without a master transcript. The child talks about one of six pictures that he or she chooses, and about 200 words of spontaneous speech are audio-recorded. The clinician is instructed to give as little verbal input as possible. The listener uses a grid in which each box represents a word, marks "√" for each word that is understood, and leaves the box empty for words that are not understood. The listener can be "the clinician or any other listener" (p. 9), and guessing is not allowed. Scoring is done either directly while listening to the audio recording or after first making an orthographic transcription. The overall intelligibility score is the mean of the score from the spontaneous speech task (percentage of words understood) and the percentage of single words understood in a picture-naming task (Weiss, 1982). This method is also discussed in Kent et al. (1994); however, Weiss's investigation of inter- and intra-listener reliability was not comprehensive. Starting from this method, Lagerberg et al. (2014) suggested an intelligibility score based on the number of syllables perceived as understood by the listener. This method has been evaluated for validity against the percentage of consonants correct (PCC) and found to have high validity as well as high inter-listener reliability (ICC single measures = 0.71; ICC average measures = 0.91) and intra-listener reliability (r = 0.94, p < .01) (Lagerberg et al., 2014). The study included speech samples from 10 children with speech sound disorders (SSD) and 10 children with typical development of speech and language.
The 20 listeners were mostly SLP students, but two were recent graduates. In this validation, the PCC scores were based on the speakers' articulation of single words, whereas the intelligibility scores were based on the speakers' continuous speech samples. Hence, it would be desirable to re-assess the validation in two respects: a) by relating the suggested intelligibility score to the 'gold standard' measure of intelligibility, that is, the proportion of syllables correctly understood (SCU), and b) by relating the suggested intelligibility score to a measure of speech accuracy in the same speech sample.
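The Weiss composite score described above (the mean of the spontaneous-speech percentage and the single-word percentage) can be sketched in a few lines. The numbers and function names are hypothetical illustrations, not material from the test manual.

```python
def spontaneous_percent(ticked: int, total_boxes: int) -> float:
    """Percent of grid boxes (words) the listener ticked as understood."""
    return 100.0 * ticked / total_boxes

def weiss_overall(spontaneous_pct: float, single_word_pct: float) -> float:
    """Overall intelligibility: the mean of the two task scores."""
    return (spontaneous_pct + single_word_pct) / 2.0

# Hypothetical listener: 160 of 200 grid boxes ticked (80 %),
# and 90 % of single words understood in picture naming.
overall = weiss_overall(spontaneous_percent(160, 200), 90.0)
print(overall)  # → 85.0
```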
In the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders, DSM-5 (American Psychiatric Association, 2013), speech sound disorders (SSDs) are described as conditions where speech is impaired to the degree that intelligibility is reduced. As such, reduced intelligibility is a functional consequence of impaired speech in SSD. In the terminology of the International Classification of Functioning, Disability and Health, ICF-CY (WHO, 2007), SSD, by reducing intelligibility, causes a limitation in activity (McCormack, McLeod, McAllister & Harrison, 2010; Pennington et al., 2013). Children with SSD have also been found to have limitations in participation and can be subject to negative attitudes from the environment and communication partners (McCormack et al., 2010). This further clarifies the value of addressing intelligibility both in assessment and in intervention for children with SSD, as has been highlighted in several studies over recent years (Hustad et al., 2015; Landa et al., 2014; Lousada, Jesus, Hall & Joffe, 2014).
Validity can be investigated by comparing results from a new assessment method with results from an already well-established assessment method, that is, congruent validity (Streiner & Norman, 2008). Many studies have validated measures of intelligibility with reference to articulation measures (McLeod et al., 2015; Morris, Wilcox & Schooling, 1995; Zajac, Plante, Lloyd & Haley, 2011), or have been based on comparisons between different speech tasks, for example comparing results from single words with results from spontaneous speech (Hodge & Gotzke, 2007; Lagerberg et al., 2014). In the present study, we aim for the closest possible match to the method we seek to investigate and therefore use spontaneous speech as the common basis. Furthermore, we used a method that is considered the gold standard for assessing intelligibility. A second type of validity is convergent validity, where results are compared to measurements of a variable related to the target variable (Streiner & Norman, 2008).
To conclude, intelligibility is central to the care of children with SSDs, and should be assessed regularly and reliably both in a clinical context and in research. This calls for reliable and valid assessment methods that require a reasonable amount of time and effort. Therefore, the aim of the present study is to evaluate the validity and reliability of the method proposed by Lagerberg et al. (2014), based on the number of syllables perceived as understood (SPU).
The research questions are:
1. Does the SPU (syllables perceived as understood) method exhibit congruent validity, as measured against the gold standard measure of intelligibility, that is, the proportion of syllables correctly understood (SCU)?
2. Does the SPU method exhibit convergent validity, as measured against speech accuracy in the same speech sample?
3. Is the SPU-based intelligibility measure reliable in terms of inter-listener reliability?
Method
The project is part of a larger project, "Real-time assessments of intelligibility in children's connected speech", which has been approved by the Regional Ethical Review Board in Stockholm (No. 2016/1628–31/1).
Congruent validity of assessment by syllables perceived as understood (SPU)
The correlation between the SPU-based intelligibility score and the gold standard SCU-based intelligibility score was strong and statistically significant, r = 0.84, p < .001, as calculated by a Spearman correlation, confirming the congruent validity of the SPU-based intelligibility score.
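For readers wishing to reproduce this kind of analysis, a Spearman correlation between paired SPU and SCU scores can be computed as sketched below. The data are illustrative, not the study's, and the rank correlation is implemented by hand to keep the sketch dependency-free; in practice, scipy.stats.spearmanr would be the usual choice.

```python
from statistics import mean

def ranks(values):
    """Rank the values (1 = smallest), averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of tied positions i..j, 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) *
           sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Illustrative paired scores for five speech samples (not the study's data):
spu = [62, 75, 80, 91, 55]   # SPU-based intelligibility scores
scu = [60, 85, 70, 95, 50]   # gold-standard SCU scores
print(round(spearman(spu, scu), 2))  # → 0.9
```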
Despite the strong correlation between the SPU-based intelligibility score and the SCU-based intelligibility score, there was a significant difference between the two (asymptotic Z = −3.517, p < .001), such
Discussion
The aim of the present study was to investigate a method for assessment of intelligibility that is relatively time- and labour-efficient while at the same time meets requirements of reliability and validity, for use both in clinical work and in research. The proposed method of Syllables Perceived as Understood (SPU), where the creation of a master transcript is not necessary, proved to have high validity in terms of construct validity – and reliability was poor to excellent as analysed by
CRediT authorship contribution statement
Tove B. Lagerberg: Conceptualization, Methodology, Writing - original draft, Data curation, Formal analysis, Writing - review & editing. Katarina Holm: Investigation, Writing - review & editing. Anita McAllister: Writing - review & editing. Sofia Strömbergsson: Conceptualization, Methodology, Data curation, Formal analysis, Supervision, Funding acquisition, Writing - review & editing.
Declaration of Competing Interest
The authors report no declarations of interest.
Acknowledgements
The authors wish to thank Emil Brynte, MSc, SLP for valuable contributions to the data collection.
References (51)
- Speaker–listener familiarity: Parents as listeners of delayed speech intelligibility. Journal of Communication Disorders (1995).
- A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine (2016).
- A comparison of techniques for measuring intelligibility of dysarthric speech. Journal of Communication Disorders (1978).
- Impact of sentence length and phonetic complexity on intelligibility of 5-year-old children with cerebral palsy. International Journal of Speech-Language Pathology (2014).
- Diagnostic and statistical manual of mental disorders (2013).
- Rating deaf speakers' comprehensibility: An exploratory investigation. American Journal of Speech-Language Pathology (1993).
- Speech intelligibility in dysarthrias: Influence of utterance length. Folia Phoniatrica et Logopaedica (2020).
- Measuring the intelligibility of conversational speech in children. Clinical Linguistics & Phonetics (2006).
- Intelligibility/severity measurements of prekindergarten children's speech. American Journal of Speech-Language Pathology (2000).
- Speech-language pathologists' use of intelligibility measures in adults with dysarthria. American Journal of Speech-Language Pathology (2017).
- Preliminary results of an intelligibility measure for English-speaking children with cleft palate. Cleft Palate Craniofacial Journal.
- Estimating the intelligibility of speakers with dysarthria. Folia Phoniatrica et Logopaedica.
- Effects of speech stimuli and dysarthria severity on intelligibility scores and listener confidence ratings for speakers with cerebral palsy. Folia Phoniatrica et Logopaedica.
- Effects of presentation mode and repeated familiarization on intelligibility of dysarthric speech. American Journal of Speech-Language Pathology.
- Use of listening strategies for the speech of individuals with dysarthria and cerebral palsy. Augmentative and Alternative Communication.
- Variability and diagnostic accuracy of speech intelligibility scores in children. Journal of Speech, Language, and Hearing Research.
- Assessing intelligibility by single words, sentences and spontaneous speech: A methodological study of the speech production of 10-year-olds. Logopedics, Phoniatrics, Vocology.
- Hearing and believing: Some limits to the auditory–perceptual assessment of speech and voice disorders. American Journal of Speech-Language Pathology.
- The intelligibility of children's speech: A review of evaluation procedures. American Journal of Speech-Language Pathology.
- Toward phonetic intelligibility testing in dysarthria. Journal of Speech and Hearing Disorders.
- Measurement of intelligibility in disordered speech. Language, Speech, and Hearing Services in Schools.
- Intelligibility assessment in developmental phonological disorders: Accuracy of caregiver gloss. Journal of Speech and Hearing Research.
- Swedish Test of Intelligibility for Children (STI-CH): Validity and reliability of a computer-mediated single-word intelligibility test for children. Clinical Linguistics and Phonetics.
- Effect of number of repetitions on listener transcriptions in assessment of speech intelligibility in children. International Journal of Language & Communication Disorders.
- Assessment of intelligibility using children's spontaneous speech: Methodological aspects. International Journal of Language & Communication Disorders.