Exploring a corpus-based approach for detecting language impairment in monolingual English-speaking children
Introduction
Language impairment (LI) is generally associated with significant deficits in one or more aspects of language, such as syntax or morphology. When there is no evidence of hearing impairment, neurological damage, or cognitive impairment, children are often labeled as having primary or specific language impairment. This paper focuses on such children, whom we refer to as children with LI.
To assess language development in monolingual English-speaking children, clinicians can choose from a wide variety of language tests: the past tense task [1], the third person singular task [2], the Clinical Evaluation of Language Fundamentals, 4th edition [3], non-word repetition [4], the Wechsler Intelligence Scale for Children [5], a vocabulary test [6], the Peabody Picture Vocabulary Test [7], and many more. However, which tests are used to diagnose LI depends on the clinician. The general approach with norm-referenced tests is to identify children with potential LI as those whose score is more than 1.25 standard deviations (SD) below the mean of the reference population on at least two of the measures (e.g., [8]). However, Campbell et al. [9] showed that norm-referenced tests are biased against test-takers whose population is not adequately represented in the reference population. Children from minority ethnic backgrounds and of low socio-economic status belong to this category. To address these bias-related issues, researchers have developed processing-dependent measures, in contrast to vocabulary-based measures. These measures include non-word repetition [9], [10], [11] and competing language processing tasks [12].
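The norm-referenced cutoff described above can be illustrated with a minimal Python sketch. The function name, the measure names, and all numeric norms and scores below are hypothetical and chosen only for illustration; they are not taken from any of the cited tests.

```python
def flag_potential_li(scores, norms, cutoff_sd=1.25, min_measures=2):
    """Flag a child whose score is more than `cutoff_sd` standard deviations
    below the reference mean on at least `min_measures` measures.

    scores: {measure_name: raw_score} for one child
    norms:  {measure_name: (reference_mean, reference_sd)}
    Returns (flagged, list_of_low_measures).
    """
    low = []
    for measure, score in scores.items():
        ref_mean, ref_sd = norms[measure]
        z = (score - ref_mean) / ref_sd  # standardized distance from the mean
        if z < -cutoff_sd:               # strictly more than 1.25 SD below
            low.append(measure)
    return len(low) >= min_measures, low


# Hypothetical norms and scores (assumed values, for illustration only).
norms = {
    "past_tense": (100.0, 15.0),
    "nonword_repetition": (100.0, 15.0),
    "vocabulary": (100.0, 15.0),
}
child = {"past_tense": 78.0, "nonword_repetition": 80.0, "vocabulary": 95.0}

flagged, low_measures = flag_potential_li(child, norms)
print(flagged, low_measures)
```

Because the decision depends entirely on the reference mean and SD, a reference population that under-represents a child's background shifts the z-scores, which is the source of the bias discussed above.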
An exploratory study evaluating language models for automatic LI identification showed promise for the adaptation of natural language processing (NLP) and machine learning (ML) techniques to this problem [13]. Later on, [14] presented an approach to LI identification that proposed various features inspired by the NLP and communication disorders literature. This paper is an extension of our previous studies. Here we explore new aspects of language, such as complexity and common error patterns. We use part-of-speech (POS) taggers trained on adult speech as well as on children's speech to evaluate the importance of POS tagger accuracy. We also perform feature selection in order to find the most relevant features. Along with the dataset used in our previous work, we use an additional dataset of young children to evaluate our approach. The results show that ML algorithms generally perform better than our defined baseline and language models. The results vary across datasets and with the setting in which the conversations were collected. Better results are obtained for adolescents, presumably because their speech is more structured and contains fewer unintelligible words than children's speech.
Related work
Language impairment (LI) is a disorder involving the processing of linguistic information [15]. LI has been associated with poor performance in educational and social environments [16]. Children with LI also have a higher risk of suffering reading disorders once they reach school age. Although intervention can improve specific language skills in children with LI [17], underlying language weaknesses appear to persist into adolescence and beyond [16], [18], [19], [20]. Early detection of LI is
Methods
Our task is to automatically predict the language status of children given orthographic transcripts of their audio-recorded utterances. In the NLP community, this can be viewed as the task of text classification. In this section we present our approach for this task using language models and machine learning algorithms.
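As a rough illustration of how language models can be used for this kind of text classification, the sketch below trains one bigram model per class and assigns a transcript to the class whose model gives it the lower perplexity. This is an assumption-laden toy: it uses add-one smoothing rather than the discounting method used in the paper, the decision rule (direct perplexity comparison) is assumed, and the class `BigramLM`, the function `classify`, and the tiny training transcripts are all hypothetical.

```python
import math
from collections import Counter


class BigramLM:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, transcripts):
        self.history_counts = Counter()  # how often each word is a history
        self.bigram_counts = Counter()
        self.vocab = {"</s>"}
        for tokens in transcripts:
            padded = ["<s>"] + tokens + ["</s>"]
            self.vocab.update(tokens)
            for prev, cur in zip(padded, padded[1:]):
                self.history_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1

    def perplexity(self, tokens):
        padded = ["<s>"] + tokens + ["</s>"]
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen words
        log_prob = 0.0
        for prev, cur in zip(padded, padded[1:]):
            p = (self.bigram_counts[(prev, cur)] + 1) / (self.history_counts[prev] + v)
            log_prob += math.log(p)
        return math.exp(-log_prob / (len(padded) - 1))


def classify(tokens, lm_td, lm_li):
    """Return 'LI' if the LI model fits the transcript better, else 'TD'."""
    return "LI" if lm_li.perplexity(tokens) < lm_td.perplexity(tokens) else "TD"


# Invented toy transcripts (TD = typically developing), for illustration only.
td_train = [["he", "walked", "home"], ["she", "plays", "outside"]]
li_train = [["he", "walk", "home"], ["she", "play", "outside"]]
lm_td, lm_li = BigramLM(td_train), BigramLM(li_train)

print(classify(["he", "walk", "home"], lm_td, lm_li))
```

In practice the models would be trained on held-out transcripts from each group, never on the transcript being classified.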
Datasets
In this study we evaluated our proposed framework on two datasets: one consisting of language samples from adolescents and another from children with an average age of 6 years. In addition to providing a source for evaluation and benchmarking purposes, each dataset uses a different elicitation task and thus provides different challenges. While the data from adolescents consists of highly structured narratives, the age 6 dataset uses a free play session to collect more free-style language
Experimental results
Leave-one-out cross-validation (LOOCV) is performed for the Conti-Ramsden 4 dataset [50], and 10-fold cross-validation is used for the Paradise dataset. We compare our performance results with the baseline method mentioned in Section 4.2.
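The two evaluation schemes differ only in how the samples are partitioned. The following sketch, written in plain Python rather than with the toolkit used in the paper, generates the train/test index splits; the function names are hypothetical, and shuffling and class stratification are deliberately omitted.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k-fold cross-validation.

    Folds are contiguous and their sizes differ by at most one.
    No shuffling or stratification is applied in this sketch.
    """
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def loocv_indices(n_samples):
    """Leave-one-out is k-fold with one sample per fold."""
    return k_fold_indices(n_samples, n_samples)


for train_idx, test_idx in loocv_indices(3):
    print(train_idx, test_idx)
```

LOOCV uses as much training data as possible in each round, which suits small datasets, at the cost of training one model per sample.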
LMs were trained with the Witten–Bell discounting method using SRILM [56]. For ML experiments, we use Weka [57] for its known reliability of implementations and the availability of a large number of algorithms. We use LibSVM [58] along with the wrapper script provided in Weka for evaluating
Feature analysis
To identify important features and better understand the contribution of each feature group, we performed two analyses in the ML environment: adding one feature group and removing one feature group. In the analysis of results involving the addition of one feature group, each feature group is used by itself for the classification task. In the analysis of results involving removing one feature group, we removed one feature group at a time from the entire
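The two analyses amount to building different feature subsets before training. A minimal sketch of that bookkeeping is given below; the group names echo the skills mentioned in the paper, but the individual feature names and both function names are hypothetical placeholders.

```python
def add_one_group(groups):
    """For each group, the feature set consisting of that group alone."""
    return {name: list(features) for name, features in groups.items()}


def remove_one_group(groups):
    """For each group, the full feature set with that group left out."""
    return {
        name: [f for other, features in groups.items() if other != name for f in features]
        for name in groups
    }


# Hypothetical feature names, for illustration only.
groups = {
    "productivity": ["total_words", "num_utterances"],
    "morphosyntax": ["past_tense_errors"],
    "complexity": ["clauses_per_utterance"],
}

for held_out, features in remove_one_group(groups).items():
    print(held_out, features)
```

Each resulting subset would then be passed to the same classifier and evaluation protocol, so that differences in performance can be attributed to the group that was added or removed.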
Error analysis
Performance was lower for the personal narrative task in the Conti-Ramsden 4 dataset, and lower still for the Paradise dataset, than for the story telling task in the Conti-Ramsden 4 dataset. There are several possible reasons for the lower performance in these two tasks. One concerns the accuracy with which transcripts were identified as belonging to the TD or LI category. For the Conti-Ramsden 4 dataset, the samples were collected several years after children in the LI category had
Conclusions and future work
In this paper we explored a relatively new approach for contributing to a more accurate prediction of language status in children. Our approach includes the use of LMs followed by ML algorithms. For ML algorithms, we use features that represent complementary language skills, such as productivity, morphosyntactic skills, and sentence complexity. These features try to combine the efforts of researchers in the communities of NLP and communication disorders. We evaluated our approach on two
Acknowledgements
This research was supported by the National Science Foundation under grants 1017190 and 1018124. We would like to thank the reviewers for their thoughtful comments. The Paradise dataset used for these analyses was obtained originally in the course of a research project led by Jack L. Paradise, MD, and supported by grants from the National Institute of Child Health and Human Development, the Agency for Healthcare Research and Quality, and the National Institutes of Health General Clinical
References (63)
- et al. Phonological memory deficits in language disordered children: is there a causal connection? Journal of Memory and Language (1990)
- et al. Morphological productivity in children with normal language and SLI: a study of the English past tense. Journal of Speech, Language, and Hearing Research (1999)
- et al. Non-word repetition and grammatical morphology: normative data for children in their final year of primary school. International Journal of Language and Communication Disorders (2001)
- et al. Clinical evaluation of language fundamentals (CELF-4) (2003)
- Wechsler intelligence scale for children (1991)
- Expressive vocabulary test (1997)
- et al. Peabody picture vocabulary test (1981)
- et al. Prevalence of specific language impairment in kindergarten children. Journal of Speech, Language, and Hearing Research (1997)
- et al. Reducing bias in language assessment: processing-dependent measures. Journal of Speech, Language, and Hearing Research (1997)
- et al. Nonword repetition as a behavioural marker for inherited language impairment: evidence from a twin study. Journal of Child Psychology and Psychiatry (1996)
- Nonword repetition and child language impairment. Journal of Speech, Language, and Hearing Research
- Procedure for assessing verbal working memory in normal school-age children: some preliminary data. Perceptual and Motor Skills
- Using language models to identify language impairment in Spanish-English bilingual children
- A corpus-based approach for the prediction of language impairment in monolingual English and Spanish-English bilingual children
- Speed of processing, working memory, and language impairment in children. Journal of Speech, Language, and Hearing Research
- Modeling developmental language difficulties from school entry into adulthood: literacy, mental health and employment outcomes. Journal of Speech, Language, and Hearing Research
- The efficacy of treatment for children with developmental speech and language delay/disorder: a meta-analysis. Journal of Speech, Language, and Hearing Research
- Fourteen-year follow-up of children with and without speech/language impairments: speech/language stability and outcomes. Journal of Speech, Language, and Hearing Research
- The relationship between the natural history and prevalence of primary speech and language delays: findings from a systematic review of the literature. International Journal of Language and Communication Disorders
- Age 17 language and reading outcomes in late-talking toddlers: support for a dimensional perspective on language delay. Journal of Speech, Language, and Hearing Research
- Ch. 5: information processing in children with specific language impairment
- Classification of developmental language disorders: theoretical issues and clinical implications
- A system for the diagnosis of specific language impairment in kindergarten children. Journal of Speech and Hearing Research
- The application of dynamic methods to language assessment. The Journal of Special Education
- Dynamic assessment: the model, its relevance as a nonbiased approach, and its application to Latino American preschool children. Language, Speech, and Hearing Services in Schools
- Reducing test bias through dynamic assessment of children's word learning ability. American Journal of Speech-Language Pathology
- Eligibility criteria for language impairment: is the low end of normal always appropriate? Language, Speech, and Hearing Services in Schools
- Extending use of the NRT to preschool-age children with and without specific language impairment. Language, Speech, and Hearing Services in Schools
- The diagnostic accuracy and construct validity of the structured photographic expressive language test-preschool: second edition. Language, Speech, and Hearing Services in Schools
- Selection of preschool language tests: a data-based approach. Language, Speech, and Hearing Services in Schools
- Grammatical characteristics of Swedish children with SLI. Journal of Speech and Hearing Research
- Characterizing language impairment in children: an exploratory study. Language Testing