Autism spectrum conditions (ASC) are a set of neurodevelopmental conditions partly characterised by difficulties with communication. Individuals with ASC can show a variety of atypical speech behaviours, including echolalia, the 'echoing' of another's speech. We herein introduce a new dataset of 15 Serbian children with ASC in a human-robot interaction scenario, annotated for the presence of echolalia amongst other ASC vocal behaviours. From this, we propose a four-class classification problem and investigate the suitability of a 2D convolutional neural network augmented with a recurrent neural network with bidirectional long short-term memory cells for the proposed task of echolalia recognition. In this approach, log Mel-spectrograms are first generated from the audio recordings and then fed as input into the convolutional layers to extract high-level spectral features. The subsequent recurrent layers are applied to learn the long-term temporal context from the obtained features. Finally, we use a feed-forward neural network with softmax activation to classify the dataset. To evaluate the performance of our deep learning approach, we use leave-one-subject-out cross-validation. Key results indicate the suitability of our approach, which achieves an unweighted average recall of 83.5%.
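The front end described above, converting raw audio to a log Mel-spectrogram, can be sketched with NumPy alone. This is a minimal illustration, not the authors' implementation: the paper's abstract does not state the frame length, hop size, or number of mel bands, so the values below (25 ms frames, 10 ms hop, 40 mel bands at 16 kHz) are common defaults chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale (HTK formula)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filterbank matrix, shape (n_mels, n_fft // 2 + 1)."""
    # Centre frequencies equally spaced on the mel scale, mapped back to Hz
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):      # rising slope of the triangle
            fb[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):     # falling slope of the triangle
            fb[i - 1, k] = (right - k) / max(right - centre, 1)
    return fb

def log_mel_spectrogram(audio, sr=16000, n_fft=400, hop=160, n_mels=40):
    """Frame the signal, take the power spectrum of each windowed frame,
    apply the mel filterbank, and return log energies of shape
    (n_frames, n_mels) -- the 2D 'image' fed to the convolutional layers."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    mel_energies = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mel_energies + 1e-10)  # small floor avoids log(0)
```

Each recording thus becomes a time-by-frequency matrix; in the paper's pipeline, the convolutional layers consume this representation, the BiLSTM layers model its temporal evolution, and a softmax layer produces the four class posteriors.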
Cite as: Amiriparian, S., Baird, A., Julka, S., Alcorn, A., Ottl, S., Petrović, S., Ainger, E., Cummins, N., Schuller, B. (2018) Recognition of Echolalic Autistic Child Vocalisations Utilising Convolutional Recurrent Neural Networks. Proc. Interspeech 2018, 2334-2338, doi: 10.21437/Interspeech.2018-1772
@inproceedings{amiriparian18_interspeech,
  author    = {Shahin Amiriparian and Alice Baird and Sahib Julka and Alyssa Alcorn and Sandra Ottl and Sunčica Petrović and Eloise Ainger and Nicholas Cummins and Björn Schuller},
  title     = {{Recognition of Echolalic Autistic Child Vocalisations Utilising Convolutional Recurrent Neural Networks}},
  year      = {2018},
  booktitle = {Proc. Interspeech 2018},
  pages     = {2334--2338},
  doi       = {10.21437/Interspeech.2018-1772}
}