ISCA Archive Interspeech 2021

Modular Multi-Modal Attention Network for Alzheimer’s Disease Detection Using Patient Audio and Language Data

Ning Wang, Yupeng Cao, Shuai Hao, Zongru Shao, K.P. Subbalakshmi

In this work, we propose a modular multi-modal architecture to automatically detect Alzheimer’s disease using the dataset provided in the ADReSSo challenge. Both acoustic and text-based features are used in this architecture. Since the dataset provides only audio samples of controls and patients, we use the Google Cloud speech-to-text API to automatically transcribe the audio files and extract text-based features. Several kinds of audio features are extracted using standard packages. The proposed approach consists of four networks: a C-Attention-Acoustic network (acoustic features only), a C-Attention-FT network (linguistic features only), a C-Attention-Embedding network (language and acoustic embeddings), and a unified network (which uses all of these features). The architecture combines attention networks with a convolutional neural network (a C-Attention network) to process these features. Experimental results show that the C-Attention-Unified network with linguistic features and X-Vector embeddings achieves the best accuracy of 80.28% and an F1 score of 0.825 on the test dataset.
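
The abstract describes a "C-Attention" design that pairs a convolutional front end with attention over a feature sequence. As a rough illustration only, below is a minimal PyTorch sketch of one such block; the class name CAttentionBlock, the layer sizes, and the mean-pooling head are our assumptions for the sketch, not the authors' published implementation.

import torch
import torch.nn as nn

class CAttentionBlock(nn.Module):
    """Illustrative conv + self-attention block (assumed structure).

    Takes a sequence of per-frame features (acoustic or linguistic),
    applies a 1-D convolution for local patterns, self-attention for
    long-range dependencies, then mean-pools and classifies.
    """
    def __init__(self, feat_dim, hidden_dim=64, n_heads=4, n_classes=2):
        super().__init__()
        # Convolution extracts local patterns from the feature sequence.
        self.conv = nn.Conv1d(feat_dim, hidden_dim, kernel_size=3, padding=1)
        # Multi-head self-attention models dependencies across time steps.
        self.attn = nn.MultiheadAttention(hidden_dim, n_heads, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, n_classes)

    def forward(self, x):
        # x: (batch, time, feat_dim); Conv1d expects (batch, feat_dim, time)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        h, _ = self.attn(h, h, h)               # self-attention over time
        return self.classifier(h.mean(dim=1))   # mean-pool, then classify

# Example: a batch of 8 utterances, 100 frames of 40-dim acoustic features.
logits = CAttentionBlock(feat_dim=40)(torch.randn(8, 100, 40))
print(logits.shape)  # torch.Size([8, 2])

In the paper's modular setup, one would presumably train such blocks per modality (acoustic, linguistic, embedding) and fuse them in the unified network; the fusion details above are not specified by the abstract.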


doi: 10.21437/Interspeech.2021-2024

Cite as: Wang, N., Cao, Y., Hao, S., Shao, Z., Subbalakshmi, K.P. (2021) Modular Multi-Modal Attention Network for Alzheimer’s Disease Detection Using Patient Audio and Language Data. Proc. Interspeech 2021, 3835-3839, doi: 10.21437/Interspeech.2021-2024

@inproceedings{wang21ca_interspeech,
  author={Ning Wang and Yupeng Cao and Shuai Hao and Zongru Shao and K.P. Subbalakshmi},
  title={{Modular Multi-Modal Attention Network for Alzheimer’s Disease Detection Using Patient Audio and Language Data}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={3835--3839},
  doi={10.21437/Interspeech.2021-2024}
}