ISCA Archive Interspeech 2020

Hybrid Network Feature Extraction for Depression Assessment from Speech

Ziping Zhao, Qifei Li, Nicholas Cummins, Bin Liu, Haishuai Wang, Jianhua Tao, Björn W. Schuller

A fast-growing area of mental health research is the search for speech-based objective markers for conditions such as depression. One vital challenge in the development of speech-based depression severity assessment systems is the extraction of depression-relevant features from speech signals. To deliver a more comprehensive feature representation, we explore the benefits of a hybrid network that encodes depression-related characteristics in speech for the task of depression severity assessment. The proposed network leverages self-attention networks (SAN) trained on low-level acoustic features and deep convolutional neural networks (DCNN) trained on 3D Log-Mel spectrograms. The feature representations learnt in the SAN and DCNN are concatenated, and average pooling is used to aggregate the complementary segment-level features. Finally, support vector regression is applied to predict a speaker’s Beck Depression Inventory-II score. Experiments based on a subset of the Audio-Visual Depressive Language Corpus, as used in the 2013 and 2014 Audio/Visual Emotion Challenges, demonstrate the effectiveness of our proposed hybrid approach.
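
The fusion step described in the abstract (concatenating SAN and DCNN segment-level features, then average pooling across segments) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy feature values, and the feature dimensions are all hypothetical, and the networks that would produce the embeddings are not shown. The final support vector regression step is also omitted; the pooled vector is what would be passed to a regressor.

```python
from typing import List, Sequence

Vector = List[float]


def concat_segment_features(
    san_feats: Sequence[Sequence[float]],
    dcnn_feats: Sequence[Sequence[float]],
) -> List[Vector]:
    """Concatenate the SAN and DCNN embeddings of each segment.

    Both inputs are assumed to hold one embedding per speech segment,
    in the same segment order (an assumption of this sketch).
    """
    if len(san_feats) != len(dcnn_feats):
        raise ValueError("SAN and DCNN must cover the same number of segments")
    return [list(s) + list(d) for s, d in zip(san_feats, dcnn_feats)]


def average_pool(segments: Sequence[Sequence[float]]) -> Vector:
    """Average segment-level vectors into one recording-level vector."""
    if not segments:
        raise ValueError("at least one segment is required")
    n = len(segments)
    dim = len(segments[0])
    return [sum(seg[i] for seg in segments) / n for i in range(dim)]


# Toy example: two segments, 2-dim SAN features and 1-dim DCNN features.
san = [[1.0, 2.0], [3.0, 4.0]]
dcnn = [[10.0], [20.0]]

fused = concat_segment_features(san, dcnn)   # two 3-dim segment vectors
utterance_vector = average_pool(fused)       # one 3-dim recording vector
```

In this sketch, `utterance_vector` is the single fixed-length representation per recording; a regressor such as support vector regression would then map it to a Beck Depression Inventory-II score.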


doi: 10.21437/Interspeech.2020-2396

Cite as: Zhao, Z., Li, Q., Cummins, N., Liu, B., Wang, H., Tao, J., Schuller, B.W. (2020) Hybrid Network Feature Extraction for Depression Assessment from Speech. Proc. Interspeech 2020, 4956-4960, doi: 10.21437/Interspeech.2020-2396

@inproceedings{zhao20h_interspeech,
  author={Ziping Zhao and Qifei Li and Nicholas Cummins and Bin Liu and Haishuai Wang and Jianhua Tao and Björn W. Schuller},
  title={{Hybrid Network Feature Extraction for Depression Assessment from Speech}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={4956--4960},
  doi={10.21437/Interspeech.2020-2396}
}