poster

The SRI AVEC-2014 Evaluation System

Authors:
Vikramjit Mitra, Elizabeth Shriberg, Mitchell McLaren, Andreas Kathol, Colleen Richey, Dimitra Vergyri, Martin Graciarena
All authors: SRI International, Menlo Park, CA, USA

AVEC '14: Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge, November 2014, pages 93–101. https://doi.org/10.1145/2661806.2661818

Published: 7 November 2014

ABSTRACT

Though depression is a common mental health problem with significant impact on human society, it often goes undetected. We explore a diverse set of features based only on spoken audio to understand which features correlate with self-reported depression scores on the Beck Depression Inventory-II (BDI-II). These features, many of which are novel for this task, include (1) estimated articulatory trajectories during speech production, (2) acoustic characteristics, (3) acoustic-phonetic characteristics and (4) prosodic features. Features are modeled using a variety of approaches, including support vector regression, a Gaussian backend and decision trees. We report results on the AVEC-2014 depression dataset and find that individual systems range from 9.18 to 11.87 in root mean squared error (RMSE) and from 7.68 to 9.99 in mean absolute error (MAE). Initial fusion brings further improvement; work on fusion and feature selection is still in progress.
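The abstract scores each system by RMSE and MAE against the self-reported scores and notes that fusing systems improves on the individual ones. The following sketch shows how both metrics are computed and how even a naive score-level fusion can lower them. It is not the authors' code: the equal-weight averaging in the hypothetical `mean_fusion` helper is an assumption (the abstract does not say how the systems were fused), and the score lists are toy values, not AVEC-2014 data.

```python
import math
from typing import List, Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    # Both score lists must describe the same, non-empty set of sessions.
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("score lists must be non-empty and of equal length")


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: mean(|y - y_hat|)."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_fusion(system_preds: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical equal-weight score-level fusion (an assumption, not the
    paper's method): average each session's prediction across all systems."""
    return [sum(scores) / len(scores) for scores in zip(*system_preds)]


if __name__ == "__main__":
    # Toy depression scores for four sessions (illustrative only).
    reference = [10, 20, 30, 5]
    system_a = [12, 18, 25, 9]
    system_b = [8, 22, 33, 3]
    fused = mean_fusion([system_a, system_b])

    for name, pred in [("A", system_a), ("B", system_b), ("fused", fused)]:
        print(f"{name}: RMSE={rmse(reference, pred):.2f} MAE={mae(reference, pred):.2f}")
    # A: RMSE=3.50 MAE=3.25
    # B: RMSE=2.29 MAE=2.25
    # fused: RMSE=0.71 MAE=0.50
```

On these toy values the two systems err in opposite directions, so their average lands closer to the reference than either one does. Averaging only helps this much when the systems' errors are weakly correlated; a trained fusion, such as a linear regression over the systems' outputs, is a common alternative.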

Index Terms

1. Computing methodologies
  1. Machine learning
2. Hardware
  1. Communication hardware, interfaces and storage
    1. Signal processing systems

Published in
AVEC '14: Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge
November 2014
110 pages
ISBN: 9781450331197
DOI: 10.1145/2661806
General Chairs:
Michel Valstar, University of Nottingham, UK
Björn Schuller, Technische Universität München/Imperial College London, DE/UK
Jarek Krajewski, University of Wuppertal, Germany
Roddy Cowie, Queen's University Belfast, UK
Maja Pantic, Imperial College London/University of Twente, UK/The Netherlands
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
acoustic features
articulatory features
decision trees
depression
prosody
robust signal analysis
support vector regression
time series prediction
Qualifiers
- poster

Acceptance Rates
AVEC '14 paper acceptance rate: 8 of 22 submissions, 36%
Overall acceptance rate: 52 of 98 submissions, 53%

Article Metrics
- Total Citations: 35
- Total Downloads: 340
- Downloads (Last 12 months): 29
- Downloads (Last 6 weeks): 5