DOI: 10.1145/2661806.2661818
poster

The SRI AVEC-2014 Evaluation System

Published: 07 November 2014

ABSTRACT

Though depression is a common mental health problem with significant impact on human society, it often goes undetected. We explore a diverse set of features based only on spoken audio to understand which features correlate with self-reported depression scores according to the Beck depression rating scale. These features, many of which are novel for this task, include (1) estimated articulatory trajectories during speech production, (2) acoustic characteristics, (3) acoustic-phonetic characteristics and (4) prosodic features. Features are modeled using a variety of approaches, including support vector regression, a Gaussian backend and decision trees. We report results on the AVEC-2014 depression dataset and find that individual systems range from 9.18 to 11.87 in root mean squared error (RMSE), and from 7.68 to 9.99 in mean absolute error (MAE). Initial fusion brings further improvement; fusion and feature selection work is still in progress.
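The two error measures quoted in the abstract can be illustrated with a minimal sketch. The scores below are invented for illustration and are not AVEC-2014 data. The averaging rule in `average_fusion` is an assumption chosen for simplicity; the abstract does not say which fusion method the authors used.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


def average_fusion(system_predictions):
    """Hypothetical fusion rule: average the per-recording predictions
    of several systems. This is an illustrative assumption, not the
    fusion method of the paper."""
    return [sum(column) / len(column) for column in zip(*system_predictions)]


# Invented depression scores for four recordings (not real data).
labels = [10, 20, 30, 0]
system_a = [12, 18, 33, 4]
system_b = [8, 24, 29, 2]

fused = average_fusion([system_a, system_b])

for name, preds in [("A", system_a), ("B", system_b), ("fused", fused)]:
    print(f"{name}: RMSE={rmse(labels, preds):.2f} MAE={mae(labels, preds):.2f}")
```

RMSE is never smaller than MAE on the same predictions because squaring weights large errors more heavily, which is consistent with the abstract reporting higher RMSE than MAE values for every system.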


Published in

AVEC '14: Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge
November 2014, 110 pages
ISBN: 9781450331197
DOI: 10.1145/2661806

Copyright © 2014 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

AVEC '14 paper acceptance rate: 8 of 22 submissions (36%). Overall acceptance rate: 52 of 98 submissions (53%).
