Combining Modalities: Multimodal SSI

An Introduction to Silent Speech Interfaces

Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSSPEECHTECH)

Abstract

In previous chapters, we have seen how the various silent speech interface (SSI) modalities gather information about the different stages of speech production, covering brain and muscular activity, articulation, acoustics, and visual speech features. In this chapter, the reader is introduced to the combination of different modalities, not only to drive silent speech interfaces but also to deepen our understanding of emerging and promising modalities, e.g., ultrasonic Doppler. This approach poses several challenges concerning the acquisition, synchronization, processing, and analysis of multimodal data. These challenges lead the authors to propose a framework to support research on multimodal SSIs and to provide concrete examples of its practical application, considering several of the SSI modalities covered in previous chapters. For each example, we propose baseline methods against which results on the collected data can be compared.
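The synchronization challenge mentioned above can be made concrete: streams from different SSI sensors (e.g., surface EMG and ultrasonic Doppler) arrive at different sampling rates and with offset clocks, and must be brought onto a common timeline before any fusion or joint analysis. Below is a minimal sketch in Python/NumPy of one simple way to do this, via linear interpolation onto a shared clock. The sampling rates, the clock offset, and the function name `align_streams` are illustrative assumptions for this sketch, not the chapter's actual acquisition framework.

```python
import numpy as np

def align_streams(t_a, x_a, t_b, x_b, fs_common=100.0):
    """Resample two 1-D sensor streams onto a shared clock.

    t_a, t_b : sample timestamps in seconds (monotonically increasing)
    x_a, x_b : sample values
    fs_common: target rate (Hz) for the fused timeline
    """
    t_start = max(t_a[0], t_b[0])            # use only the overlapping span
    t_end = min(t_a[-1], t_b[-1])
    t = np.arange(t_start, t_end, 1.0 / fs_common)
    a = np.interp(t, t_a, x_a)               # linearly interpolate each stream
    b = np.interp(t, t_b, x_b)
    return t, np.stack([a, b], axis=1)       # fused (frames x modalities) matrix

# Illustrative example: EMG sampled at 1 kHz, a Doppler envelope at 250 Hz,
# with a small acquisition offset between the two device clocks.
t_emg = np.arange(0.00, 2.00, 1 / 1000)
t_dop = np.arange(0.05, 2.05, 1 / 250)
emg = np.sin(2 * np.pi * 5 * t_emg)          # synthetic placeholder signals
dop = np.cos(2 * np.pi * 3 * t_dop)
t, fused = align_streams(t_emg, emg, t_dop, dop)
print(t.shape, fused.shape)                   # e.g., (195,) (195, 2)
```

Linear interpolation onto a common frame rate is only one design choice; real multimodal SSI pipelines may instead use hardware triggers or event markers for alignment, but the resulting fused feature matrix is the same kind of object downstream classifiers consume.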




Copyright information

© 2017 The Author(s)

About this chapter

Cite this chapter

Freitas, J., Teixeira, A., Dias, M.S., Silva, S. (2017). Combining Modalities: Multimodal SSI. In: An Introduction to Silent Speech Interfaces. SpringerBriefs in Electrical and Computer Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-40174-4_4

  • DOI: https://doi.org/10.1007/978-3-319-40174-4_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-40173-7

  • Online ISBN: 978-3-319-40174-4

  • eBook Packages: Engineering; Engineering (R0)
