Abstract
In previous chapters, we have seen how various silent speech interface (SSI) modalities gather information about the different stages of speech production, covering brain and muscular activity, articulation, acoustics, and visual speech features. In this chapter, the reader is introduced to the combination of different modalities, not only to drive silent speech interfaces, but also to deepen the understanding of emerging and promising modalities such as ultrasonic Doppler sensing. This approach poses several challenges in the acquisition, synchronization, processing, and analysis of multimodal data. These challenges lead the authors to propose a framework to support research on multimodal silent speech interfaces (SSIs) and to provide concrete examples of its practical application, considering several of the SSI modalities covered in previous chapters. For each example, we propose baseline methods for comparison on the collected data.
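One of the synchronization challenges mentioned above is that the different modalities (e.g., surface EMG, ultrasonic Doppler, video) are sampled at very different rates, so their streams must be aligned onto a common time base before joint processing. A minimal sketch of one common approach, nearest-timestamp alignment of a sparse stream onto a dense reference stream, is shown below; the sampling rates and feature values are illustrative assumptions, not taken from the chapter, and a real acquisition setup would also need to estimate clock offsets between devices.

```python
import numpy as np

def align_streams(t_ref, t_other, x_other):
    """For each reference timestamp, pick the sample of the other
    modality whose timestamp is nearest (nearest-neighbour alignment).
    Assumes t_other is sorted in increasing order."""
    # Insertion points of the reference timestamps into the other stream
    idx = np.searchsorted(t_other, t_ref)
    idx = np.clip(idx, 1, len(t_other) - 1)
    # Choose whichever of the two neighbouring samples is closer in time
    left_closer = (t_ref - t_other[idx - 1]) < (t_other[idx] - t_ref)
    idx = np.where(left_closer, idx - 1, idx)
    return x_other[idx]

# Illustrative example: EMG at 1000 Hz, video features at 30 fps, 1 s of data
t_emg = np.arange(0, 1.0, 1 / 1000)      # dense reference stream
t_vid = np.arange(0, 1.0, 1 / 30)        # sparse stream to be aligned
vid_feat = np.sin(2 * np.pi * t_vid)     # placeholder visual features
aligned = align_streams(t_emg, t_vid, vid_feat)
assert aligned.shape == t_emg.shape      # one visual sample per EMG sample
```

After this step, every EMG frame carries the temporally closest visual feature, so frame-level multimodal classifiers can be trained on paired samples.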
Copyright information
© 2017 The Author(s)
Cite this chapter
Freitas, J., Teixeira, A., Dias, M.S., Silva, S. (2017). Combining Modalities: Multimodal SSI. In: An Introduction to Silent Speech Interfaces. SpringerBriefs in Electrical and Computer Engineering(). Springer, Cham. https://doi.org/10.1007/978-3-319-40174-4_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40173-7
Online ISBN: 978-3-319-40174-4
eBook Packages: Engineering (R0)