The Security of the Speech Interface: A Modelling Framework and Proposals for New Defence Mechanisms

Bispham, Mary K.; Agrafiotis, Ioannis; Goldsmith, Michael

doi:10.1007/978-3-030-49443-8_14

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1221)

Included in the following conference series:

International Conference on Information Systems Security and Privacy

Abstract

This paper presents an attack and defence modelling framework for conceptualising the security of the speech interface. The framework is based on the Observe-Orient-Decide-Act (OODA) loop model, which has been used to analyse adversarial interactions in a number of other areas. We map the different types of attack that may be executed via the speech interface to the framework and, with reference to it, present a critical analysis of the defences currently available for countering such attacks. Grounded in this critical analysis, the paper then presents proposals for the development of new defence mechanisms. These proposals envisage a defence capability that would enable voice-controlled systems to detect potential attacks as part of their dialogue management functionality. In accordance with this high-level defence concept, the paper presents two specific proposals for defence mechanisms, to be implemented as part of dialogue management, that counter attacks exploiting unintended functionality in speech recognition and in natural language understanding. These defence mechanisms are based on the novel application of two existing technologies for security purposes. The specific proposals include the results of two feasibility tests that investigate how effectively the proposed mechanisms defend against the relevant type of attack.

Supported by a doctoral training grant from the Engineering and Physical Sciences Research Council (EPSRC).

Notes

1. A recent UK government survey, for example, reported that 8% of adults in the UK now own a smart speaker; see https://gds.blog.gov.uk/2018/08/23/hey-gov-uk-what-are-you-doing-about-voice/.
2. See Wired, 27th December 2017, “Hackers can rickroll thousands of Sonos and Bose speakers over the internet”, https://www.wired.com/story/hackers-can-rickroll-sonos-bose-speakers-over-internet/ and Trend Micro report 2017, “The Sound of a Targeted Attack”, https://documents.trendmicro.com/assets/pdf/The-Sound-of-a-Targeted-Attack.pdf.
3. See UPROXX, 12th January 2017, “You Can Make Amazon Echo and Google Home Talk to Each Other Forever”, http://uproxx.com/technology/amazon-echo-google-home-infinity-loop/ and cnet.com, 15th February 2018, “Make Siri, Alexa and Google Assistant talk in an infinite loop”, https://www.cnet.com/how-to/make-siri-alexa-and-google-assistant-talk-in-an-infinite-loop/.
4. See Cleverhans blog, 15th February 2017, “Is attacking machine learning easier than defending it?”, http://www.cleverhans.io/security/privacy/ml/2017/02/15/why-attacking-machine-learning-is-easier-than-defending-it.html.
5. See Endgame blog, 20th January 2017, “Endgame Announces Artemis: ‘Siri For Security’ To Transform SOC Operations”, https://www.endgame.com/news/press-releases/endgame-announces-artemis-siri-security-transform-soc-operations.
6. See Medium blog, 13th February 2013, “Havyn: a cognitive assistant for cybersecurity”, https://medium.com/cognitivebusiness/havyn-a-cognitive-assistant-for-cybersecurity-e6580898f49e.
7. The authors are grateful to the University of Oxford’s Faculty of Linguistics, Philology and Phonetics for providing access to the FlexSR system for the purposes of this work.
8. See http://www.hiddenvoicecommands.com/black-box.
9. See https://translate.google.co.uk/.

Author information

Authors and Affiliations

Department of Computer Science, University of Oxford, Oxford, OX1 3QD, UK
Mary K. Bispham, Ioannis Agrafiotis & Michael Goldsmith

Corresponding author

Correspondence to Mary K. Bispham.

Editor information

Editors and Affiliations

IIT-CNR, Pisa, Italy
Paolo Mori
Plymouth University, Plymouth, UK
Steven Furnell
MODESTE/ESEO, Angers Cedex 2, France
Olivier Camp

Cite this paper

Bispham, M.K., Agrafiotis, I., Goldsmith, M. (2020). The Security of the Speech Interface: A Modelling Framework and Proposals for New Defence Mechanisms. In: Mori, P., Furnell, S., Camp, O. (eds) Information Systems Security and Privacy. ICISSP 2019. Communications in Computer and Information Science, vol 1221. Springer, Cham. https://doi.org/10.1007/978-3-030-49443-8_14

DOI: https://doi.org/10.1007/978-3-030-49443-8_14
Published: 28 June 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-49442-1
Online ISBN: 978-3-030-49443-8
eBook Packages: Computer Science, Computer Science (R0)
