Abstract
This paper presents an attack and defence modelling framework for conceptualising the security of the speech interface. The framework is based on the Observe-Orient-Decide-Act (OODA) loop model, which has been used to analyse adversarial interactions in a number of other areas. We map the different types of attack that may be executed via the speech interface to the framework and, with reference to it, critically analyse the defences currently available for countering such attacks. Grounded in this analysis, the paper then presents proposals for the development of new defence mechanisms. These proposals envisage a defence capability that would enable voice-controlled systems to detect potential attacks as part of their dialogue management functionality. In accordance with this high-level defence concept, the paper presents two specific proposals for defence mechanisms, to be implemented as part of dialogue management, that counter attacks exploiting unintended functionality in speech recognition and in natural language understanding. These defence mechanisms are based on the novel application of two existing technologies for security purposes. The specific proposals include the results of two feasibility tests that investigate the effectiveness of the proposed mechanisms in defending against the relevant type of attack.
Supported by a doctoral training grant from the Engineering and Physical Sciences Research Council (EPSRC).
Notes
- 1. A recent UK government survey, for example, reported that 8% of adults in the UK now own a smart speaker, see https://gds.blog.gov.uk/2018/08/23/hey-gov-uk-what-are-you-doing-about-voice/.
- 2. See Wired, 27th December 2017, “Hackers can rickroll thousands of Sonos and Bose speakers over the internet”, https://www.wired.com/story/hackers-can-rickroll-sonos-bose-speakers-over-internet/ and Trend Micro report 2017, “The Sound of a Targeted Attack”, https://documents.trendmicro.com/assets/pdf/The-Sound-of-a-Targeted-Attack.pdf.
- 3. See UPROXX, 12th January 2017, “You Can Make Amazon Echo and Google Home Talk to Each Other Forever”, http://uproxx.com/technology/amazon-echo-google-home-infinity-loop/ and cnet.com, 15th February 2018, “Make Siri, Alexa and Google Assistant talk in an infinite loop”, https://www.cnet.com/how-to/make-siri-alexa-and-google-assistant-talk-in-an-infinite-loop/.
- 4. See Cleverhans blog, 15th February 2017, “Is attacking machine learning easier than defending it?”, http://www.cleverhans.io/security/privacy/ml/2017/02/15/why-attacking-machine-learning-is-easier-than-defending-it.html.
- 5. See Endgame blog, 20th January 2017, “Endgame Announces Artemis: ‘Siri For Security’ To Transform SOC Operations”, https://www.endgame.com/news/press-releases/endgame-announces-artemis-siri-security-transform-soc-operations.
- 6. See Medium blog, 13th February 2013, “Havyn: a cognitive assistant for cybersecurity”, https://medium.com/cognitivebusiness/havyn-a-cognitive-assistant-for-cybersecurity-e6580898f49e.
- 7. The authors are grateful to the University of Oxford’s Faculty of Linguistics, Philology and Phonetics for providing access to the FlexSR system for the purposes of this work.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Bispham, M.K., Agrafiotis, I., Goldsmith, M. (2020). The Security of the Speech Interface: A Modelling Framework and Proposals for New Defence Mechanisms. In: Mori, P., Furnell, S., Camp, O. (eds) Information Systems Security and Privacy. ICISSP 2019. Communications in Computer and Information Science, vol 1221. Springer, Cham. https://doi.org/10.1007/978-3-030-49443-8_14
DOI: https://doi.org/10.1007/978-3-030-49443-8_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-49442-1
Online ISBN: 978-3-030-49443-8
eBook Packages: Computer Science, Computer Science (R0)