The Security of the Speech Interface: A Modelling Framework and Proposals for New Defence Mechanisms

  • Conference paper
Information Systems Security and Privacy (ICISSP 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1221)

Abstract

This paper presents an attack and defence modelling framework for conceptualising the security of the speech interface. The framework is based on the Observe-Orient-Decide-Act (OODA) loop model, which has been used to analyse adversarial interactions in a number of other areas. We map the different types of attack that may be executed via the speech interface to the framework and, with reference to it, present a critical analysis of the currently available defences against such attacks. Grounded in that analysis, the paper then presents proposals for the development of new defence mechanisms. These proposals envisage a defence capability that would enable voice-controlled systems to detect potential attacks as part of their dialogue management functionality. In accordance with this high-level defence concept, the paper presents two specific proposals for defence mechanisms, to be implemented within dialogue management, that counter attacks exploiting unintended functionality in speech recognition and in natural language understanding. These defence mechanisms are based on the novel application of two existing technologies for security purposes. The specific proposals include the results of two feasibility tests that investigate the effectiveness of the proposed mechanisms in defending against the relevant type of attack.

Supported by a doctoral training grant from the Engineering and Physical Sciences Research Council (EPSRC).

Notes

  1. A recent UK government survey, for example, reported that 8% of adults in the UK now own a smart speaker, see https://gds.blog.gov.uk/2018/08/23/hey-gov-uk-what-are-you-doing-about-voice/.

  2. See Wired, 27th December 2017, “Hackers can rickroll thousands of Sonos and Bose speakers over the internet”, https://www.wired.com/story/hackers-can-rickroll-sonos-bose-speakers-over-internet/ and Trend Micro report 2017, “The Sound of a Targeted Attack”, https://documents.trendmicro.com/assets/pdf/The-Sound-of-a-Targeted-Attack.pdf.

  3. See UPROXX, 12th January 2017, “You Can Make Amazon Echo and Google Home Talk to Each Other Forever”, http://uproxx.com/technology/amazon-echo-google-home-infinity-loop/ and cnet.com, 15th February 2018, “Make Siri, Alexa and Google Assistant talk in an infinite loop”, https://www.cnet.com/how-to/make-siri-alexa-and-google-assistant-talk-in-an-infinite-loop/.

  4. See Cleverhans blog, 15th February 2017, “Is attacking machine learning easier than defending it?”, http://www.cleverhans.io/security/privacy/ml/2017/02/15/why-attacking-machine-learning-is-easier-than-defending-it.html.

  5. See Endgame blog, 20th January 2017, “Endgame Announces Artemis: ‘Siri For Security’ To Transform SOC Operations”, https://www.endgame.com/news/press-releases/endgame-announces-artemis-siri-security-transform-soc-operations.

  6. See Medium blog, 13th February 2013, “Havyn: a cognitive assistant for cybersecurity”, https://medium.com/cognitivebusiness/havyn-a-cognitive-assistant-for-cybersecurity-e6580898f49e.

  7. The authors are grateful to the University of Oxford’s Faculty of Linguistics, Philology and Phonetics for providing access to the FlexSR system for the purposes of this work.

  8. See http://www.hiddenvoicecommands.com/black-box.

  9. See https://translate.google.co.uk/.

Author information

Corresponding author

Correspondence to Mary K. Bispham.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Bispham, M.K., Agrafiotis, I., Goldsmith, M. (2020). The Security of the Speech Interface: A Modelling Framework and Proposals for New Defence Mechanisms. In: Mori, P., Furnell, S., Camp, O. (eds) Information Systems Security and Privacy. ICISSP 2019. Communications in Computer and Information Science, vol 1221. Springer, Cham. https://doi.org/10.1007/978-3-030-49443-8_14

  • DOI: https://doi.org/10.1007/978-3-030-49443-8_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-49442-1

  • Online ISBN: 978-3-030-49443-8

  • eBook Packages: Computer Science (R0)
