Automatic Detection of Usability Problem Encounters in Think-aloud Sessions

Authors:
Mingming Fan, School of Information, Rochester Institute of Technology, NY, USA
Yue Li, Department of Computer Science, University of Toronto, ON, Canada
Khai N. Truong, Department of Computer Science, University of Toronto, ON, Canada
ACM Transactions on Interactive Intelligent Systems, Volume 10, Issue 2, Article No. 16, pp. 1–24. https://doi.org/10.1145/3385732

Published: 30 May 2020

Abstract

Think-aloud protocols are a highly valued usability testing method for identifying usability problems. Despite their value, analyzing think-aloud sessions is often time-consuming and labor-intensive. Consequently, previous research has urged the community to develop techniques that support fast-paced analysis. In this work, we took the first step toward designing and evaluating machine learning (ML) models that automatically detect usability problem encounters from users’ verbalizations and speech features in think-aloud sessions. Inspired by recent research showing that subtle patterns in users’ verbalizations and speech features tend to occur when they encounter problems, we examined whether these patterns can be used to improve the automatic detection of usability problems. We first conducted and recorded think-aloud sessions and then examined the effect of different input features, ML models, test products, and users on the detection of usability problem encounters. Our work uncovers several technical and user interface design challenges and sets a baseline for automating usability problem detection and integrating such automation into UX practitioners’ workflows.

Index Terms

1. Human-centered computing
2. Information systems

Published in
ACM Transactions on Interactive Intelligent Systems Volume 10, Issue 2
June 2020
155 pages
ISSN:2160-6455
EISSN:2160-6463
DOI:10.1145/3403610
Editor:
Michelle X. Zhou
Juji, Inc., USA
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 30 May 2020
- Online AM: 7 May 2020
- Revised: 1 February 2020
- Accepted: 1 February 2020
- Received: 1 May 2019
Author Tags
AI-assisted UX analysis method
Think aloud
machine learning
speech features
usability problem
user experience (UX)
verbalization
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- Total Citations: 15
- Total Downloads: 507
- Downloads (Last 12 months): 124
- Downloads (Last 6 weeks): 19