ABSTRACT
Users with extensive domain knowledge can be reluctant to use prediction models. This also holds in sports, where running coaches rarely rely on marathon prediction tools when advising their runners on a race plan for their next marathon. This paper studies the effect of adding interactivity to such prediction models so that they incorporate and acknowledge users’ domain knowledge. In think-aloud sessions and an online study, we tested an interactive machine learning tool that allowed coaches to indicate how important each of a runner’s earlier races, which feed into the model, should be. Our results show that coaches draw on rich knowledge when working with the model on runners familiar to them, and that their adaptations improved model accuracy. Coaches who could interact with the model displayed more trust in, and acceptance of, the resulting predictions.
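The abstract does not describe the prediction model itself. As a rough illustration of what letting a coach "indicate the importance of earlier races" could mean, the sketch below assumes a deliberately simple model. Each earlier race is projected to the marathon distance with Riegel's formula T2 = T1 × (D2/D1)^1.06, which is used here as a stand-in and is not necessarily the paper's model. The projections are then averaged using a coach-assigned weight per race. The names `Race`, `weight`, `riegel_projection` and `predict_marathon` are hypothetical and are used only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

MARATHON_KM = 42.195
# Riegel's commonly cited fatigue exponent; a swapped-in projection, assumed for illustration.
RIEGEL_EXPONENT = 1.06


@dataclass
class Race:
    """One earlier race of a runner (hypothetical structure)."""
    distance_km: float
    time_min: float
    weight: float = 1.0  # hypothetical coach-assigned importance; 0 excludes the race


def riegel_projection(race: Race, target_km: float = MARATHON_KM) -> float:
    """Project a finish time to target_km using T2 = T1 * (D2 / D1) ** 1.06."""
    return race.time_min * (target_km / race.distance_km) ** RIEGEL_EXPONENT


def predict_marathon(races: list[Race]) -> float:
    """Return the weighted mean of per-race projections, in minutes."""
    total_weight = sum(r.weight for r in races)
    if total_weight <= 0:
        raise ValueError("at least one race needs a positive weight")
    weighted_sum = sum(r.weight * riegel_projection(r) for r in races)
    return weighted_sum / total_weight
```

For example, a coach who knows that a runner's 10K was run while injured could set that race's weight to 0, so the prediction rests on the remaining races only. Because the weights are divided by their sum, only their relative sizes matter, which keeps the interaction simple: a coach can say "this race counts twice as much" without reasoning about absolute values.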
Index Terms
- Benefits of Human-AI Interaction for Expert Users Interacting with Prediction Models: a Study on Marathon Running