Strategic advice provision in repeated human-agent interactions

Autonomous Agents and Multi-Agent Systems

Abstract

This paper addresses the problem of automated advice provision in scenarios that involve repeated interactions between people and computer agents. This problem arises in many applications, such as route selection systems, office assistants and climate control systems. To succeed in such settings, agents must reason about how their advice influences people’s future actions or decisions over time. This work models such scenarios as a family of repeated bilateral interactions called “choice selection processes”, in which humans and computer agents may share certain goals but are essentially self-interested. We propose a social agent for advice provision (SAP) for such environments that generates advice using a social utility function which weighs the sum of the individual utilities of both agent and human participants. The SAP agent models human choice selection using hyperbolic discounting and samples the model to infer the best weights for its social utility function. We demonstrate the effectiveness of SAP in two separate domains that vary in the complexity of modeling human behavior as well as in the information that is available to people when they need to decide whether to accept the agent’s advice. In both domains, we evaluated SAP in extensive empirical studies involving hundreds of human subjects. SAP was compared to agents using alternative models of choice selection processes informed by behavioral economics and psychological models of decision-making. Our results show that in both domains the SAP agent outperformed the alternative models. This work demonstrates the efficacy of combining computational methods with behavioral economics to model how people reason about machine-generated advice, and it presents a general methodology for agent design in such repeated advice settings.
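As a rough illustration of the two ingredients named in the abstract, the sketch below pairs a hyperbolically discounted summary of past costs with a weighted social cost. It is not the SAP agent itself: the abstract does not give the exact formulas, so the discount form 1 / (1 + k * delay), the weight `w`, and all function names are assumptions made for this example only.

```python
from typing import Dict, Sequence, Tuple


def hyperbolic_weights(n: int, k: float = 1.0) -> list:
    """Weights for n past observations, most recent first.

    Assumed form: 1 / (1 + k * delay), the standard hyperbolic discount.
    """
    return [1.0 / (1.0 + k * delay) for delay in range(n)]


def discounted_cost(history: Sequence[float], k: float = 1.0) -> float:
    """Hyperbolically weighted average of past costs.

    history[-1] is the most recent observation and gets the largest weight.
    """
    if not history:
        raise ValueError("history must be non-empty")
    weights = hyperbolic_weights(len(history), k)
    recent_first = list(reversed(history))
    return sum(w * c for w, c in zip(weights, recent_first)) / sum(weights)


def social_cost(agent_cost: float, human_cost: float, w: float) -> float:
    """Hypothetical social cost: w weighs the agent, (1 - w) the human."""
    return w * agent_cost + (1.0 - w) * human_cost


def best_advice(options: Dict[str, Tuple[float, float]], w: float) -> str:
    """Pick the option name with the lowest social cost.

    options maps a name to an (agent_cost, human_cost) pair.
    """
    return min(options, key=lambda name: social_cost(*options[name], w))


if __name__ == "__main__":
    routes = {"highway": (1.0, 5.0), "side_road": (3.0, 2.0)}
    for w in (0.0, 0.5, 1.0):
        print(w, best_advice(routes, w))
```

With `w = 0` the sketch advises whatever is cheapest for the person, and with `w = 1` whatever is cheapest for the agent; intermediate weights trade the two off, which is the kind of weight the abstract says SAP infers by sampling its human model.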




Notes

  1. We use the term “world state” to disambiguate the states of an MDP from those of a selection process.

  2. This method is more common in POMDPs; however, since our state space is very large, we use it as well.

  3. This model does not require an additional parameter for the actual cost for the receiver (\(c_R(a,v)\)), since \(c_R(a,v)\) is already a linear combination of the comfort level and the energy consumption.

  4. In fact, the exact equivalent of the road selection domain would be to assume that the user assigns a cost to each possible combination of heat load and power level. However, such an assumption would result in too many arms, most of which would not be sampled at all or would be sampled only once, and thus would not yield a good human model.


Author information


Corresponding author

Correspondence to Amos Azaria.


About this article


Cite this article

Azaria, A., Gal, Y., Kraus, S. et al. Strategic advice provision in repeated human-agent interactions. Auton Agent Multi-Agent Syst 30, 4–29 (2016). https://doi.org/10.1007/s10458-015-9284-6


  • DOI: https://doi.org/10.1007/s10458-015-9284-6
