
Achieving Socially Optimal Outcomes in Multiagent Systems with Reinforcement Social Learning

Published: 01 September 2013

Abstract

In multiagent systems, social optimality is a desirable goal in terms of maximizing the global efficiency of the system. We study the problem of coordinating on socially optimal outcomes among a population of agents, in which each agent interacts with a randomly selected partner from the population each round. Previous work [Hales and Edmonds 2003; Matlock and Sen 2007, 2009] mainly resorts to modifying the interaction protocol from random interaction to tag-based interaction, and focuses only on symmetric games. Moreover, in previous work the agents’ decision-making processes are usually based on evolutionary learning, which typically incurs high communication cost and high variance in the coordination rate. To address these problems, we propose an alternative social learning framework with two major contributions. First, we introduce an observation mechanism to reduce the amount of communication required among agents. Second, we propose that the agents’ learning strategies be based on reinforcement learning techniques instead of evolutionary learning. Each agent explicitly keeps a record of its current state in its learning strategy, and learns its optimal policy for each state independently. In this way, the learning performance is much more stable, and the framework is suitable for both symmetric and asymmetric games. The performance of this social learning framework is extensively evaluated on the testbed of two-player general-sum games, in comparison with previous work [Hao and Leung 2011; Matlock and Sen 2007]. The influence of different factors on the learning performance of the social learning framework is investigated as well.
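The interaction model described above — a population of agents, randomly paired each round, each updating its own action values with a reinforcement learning rule — can be sketched in a few lines. The following is a minimal illustration, not the authors' exact algorithm: the payoff matrix, parameter values, and class names are all hypothetical, and it omits the paper's observation mechanism and per-state policies.

```python
import random

# Illustrative sketch of population-based reinforcement social learning:
# each round, agents are randomly paired to play a symmetric 2x2 game and
# update their action-value estimates with a simple Q-learning-style rule.
# The game below is a coordination game whose socially optimal outcome
# is mutual play of action 0 (payoff 4 each).

PAYOFF = [[4, 0],   # row player's payoffs; the game is symmetric,
          [0, 2]]   # so the column player's payoff mirrors it

class Agent:
    def __init__(self, alpha=0.1, epsilon=0.1):
        self.q = [0.0, 0.0]       # value estimate for each action
        self.alpha = alpha        # learning rate
        self.epsilon = epsilon    # exploration probability

    def act(self):
        # epsilon-greedy action selection
        if random.random() < self.epsilon:
            return random.randrange(2)
        return max(range(2), key=lambda a: self.q[a])

    def update(self, action, reward):
        self.q[action] += self.alpha * (reward - self.q[action])

def run(population_size=50, rounds=2000, seed=0):
    random.seed(seed)
    agents = [Agent() for _ in range(population_size)]
    for _ in range(rounds):
        random.shuffle(agents)                      # random pairing
        for a, b in zip(agents[::2], agents[1::2]):
            ia, ib = a.act(), b.act()
            a.update(ia, PAYOFF[ia][ib])
            b.update(ib, PAYOFF[ib][ia])            # mirrored payoff
    # fraction of agents whose greedy action is the socially optimal one
    return sum(agent.q[0] > agent.q[1] for agent in agents) / population_size
```

Because each agent learns from its own payoffs only, no inter-agent communication is needed in this bare-bones version; the paper's framework additionally lets agents observe others to speed up coordination.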

References

  1. Allison, P. D. 1992. The cultural evolution of beneficent norms. Social Forces, 279--301.
  2. Bowling, M. H. and Veloso, M. M. 2003. Multiagent learning using a variable learning rate. Artif. Intell. 136, 215--250.
  3. Brafman, R. I. and Tennenholtz, M. 2004. Efficient learning equilibrium. Artif. Intell. 159, 27--47.
  4. Chao, I., Ardaiz, O., and Sanguesa, R. 2008. Tag mechanisms evaluated for coordination in open multi-agent systems. In Proceedings of the 8th International Workshop on Engineering Societies in the Agents World. 254--269.
  5. Claus, C. and Boutilier, C. 1998. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of AAAI’98. 746--752.
  6. Conitzer, V. and Sandholm, T. 2006. AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents. In Proceedings of ICML’06. 83--90.
  7. Crandall, J. W. and Goodrich, M. A. 2005. Learning to teach and follow in repeated games. In Proceedings of the AAAI Workshop on Multiagent Learning.
  8. Greenwald, A. and Hall, K. 2003. Correlated Q-learning. In Proceedings of ICML’03. 242--249.
  9. Hales, D. 2000. Cooperation without space or memory: Tags, groups and the Prisoner’s Dilemma. In Multi-Agent-Based Simulation, Lecture Notes in Artificial Intelligence.
  10. Hales, D. and Edmonds, B. 2003. Evolving social rationality for MAS using “tags”. In Proceedings of AAMAS’03. 497--503.
  11. Hao, J. Y. and Leung, H. F. 2010. Strategy and fairness in repeated two-agent interaction. In Proceedings of ICTAI’10. 3--6.
  12. Hao, J. Y. and Leung, H. F. 2011. Learning to achieve social rationality using tag mechanism in repeated interactions. In Proceedings of ICTAI’11. 148--155.
  13. Hao, J. Y. and Leung, H. F. 2012. Learning to achieve socially optimal solutions in general-sum games. In Proceedings of PRICAI’12. 88--99.
  14. Hoen, P. J., Tuyls, K., Panait, L., Luke, S., and La Poutré, J. A. 2005. An overview of cooperative and competitive multiagent learning. In Proceedings of the 1st International Workshop on Learning and Adaption in Multi-Agent Systems. 1--46.
  15. Hogg, L. M. and Jennings, N. R. 1997. Socially rational agents. In Proceedings of the AAAI Fall Symposium on Socially Intelligent Agents. 61--63.
  16. Hogg, L. M. J. and Jennings, N. R. 2001. Socially intelligent reasoning for autonomous agents. IEEE Trans. Syst. Man Cybern. A: Syst. Humans, 381--393.
  17. Holland, J. H., Holyoak, K., Nisbett, R., and Thagard, P. 1986. Induction: Processes of Inference, Learning, and Discovery. MIT Press, Cambridge, MA.
  18. Howley, E. and O’Riordan, C. 2005. The emergence of cooperation among agents using simple fixed bias tagging. In Proceedings of the IEEE Congress on Evolutionary Computation.
  19. Hu, J. and Wellman, M. 1998. Multiagent reinforcement learning: Theoretical framework and an algorithm. In Proceedings of ICML’98.
  20. Kapetanakis, S. and Kudenko, D. 2002. Reinforcement learning of coordination in cooperative multi-agent systems. In Proceedings of AAAI’02. 326--331.
  21. Littman, M. 1994. Markov games as a framework for multi-agent reinforcement learning. In Proceedings of ICML’94. 322--328.
  22. Matlock, M. and Sen, S. 2005. The success and failure of tag-mediated evolution of cooperation. In Proceedings of the 1st International Workshop on Learning and Adaption in Multi-Agent Systems. Springer, 155--164.
  23. Matlock, M. and Sen, S. 2007. Effective tag mechanisms for evolving coordination. In Proceedings of AAMAS’07. 1340--1347.
  24. Matlock, M. and Sen, S. 2009. Effective tag mechanisms for evolving cooperation. In Proceedings of AAMAS’09. 489--496.
  25. Maynard Smith, J. 1982. Evolution and the Theory of Games. Cambridge University Press, Cambridge, UK.
  26. Nowak, M. and Sigmund, K. 1993. A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner’s Dilemma game. Nature, 56--58.
  27. Osborne, M. J. and Rubinstein, A. 1994. A Course in Game Theory. MIT Press, Cambridge, MA.
  28. Panait, L. and Luke, S. 2005. Cooperative multi-agent learning: The state of the art. Auton. Agents Multi-Agent Syst. 11, 387--434.
  29. Pitt, J., Schaumeier, J., Busquets, D., and Macbeth, S. 2012. Self-organising common-pool resource allocation and canons of distributive justice. In Proceedings of SASO’12. 119--128.
  30. Riolo, R. and Cohen, M. D. 2001. Cooperation without reciprocity. Nature, 441--443.
  31. Sen, S. and Airiau, S. 2007. Emergence of norms through social learning. In Proceedings of IJCAI’07. 1507--1512.
  32. Verbeeck, K., Nowé, A., Parent, J., and Tuyls, K. 2006. Exploring selfish reinforcement learning in repeated games with stochastic rewards. Auton. Agents Multi-Agent Syst. 14, 239--269.
  33. Villatoro, D., Sabater-Mir, J., and Sen, S. 2011. Social instruments for robust convention emergence. In Proceedings of IJCAI’11. 420--425.
  34. Wakano, J. Y. and Yamamura, N. 2001. A simple learning strategy that realizes robust cooperation better than Pavlov in iterated Prisoner’s Dilemma. J. Ethol. 19, 9--15.
  35. Watkins, C. J. C. H. and Dayan, P. 1992. Q-learning. Mach. Learn. 8, 279--292.


    • Published in

      ACM Transactions on Autonomous and Adaptive Systems, Volume 8, Issue 3
      September 2013
      110 pages
      ISSN: 1556-4665
      EISSN: 1556-4703
      DOI: 10.1145/2518017

      Copyright © 2013 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 September 2013
      • Accepted: 1 August 2013
      • Revised: 1 May 2013
      • Received: 1 January 2013
