Transactions of the Japanese Society for Artificial Intelligence (人工知能学会論文誌)
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
Restless Multi-Armed Bandit Game and the Swarm Intelligence Effect
Shunsuke Yoshida (吉田 俊介), Masato Hisakado (久門 正人), Shintaro Mori (守 真太郎)
Journal, free access, advance online publication

Article ID: 30-6_JWEIN-B

Abstract

We define the swarm intelligence effect and derive the condition for its emergence in an interactive restless multi-armed bandit game in which a player competes with multiple agents. Each arm of the bandit has a payoff that changes with probability p_c per round. The agents and the player choose one of three options: (1) Exploit (exploiting a good arm), (2) Innovate (asocial exploration for good arms), and (3) Observe (social exploration for good arms). Each agent has two parameters (c, p_obs) that specify its decision: (i) c, the threshold value for Exploit: if the agent knows only arms whose payoffs are less than c, it chooses to explore; (ii) p_obs, the probability of choosing Observe when the agent explores. The parameters (c, p_obs) of the agents are uniformly distributed. We introduce a scope n_I for searching for good arms in Innovate to control its cost. We determine the optimal strategies of the player using complete knowledge of the bandit and information about the arms exploited by the agents. We show whether social or asocial exploration is optimal in (p_c, n_I) space. We conducted a laboratory experiment with 67 subjects. If (p_c, n_I) is chosen so that social learning is far superior to asocial learning, we observe the swarm intelligence effect. If (p_c, n_I) lies in the region where asocial learning is optimal or comparable to social learning, we do not observe the effect.
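
The agent decision rule and the per-round payoff change described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the dictionary of known payoffs, and the uniform redraw of a changed payoff are assumptions made here, since the abstract does not specify the payoff distribution or how the Innovate scope is applied.

```python
import random

EXPLOIT, INNOVATE, OBSERVE = "Exploit", "Innovate", "Observe"


def choose_action(known_payoffs, c, p_obs, rng):
    """Pick an option for one agent with parameters (c, p_obs).

    known_payoffs maps arm index -> payoff the agent currently knows
    (hypothetical representation, not taken from the paper).
    """
    # Exploit when at least one known arm has a payoff of c or more.
    if any(payoff >= c for payoff in known_payoffs.values()):
        return EXPLOIT
    # Otherwise explore: Observe with probability p_obs, else Innovate.
    return OBSERVE if rng.random() < p_obs else INNOVATE


def refresh_payoffs(payoffs, p_c, rng):
    """Change each arm's payoff independently with probability p_c.

    Assumption: a changed payoff is redrawn uniformly from [0, 1);
    the abstract does not state the actual distribution.
    """
    return [rng.random() if rng.random() < p_c else x for x in payoffs]


if __name__ == "__main__":
    rng = random.Random(0)
    arms = refresh_payoffs([0.2, 0.8, 0.5], p_c=0.1, rng=rng)
    print(arms)
    print(choose_action({1: 0.8}, c=0.6, p_obs=0.5, rng=rng))
```

Under these assumptions, an agent with a high threshold c explores more often, and p_obs sets the split between social and asocial exploration.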

© The Japanese Society for Artificial Intelligence 2015