Survey of intention recognition for opponent modeling

doi:10.11959/j.issn.2096-109x.2021052

Abstract

Several types of opponent modeling are first introduced, leading to the problem of intention recognition in behavior modeling. The process, classification, main research methods, research prospects, and practical applications of intention recognition are then summarized and analyzed, and the latest research results in related fields are reviewed and discussed. Finally, the shortcomings of current intention recognition and design methods are pointed out, and directions for future research are outlined.

Key words: opponent modeling, intention recognition, goal recognition, plan recognition, goal recognition design, plan recognition design

CLC number: TP18

Wei GAO (高巍), Junren LUO (罗俊仁), Weilin YUAN (袁唯淋), Wanpeng ZHANG (张万鹏). Survey of intention recognition for opponent modeling[J]. Chinese Journal of Network and Information Security, 2021, 7(4): 86-100.

Figures/Tables (6): Fig. 1, Fig. 2, Fig. 3, Table 1, Table 2, Table 3

References (92)

[1]	SUKTHANKAR G . Plan,activity,and intent recognition:theory and practice[R]. 2014.
[2]	CHAKRABORTI T , KAMBHAMPATI S , SCHEUTZ M ,et al. AI challenges in human-robot cognitive teaming[J]. arXiv preprint arXiv:1707.04775, 2017.
[3]	ALBRECHT S V , STONE P . Autonomous agents modelling other agents:a comprehensive survey and open problems[J]. Artificial Intelligence, 2018,258: 66-95.
[4]	HEINZE C . Modelling intention recognition for intelligent agent systems[R]. 2004.
[5]	LE GUILLARME N . A game-theoretic planning framework for intentional threat assessment[D]. Thèse de doctorat:Université de Caen, 2016.
[6]	BIGELOW D . Intent recognition in multi-agent domains[M]. University of Nevada,Reno, 2013.
[7]	STROUSE D J , KLEIMAN-WEINER M , TENENBAUM J ,et al. Learning to share and hide intentions using information regularization[C]// Advances in Neural Information Processing Systems. 2018: 10249-10259.
[8]	CHAKRABORTI T , KULKARNI A , SREEDHARAN S ,et al. Explicability legibility predictability transparency privacy security the emerging landscape of interpretable agent behavior[C]// Proceedings of the International Conference on Automated Planning and Scheduling. 2019: 86-96.
[9]	KEREN S , GAL A , KARPAS E . Privacy preserving plans in partially observable environments[C]// IJCAI. 2016: 3170-3176.
[10]	WRIGHT J R . Modeling human behavior in strategic settings[D]. Columbia:University of British Columbia, 2016.
[11]	PLONSKY O , APEL R , ERT E ,et al. Predicting human decisions with behavioral theories and machine learning[J]. arXiv preprint arXiv:1904.06866, 2019.
[12]	BORGHETTI B J . Opponent modeling in interesting adversarial environments[M]. Minnesota: University of Minnesota, 2008.
[13]	BROWNE C B , POWLEY E , WHITEHOUSE D ,et al. A survey of Monte Carlo tree search methods[J]. IEEE Transactions on Computational Intelligence and AI in Games, 2012,4(1): 1-43.
[14]	CHAKRABORTY D , STONE P . Multiagent learning in the presence of memory-bounded agents[J]. Autonomous Agents and Multi-Agent Systems, 2014,28(2): 182-213.
[15]	KOLODNER J . Case-based reasoning[M]. Morgan Kaufmann, 2014.
[16]	CARMEL D , MARKOVITCH S . Learning models of intelligent agents[C]// AAAI/IAAI. 1996: 62-67.
[17]	BAARSLAG T , HENDRIKX M J C , HINDRIKS K V ,et al. Learning about the opponent in automated bilateral negotiation:a comprehensive survey of opponent modeling techniques[J]. Autonomous Agents and Multi-Agent Systems, 2016,30(5): 849-898.
[18]	BARRETT S , STONE P , KRAUS S ,et al. Teamwork with limited knowledge of teammates[C]// Twenty-Seventh AAAI Conference on Artificial Intelligence. 2013.
[19]	ALBRECHT S V , CRANDALL J W , RAMAMOORTHY S . An empirical study on the practical impact of prior beliefs over policy types[C]// Twenty-Ninth AAAI Conference on Artificial Intelligence. 2015.
[20]	ALBRECHT S V , RAMAMOORTHY S . On convergence and optimality of best-response learning with policy types in multiagent systems[J]. arXiv preprint arXiv:1907.06995, 2019.
[21]	SCHADD F , BAKKES S , SPRONCK P . Opponent modeling in real-time strategy games[C]// GAMEON. 2007: 61-70.
[22]	WEN Y , YANG Y , LUO R ,et al. Multi-agent generalized recursive reasoning[J]. arXiv preprint arXiv:1901.09216, 2019.
[23]	WEN Y , YANG Y , LUO R ,et al. Probabilistic recursive reasoning for multi-agent reinforcement learning[J]. arXiv preprint arXiv:1901.09207, 2019.
[24]	DOSHI P , ZENG Y , CHEN Q . Graphical models for interactive POMDPs:representations and solutions[J]. Autonomous Agents and Multi-Agent Systems, 2009,18(3): 376.
[25]	TORKAMAN A , SAFABAKHSH R . Robust opponent modeling in real-time strategy games using Bayesian networks[J]. Journal of AI and Data Mining, 2019,7(1): 149-159.
[26]	MAO W , GRATCH J , LI X . Probabilistic plan inference for group behavior prediction[J]. IEEE Intelligent Systems, 2012,27(4): 27-36.
[27]	HAUSKNECHT M , MUPPARAJU P , SUBRAMANIAN S ,et al. Half field offense:an environment for multiagent learning and ad hoc teamwork[C]// AAMAS Adaptive Learning Agents (ALA) Workshop. 2016.
[28]	ŠOŠIĆ A . Learning models of behavior from demonstration and through interaction[D]. Technische Universität, 2018.
[29]	HERNANDEZ-LEAL P , ZHAN Y , TAYLOR M E ,et al. Efficiently detecting switches against non-stationary opponents[J]. Autonomous Agents and Multi-Agent Systems, 2017,31(4): 767-789.
[30]	ALBRECHT S V , RAMAMOORTHY S . Are you doing what I think you are doing? criticising uncertain agent models[J]. arXiv preprint arXiv:1907.01912, 2019.
[31]	WANG Z , BOULARIAS A , MÜLLING K ,et al. Balancing safety and exploitability in opponent modeling[C]// Twenty-Fifth AAAI Conference on Artificial Intelligence. 2011.
[32]	STANESCU A M . Outcome prediction and hierarchical models in real-time strategy games[R]. 2019.
[33]	MOURAD M , AREF M , ABD-ELAZIZ M . Opponent models pre-processing in real-time strategy games[J]. International Journal of Intelligent Computing and Information Sciences, 2016,16(3): 37-45.
[34]	SUKTHANKAR G R . Activity recognition for agent teams[R]. 2007.
[35]	FREEDMAN R G , ZILBERSTEIN S . A unifying perspective of plan,activity,and intent recognition[C]// Proceedings of the AAAI Workshops:Plan,Activity,and Intent Recognition. 2019: 1-8.
[36]	RAMÍREZ M , GEFFNER H . Plan recognition as planning[C]// Twenty-First International Joint Conference on Artificial Intelligence. 2009.
[37]	SOHRABI S , RIABOV A V , UDREA O . Plan recognition as planning revisited[C]// IJCAI. 2016: 3258-3264.
[38]	PEREIRA R F , OREN N , MENEGUZZI F . Landmark-based approaches for goal recognition as planning[J]. arXiv preprint arXiv:1904.11739, 2019.
[39]	AINETO D , JIMÉNEZ S , ONAINDIA E ,et al. Model recognition as planning[C]// Proceedings of the International Conference on Automated Planning and Scheduling. 2019: 13-21.
[40]	ANG S , CHAN H , JIANG A X ,et al. Game-theoretic goal recognition models with applications to security domains[C]// International Conference on Decision and Game Theory for Security. 2017: 256-272.
[41]	LE GUILLARME N , MOUADDIB A I , LEROUVREUR X ,et al. A generative game-theoretic framework for adversarial plan recognition[C]// JFPDA 2015. 2015.
[42]	LI J , REN T , SU H ,et al. Learn a robust policy in adversarial games via playing with an expert opponent[C]// Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems. 2019: 2096-2098.
[43]	ŠOŠIĆ A . Learning models of behavior from demonstration and through interaction[D]. Technische Universität, 2018.
[44]	ZIEBART B D . Modeling purposeful adaptive behavior with the principle of maximum causal entropy[D]. Figshare, 2010.
[45]	TASTAN B . Learning human motion models[C]// Eighth Artificial Intelligence and Interactive Digital Entertainment Conference. 2012.
[46]	GAURAV S , ZIEBART B . Discriminatively learning inverse optimal control models for predicting human intentions[C]// Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems. 2019: 1368-1376.
[47]	LI X , YANG W , ZHANG Z . A unified framework for regularized reinforcement learning[J]. arXiv preprint arXiv:1903.00725, 2019.
[48]	TIAN Z , WEN Y , GONG Z ,et al. A regularized opponent model with maximum entropy objective[J]. arXiv preprint arXiv:1905.08087, 2019.
[49]	MAYNARD M , DUHAMEL T , KABANZA F . Cost-based goal recognition meets deep learning[J]. arXiv preprint arXiv:1911.10074, 2019.
[50]	MIN W , HA E Y , ROWE J ,et al. Deep learning-based goal recognition in open-ended digital games[C]// Tenth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment. 2014.
[51]	DUHAMEL T , MAYNARD M , KABANZA F . A transfer learning method for goal recognition exploiting cross-domain spatial features[J]. arXiv preprint arXiv:1911.10134, 2019.
[52]	DUHAMEL T , MAYNARD M , KABANZA F . Imagination-augmented deep learning for goal recognition[J]. arXiv preprint arXiv:2003.09529v1, 2020.
[53]	BLAYLOCK N , ALLEN J . Fast hierarchical goal schema recognition[C]// Proceedings of the National Conference on Artificial Intelligence. 2006: 796.
[54]	VERED M , KAMINKA G A . Heuristic online goal recognition in continuous domains[C]// International Joint Conference on Artificial Intelligence. 2017: 4447-4454.
[55]	VERED M , KAMINKA G A , BIHAM S . Online goal recognition through mirroring:Humans and agents[C]// The Fourth Annual Conference on Advances in Cognitive Systems. 2016.
[56]	VERED M , KAMINKA G A . Online recognition of navigation goals through goal mirroring[C]// Proceedings of the 16th Conference on Autonomous Agents and Multiagent Systems.International Foundation for Autonomous Agents and Multiagent Systems. 2017: 1748-1750.
[57]	VERED M , PEREIRA R F , MAGNAGUAGNO M C ,et al. Towards online goal recognition combining goal mirroring and landmarks[C]// AAMAS. 2018: 2112-2114.
[58]	MASTERS P , SARDINA S . Cost-based goal recognition for the path-planning domain[C]// IJCAI. 2018: 5329-5333.
[59]	MASTERS P , SARDINA S . Goal recognition for rational and irrational agents[C]// Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems.International Foundation for Autonomous Agents and Multiagent Systems. 2019: 440-448.
[60]	HOFFMANN J , PORTEOUS J , SEBASTIA L . Ordered landmarks in planning[J]. Journal of Artificial Intelligence Research, 2004,22: 215-278.
[61]	SCHMIDT C , SRIDHARAN N , GOODSON J . The plan recognition problem:an intersection of psychology and artificial intelligence[J]. Artificial Intelligence, 1978,11: 45-83.
[62]	COHEN P R , PERRAULT C R , ALLEN J F . Beyond question answering[M]// Strategies for Natural Language Processing. Lawrence Erlbaum Associates, 1981.
[63]	PENTNEY W , POPESCU A , WANG S , KAUTZ H ,et al. Sensor-based understanding of daily life via large-scale use of common sense[C]// Proceedings of AAAI. 2006.
[64]	KEREN S , GAL A , KARPAS E . Goal recognition design[C]// Twenty-Fourth International Conference on Automated Planning and Scheduling. 2014.
[65]	SON T C , SABUNCU O , SCHULZ-HANKE C ,et al. Solving goal recognition design using ASP[C]// Thirtieth AAAI Conference on Artificial Intelligence. 2016.
[66]	KEREN S , GAL A , KARPAS E ,et al. Goal recognition design for non-optimal agents[C]// National Conference on Artificial Intelligence. 2015: 3298-3304.
[67]	KEREN S , GAL A , KARPAS E . Goal recognition design with non-observable actions[C]// Thirtieth AAAI Conference on Artificial Intelligence. 2016.
[68]	KEREN S , GAL A , KARPAS E . Strong stubborn sets for efficient goal recognition design[C]// Twenty-Eighth International Conference on Automated Planning and Scheduling. 2018.
[69]	KEREN S , GAL A , KARPAS E . Goal recognition design in deterministic environments[J]. Journal of Artificial Intelligence Research, 2019,65: 209-269.
[70]	KEREN S , PINEDA L , GAL A ,et al. Equi-reward utility maximizing design in stochastic environments[C]// Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence. 2017: 4353-4360.
[71]	WAYLLACE C , HOU P , YEOH W ,et al. Goal recognition design with stochastic agent action outcomes[C]// IJCAI. 2016.
[72]	WAYLLACE C , HOU P , YEOH W . New metrics and algorithms for stochastic goal recognition design problems[C]// IJCAI. 2017: 4455-4462.
[73]	WAYLLACE C , KEREN S , YEOH W ,et al. Accounting for partial observability in stochastic goal recognition design:messing with the marauder’s map[C]// Proceedings of the 10th Workshop on Heuristics and Search for Domain-Independent Planning (HSDIP),Delft,The Netherlands. 2018: 33-41.
[74]	BELLMAN R . Dynamic programming[M]. Princeton University Press, 1957.
[75]	TARJAN R . Depth-first search and linear graph algorithms[J]. SIAM Journal on Computing, 1972,1(2): 146-160.
[76]	MIRSKY R , STERN R , GAL Y ,et al. Plan recognition design[C]// Workshops at the Thirty First AAAI Conference on Artificial Intelligence. 2017.
[77]	RAMIREZ M , GEFFNER H . Heuristics for planning,plan recognition and parsing[J]. arXiv preprint arXiv:1605.05807, 2016.
[78]	MIRSKY R . Goal and plan recognition design for plan libraries[J]. ACM Transactions on Intelligent Systems and Technology (TIST), 2019,10(2): 14.
[79]	PEREIRA R F , PEREIRA A G , MENEGUZZI F . Landmark-enhanced heuristics for goal recognition in incomplete domain models[C]// Proceedings of the International Conference on Automated Planning and Scheduling. 2019: 329-337.
[80]	PEREIRA R F . Goal recognition over imperfect domain models[J]. arXiv preprint arXiv:2005.05712, 2020.
[81]	AMATO C , BAISERO A . Active goal recognition[J]. arXiv preprint arXiv:1909.11173, 2019.
[82]	ZHANG T . Solving large scale linear prediction problems using stochastic gradient descent algorithms[C]// Proceedings of the International Conference on Machine Learning (ICML). 2004: 919-926.
[83]	ZHUO H H . Recognizing multi-agent plans when action models and team plans are both incomplete[J]. ACM Transactions on Intelligent Systems and Technology, 2019,10(3): 1-24.
[84]	POZANCO A , MARTIN Y E , FERNANDEZ S ,et al. Counterplanning using goal recognition and landmarks[C]// International Joint Conference on Artificial Intelligence. 2018: 4808-4814.
[85]	GADEPALLY V , GOODWIN J , KEPNER J ,et al. AI enabling technologies:a survey[J]. arXiv preprint arXiv:1905.03592, 2019.
[86]	CHAKRABORTI T , KULKARNI A , SREEDHARAN S ,et al. Explicability legibility predictability transparency privacy security the emerging landscape of interpretable agent behavior[C]// Proceedings of the International Conference on Automated Planning and Scheduling. 2019: 86-96.
[87]	SREEDHARAN S , KAMBHAMPATI S . Balancing explicability and explanation in human-aware planning[C]// 2017 AAAI Fall Symposium Series. 2017.
[88]	BANERJEE B , KRAEMER L , LYLE J ,et al. Multi-agent plan recognition:formalization and algorithms[C]// National Conference on Artificial Intelligence. 2010: 1059-1064.
[89]	ZHUO H H . Multiagent plan recognition from partially observed team traces[J]. Plan,Activity,and Intent Recognition, 2014: 227-249.
[90]	ARGENTA C , DOYLE J . Multi-agent plan recognition as planning (MAPRAP)[C]// International Conference on Agents and Artificial Intelligence. 2016: 141-14
[91]	KOTT A , MCENEANEY W M . Adversarial reasoning:computational approaches to reading the opponent's mind[R]. 2006.
[92]	KEREN S , GAL A , KARPAS E . Goal recognition design-survey[C]// Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI-20). 2020.

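Goal recognition model	Principle	Method	Continuous	Discrete	Online	Offline
		R&G		√		√
Cost-based goal recognition	Based on the cost difference between the state and the goal state	M&S	√	√		√
		V&K	√	√		√
Plan-metric-based goal recognition	Based on analysis of the plan execution process	Landmark	√		√	√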
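
The cost-difference principle listed for cost-based goal recognition can be illustrated with a small sketch. This is not the exact formulation of R&G, M&S, or V&K. It assumes a 4-connected grid with unit move costs, scores each candidate goal by how much extra cost the observed path incurs compared with an optimal path to that goal, and turns the scores into a posterior with a softmax. The function names, the grid setting, and the `beta` rationality parameter are all illustrative assumptions.

```python
from collections import deque
import math


def bfs_dist(source, width, height, walls=frozenset()):
    """Unit-cost shortest-path distances from `source` on a 4-connected grid."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x, y = queue.popleft()
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and nxt not in walls and nxt not in dist:
                dist[nxt] = dist[(x, y)] + 1
                queue.append(nxt)
    return dist


def goal_posterior(observed_path, goals, width, height,
                   walls=frozenset(), beta=1.0):
    """Posterior over goals from the cost difference of an observed path prefix.

    Assumes every goal is reachable and a uniform prior over goals.
    """
    start, last = observed_path[0], observed_path[-1]
    from_start = bfs_dist(start, width, height, walls)
    from_last = bfs_dist(last, width, height, walls)
    cost_so_far = len(observed_path) - 1  # one unit per observed move
    weights = {}
    for goal in goals:
        # Extra cost of reaching `goal` via the observations vs. optimally.
        delta = cost_so_far + from_last[goal] - from_start[goal]
        weights[goal] = math.exp(-beta * delta)
    total = sum(weights.values())
    return {goal: w / total for goal, w in weights.items()}


if __name__ == "__main__":
    path = [(0, 0), (1, 0), (2, 0)]
    print(goal_posterior(path, [(4, 0), (0, 4)], width=5, height=5))
```

In this toy run the agent has moved two steps toward (4, 0), so that goal has zero cost difference and receives a posterior of about 0.98, while (0, 4) incurs four units of extra cost.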

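Problem	Reference	Environment: partially observable	Environment: fully observable	Agent: non-optimal plans	Agent: partially observable	Measure: WCD	Measure: ECD	Measurement method: action removal	Measurement method: sensor refinement	Measurement method: action conditioning
	[64]		√			√		√
	[65]		√			√		√
	[66]		√	√		√		√
GRD	[67]	√		√		√		√	√
	[9]	√		√		√		√	√
	[68]	√		√		√		√	√	√
	[69]	√		√		√		√	√	√
	[70]		√		√	√			√
	[71]		√			√		√
S-GRD	[72]		√			√	√	√
	[73]	√				√		√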

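Problem	Input	Output	Measure	Measurement method
GRD	STRIPS domain	STRIPS domain with fewer actions	WCD	Maximum number of observations needed to determine the agent's goal
PRD	Plan library	Plan library with fewer actions	WCPD	Maximum number of observations needed to determine the agent's plan
GRD-PL	Plan library	Plan library with fewer actions	WCD	Maximum number of observations needed to determine the agent's goal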
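
The WCD measure in the table can be made concrete with a small sketch under simplifying assumptions: a fully observable, deterministic 4-connected grid with unit costs and an optimal agent. In that setting a path prefix leaves the goal ambiguous exactly when its end cell lies on optimal paths to at least two goals, so WCD is the largest start distance of any such cell. The grid, the `wcd` function, and the use of a blocked cell as a stand-in for action removal are illustrative assumptions, and counting conventions for WCD may differ by one across the cited papers.

```python
from collections import deque


def bfs_dist(source, width, height, walls=frozenset()):
    """Unit-cost shortest-path distances from `source` on a 4-connected grid."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x, y = queue.popleft()
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and nxt not in walls and nxt not in dist:
                dist[nxt] = dist[(x, y)] + 1
                queue.append(nxt)
    return dist


def wcd(start, goals, width, height, walls=frozenset()):
    """Longest optimal path prefix that is still consistent with >= 2 goals."""
    from_start = bfs_dist(start, width, height, walls)
    reachable = [g for g in goals if g in from_start]
    # Moves are reversible, so distance to a goal equals distance from it.
    from_goal = {g: bfs_dist(g, width, height, walls) for g in reachable}
    worst = 0
    for cell, d in from_start.items():
        # Goals for which `cell` lies on some optimal path from the start.
        consistent = [g for g in reachable
                      if d + from_goal[g][cell] == from_start[g]]
        if len(consistent) >= 2:
            worst = max(worst, d)
    return worst


if __name__ == "__main__":
    goals = [(0, 4), (4, 4)]
    print(wcd((2, 0), goals, 5, 5))                            # original domain
    print(wcd((2, 0), goals, 5, 5, walls=frozenset({(2, 1)})))  # one cell blocked
```

In the unmodified grid the agent can walk straight up the middle column for four steps without revealing its goal, so the sketch reports a WCD of 4. Blocking cell (2, 1), which removes the moves into it, forces an immediate commitment to the left or right and reduces the WCD to 0, which mirrors how GRD outputs a domain with fewer actions.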