Abstract—Markov games and reinforcement learning algorithms have been successfully applied to multi-agent learning systems such as Minimax-Q. Because of the interdependence between agents, finding the optimal policy is time-consuming when agents learn concurrently. Some algorithms accelerate convergence through spatial or action generalization, which requires domain-dependent prior knowledge. To improve learning efficiency directly, we propose the opponent modelling Q(λ) algorithm, which combines fictitious play from game theory with eligibility traces from reinforcement learning. A series of empirical evaluations was conducted in the classical soccer domain. Compared with several other algorithms, the proposed algorithm significantly enhances the learning performance of multi-agent systems.
Index Terms—Opponent modelling, Markov games, multi-agent, reinforcement learning.
The authors are with the College of Artificial Intelligence, National University of Defense Technology, Hunan, Changsha 410073 China. (e-mail: nudtchenhao15a@163.com, nudtjHuang@hotmail.com, fj_gjx@qq.com).
Cite: Hao Chen, Jian Huang, and Jianxing Gong, "Opponent Modelling with Eligibility Trace for Multi-agent Reinforcement Learning," International Journal of Modeling and Optimization vol. 9, no. 3, pp. 140-145, 2019.
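The abstract describes combining fictitious play (an empirical model of the opponent's action frequencies) with Q(λ)-style eligibility traces. The following is a minimal, illustrative sketch of that kind of update for a tabular two-player Markov game; it is not the paper's implementation, and names such as `OpponentModelQLambda`, `n_actions_opp`, and `update` are assumptions introduced here for clarity.

```python
from collections import defaultdict

class OpponentModelQLambda:
    """Sketch of opponent-modelling Q(lambda): fictitious-play beliefs + traces."""

    def __init__(self, n_actions, n_actions_opp, alpha=0.1, gamma=0.9, lam=0.8):
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        # Q[s][a][o]: value of own action a when the opponent plays o in state s.
        self.Q = defaultdict(lambda: [[0.0] * n_actions_opp for _ in range(n_actions)])
        # Fictitious play: empirical counts of the opponent's actions per state.
        self.counts = defaultdict(lambda: [1.0] * n_actions_opp)
        # Eligibility traces over (state, own action, opponent action) triples.
        self.e = defaultdict(float)

    def value(self, s):
        # Value of the best own action, taking the expectation over the
        # empirical opponent-action distribution (the fictitious-play belief).
        total = sum(self.counts[s])
        probs = [c / total for c in self.counts[s]]
        return max(sum(p * q for p, q in zip(probs, row)) for row in self.Q[s])

    def update(self, s, a, o, r, s_next):
        # Record the observed opponent action in the opponent model.
        self.counts[s][o] += 1.0
        # TD error against the expected value of the next state.
        delta = r + self.gamma * self.value(s_next) - self.Q[s][a][o]
        # Replacing trace for the visited triple; decay and apply all traces.
        self.e[(s, a, o)] = 1.0
        for (si, ai, oi) in list(self.e):
            self.Q[si][ai][oi] += self.alpha * delta * self.e[(si, ai, oi)]
            self.e[(si, ai, oi)] *= self.gamma * self.lam
            if self.e[(si, ai, oi)] < 1e-4:
                del self.e[(si, ai, oi)]
```

In this sketch, the fictitious-play belief biases both action evaluation and bootstrapping toward the opponent's observed behaviour, while the traces propagate each TD error back along recently visited state-action pairs, which is the mechanism the abstract credits for the improved learning efficiency.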