Falin Liu (刘发林)
DOI: 10.1109/TAC.2017.2702203
Journal: IEEE Trans. on Automatic Control
Keywords: Centralized optimization, decentralized partially observable Markov decision process (Dec-POMDP), large-scale system, sensitivity analysis, stochastic approximation.
Abstract: In this paper, decentralized partially observable Markov decision process (Dec-POMDP) systems with discrete state and action spaces are studied from a gradient point of view. Dec-POMDPs have recently emerged as a promising approach to optimizing multiagent decision making in partially observable stochastic environments. However, the decentralized nature of the Dec-POMDP framework results in a lack of shared belief state, which makes it impossible for a decision maker to estimate the system state from local information alone. In contrast to belief-based policies, this paper focuses on optimizing the decentralized observation-based policy, which is easy to apply and avoids the belief-sharing problem. By analyzing the gradient of the objective function, we develop a centralized stochastic gradient policy iteration algorithm that finds the optimal policy on the basis of gradient estimates from a single sample path. The algorithm requires no specific assumptions and can be applied to most practical Dec-POMDP problems. A numerical example is provided to demonstrate its effectiveness.
Co-authors: Xiaodong Wang, Hongsheng Xi, Falin Liu
First author: Xiaofeng Jiang (姜晓枫)
Discipline: Engineering
Document type: Journal article (J)
Volume: 62
Issue: 11
Pages: 6032-6038
Translated work: No
Publication date: 2017-11-01
Indexed in: SCI, EI
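The abstract above describes a centralized stochastic gradient policy iteration algorithm that optimizes decentralized observation-based policies using gradient estimates from a single sample path. As a rough illustration only, and not the paper's actual method, the sketch below applies a REINFORCE-style likelihood-ratio gradient update to per-agent observation-to-action policies on a toy two-agent Dec-POMDP; every model quantity (the state/observation/action counts and the transition, observation, and reward tables) is an invented placeholder.

```python
# Hypothetical sketch, NOT the algorithm from the paper: single-sample-path
# likelihood-ratio (REINFORCE-style) gradient ascent on decentralized
# observation-based policies for a randomly generated toy Dec-POMDP.
import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, N_STATES, N_OBS, N_ACTIONS = 2, 3, 2, 2   # made-up sizes

# Placeholder model: T[s, a0, a1] -> distribution over next states,
# O[i, s] -> agent i's observation distribution, R[s, a0, a1] -> reward.
T = rng.dirichlet(np.ones(N_STATES), size=(N_STATES, N_ACTIONS, N_ACTIONS))
O = rng.dirichlet(np.ones(N_OBS), size=(N_AGENTS, N_STATES))
R = rng.normal(size=(N_STATES, N_ACTIONS, N_ACTIONS))

# Each agent maps its local observation directly to an action distribution
# (observation-based policy, no shared belief state), via softmax logits.
theta = np.zeros((N_AGENTS, N_OBS, N_ACTIONS))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def rollout(theta, horizon=50):
    """Simulate one sample path; return per-step (observations, actions, reward)."""
    s = rng.integers(N_STATES)
    path = []
    for _ in range(horizon):
        obs = [rng.choice(N_OBS, p=O[i, s]) for i in range(N_AGENTS)]
        acts = [rng.choice(N_ACTIONS, p=softmax(theta[i, obs[i]]))
                for i in range(N_AGENTS)]
        r = R[s, acts[0], acts[1]]
        path.append((obs, acts, r))
        s = rng.choice(N_STATES, p=T[s, acts[0], acts[1]])
    return path

def update(theta, lr=0.05, gamma=0.95):
    """One centralized gradient step computed from a single sample path."""
    path = rollout(theta)
    G = 0.0
    for obs, acts, r in reversed(path):   # discounted return-to-go
        G = r + gamma * G
        for i in range(N_AGENTS):
            p = softmax(theta[i, obs[i]])
            grad_logp = -p
            grad_logp[acts[i]] += 1.0     # d log pi_i(a|o) / d theta
            theta[i, obs[i]] += lr * G * grad_logp
    return theta

for _ in range(200):                      # centralized training loop
    theta = update(theta)
```

Each update here consumes exactly one trajectory, mirroring the abstract's single-sample-path setting; in practice a variance-reduction baseline would typically be subtracted from the return-to-go, a refinement omitted to keep the sketch minimal.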