
刘发林 (LIU Falin)

Doctoral supervisor
Master's supervisor
Name: Liu Falin
English name: LIU Falin
Education: Doctoral graduate
Phone: 0551-63601922
Degree: Doctor of Engineering
Title: Research Fellow
Alma mater: University of Science and Technology of China
School/Department: School of Information Science and Technology
Disciplines: Electronic Science and Technology; Information and Communication Engineering
Publications
Centralized Optimization for Dec-POMDPs Under the Expected Average Reward Criterion
Posted: 2022-07-04

DOI: 10.1109/TAC.2017.2702203

Journal: IEEE Trans. on Automatic Control

Keywords: Centralized optimization, decentralized partially observable Markov decision process (Dec-POMDP), large-scale system, sensitivity analysis, stochastic approximation.

Abstract: In this paper, decentralized partially observable Markov decision process (Dec-POMDP) systems with discrete state and action spaces are studied from a gradient point of view. Dec-POMDPs have recently emerged as a promising approach to optimizing multiagent decision making in partially observable stochastic environments. However, the decentralized nature of the Dec-POMDP framework results in the lack of a shared belief state, which makes it impossible for a decision maker to estimate the system state from local information alone. In contrast to belief-based policies, this paper focuses on optimizing the decentralized observation-based policy, which is easy to apply and does not have the sharing problem. By analyzing the gradient of the objective function, we develop a centralized stochastic gradient policy iteration algorithm that finds the optimal policy on the basis of gradient estimates from a single sample path. The algorithm does not need any specific assumption and can be applied to most practical Dec-POMDP problems. A numerical example demonstrates the effectiveness of the algorithm.
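The abstract's central idea, estimating a policy gradient from a single sample path of decentralized observation-based policies and updating them centrally, can be illustrated with a minimal sketch. This is not the paper's algorithm: the two-agent coordination problem, the softmax policy parameterization, the REINFORCE-style score-function estimator with an average-reward baseline, and all constants below are illustrative assumptions.

```python
import numpy as np

# Toy problem (assumed, not from the paper): two agents must both match the
# hidden state; each agent sees only its own local observation.
rng = np.random.default_rng(0)
N_OBS, N_ACT = 2, 2  # observations and actions per agent

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sample_path(theta, T=200):
    """Run one sample path; each agent acts on its local observation only."""
    state, traj, total = 0, [], 0.0
    for _ in range(T):
        obs = [state, 1 - state]          # deterministic local observations
        acts = [rng.choice(N_ACT, p=softmax(theta[i][obs[i]])) for i in (0, 1)]
        r = 1.0 if acts[0] == acts[1] == state else 0.0  # coordination reward
        total += r
        traj.append((obs, acts, r))
        state = rng.integers(2)           # i.i.d. state transitions, for simplicity
    return traj, total / T

def policy_gradient(theta, traj, avg_r):
    """Score-function gradient estimate from a single path,
    using the path's average reward as a baseline."""
    grad = [np.zeros_like(t) for t in theta]
    for obs, acts, r in traj:
        adv = r - avg_r
        for i in (0, 1):
            g = -softmax(theta[i][obs[i]])
            g[acts[i]] += 1.0             # d log pi / d theta for softmax
            grad[i][obs[i]] += adv * g
    return grad

# Centralized updates of both agents' observation-based policies.
theta = [np.zeros((N_OBS, N_ACT)), np.zeros((N_OBS, N_ACT))]
for _ in range(50):
    traj, avg_r = sample_path(theta)
    grad = policy_gradient(theta, traj, avg_r)
    for i in (0, 1):
        theta[i] += 0.1 * grad[i]

_, final_avg = sample_path(theta)
print(f"average reward after training: {final_avg:.2f}")
```

In this sketch the optimizer is centralized, it sees the whole sample path and updates both policies, while execution stays decentralized, since each policy conditions only on its agent's own observation, which mirrors the centralized-optimization/decentralized-execution split described in the abstract.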

Co-authors: Xiaodong Wang, Hongsheng Xi, Falin Liu

First author: Xiaofeng Jiang (姜晓枫)

Discipline category: Engineering

Document type: Journal article

Volume: 62

Issue: 11

Pages: 6032-6038

Publication date: 2017-11-01

Indexed in: SCI, EI