
Liu Falin (刘发林)

Doctoral Supervisor
Master's Supervisor
Name: Liu Falin (刘发林)
English Name: LIU Falin
Pinyin Name: Liu Falin
Email:
Education: Doctoral graduate
Contact: 0551-63601922
Degree: Doctor of Engineering
Title: Research Professor
Alma Mater: University of Science and Technology of China
School/Department: School of Information Science and Technology
Disciplines: Electronic Science and Technology; Information and Communication Engineering
Other Contact Information

Postal Code:

Office Phone:

Email:

Publications
Centralized Optimization for Dec-POMDPs Under the Expected Average Reward Criterion
Posted: 2022-07-04

DOI: 10.1109/TAC.2017.2702203

Journal: IEEE Transactions on Automatic Control

Keywords: Centralized optimization, decentralized partially observable Markov decision process (Dec-POMDP), large-scale system, sensitivity analysis, stochastic approximation.

Abstract: In this paper, decentralized partially observable Markov decision process (Dec-POMDP) systems with discrete state and action spaces are studied from a gradient point of view. Dec-POMDPs have recently emerged as a promising approach to optimizing multiagent decision making in partially observable stochastic environments. However, the decentralized nature of the Dec-POMDP framework results in a lack of shared belief state, which makes it impossible for a decision maker to estimate the system state from local information alone. In contrast to belief-based policies, this paper focuses on optimizing the decentralized observation-based policy, which is easy to apply and avoids the belief-sharing problem. By analyzing the gradient of the objective function, we develop a centralized stochastic gradient policy iteration algorithm that finds the optimal policy on the basis of gradient estimates from a single sample path. The algorithm does not require any specific assumptions and can be applied to most practical Dec-POMDP problems. A numerical example is provided to demonstrate its effectiveness.
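
Illustration (not from the paper): the abstract describes a centralized stochastic gradient policy iteration scheme that updates decentralized observation-based policies from gradient estimates along a single sample path under the average-reward criterion. The Python sketch below is a minimal, hypothetical illustration of that general idea, assuming per-agent softmax observation-based policies and a placeholder environment interface (env.reset / env.step); it is not the authors' implementation.

import numpy as np

# Minimal sketch (not the authors' implementation): single-sample-path stochastic
# gradient ascent on the average reward for decentralized observation-based
# softmax policies. The environment interface (env.reset / env.step) and all
# parameter names are hypothetical placeholders.

class ObservationPolicy:
    """Per-agent stochastic policy pi_i(a | o), a softmax over a table of logits."""
    def __init__(self, n_obs, n_actions, rng):
        self.theta = np.zeros((n_obs, n_actions))
        self.n_actions = n_actions
        self.rng = rng

    def probs(self, obs):
        z = self.theta[obs] - self.theta[obs].max()   # stabilized softmax
        p = np.exp(z)
        return p / p.sum()

    def sample(self, obs):
        return self.rng.choice(self.n_actions, p=self.probs(obs))

    def grad_log(self, obs, action):
        # Gradient of log pi(action | obs) w.r.t. the logit table:
        # one-hot(action) minus pi(. | obs), placed in row `obs`.
        g = np.zeros_like(self.theta)
        g[obs] = -self.probs(obs)
        g[obs, action] += 1.0
        return g

def centralized_gradient_iteration(env, n_agents, n_obs, n_actions,
                                   horizon=10000, lr=0.01, seed=0):
    """Update all agents' policies centrally from one sample path, using a
    running average of the reward as the baseline (average-reward criterion)."""
    rng = np.random.default_rng(seed)
    policies = [ObservationPolicy(n_obs, n_actions, rng) for _ in range(n_agents)]
    avg_reward = 0.0

    obs = env.reset()                          # list of per-agent observations
    for t in range(1, horizon + 1):
        actions = [pi.sample(o) for pi, o in zip(policies, obs)]
        next_obs, reward = env.step(actions)   # joint reward shared by all agents

        avg_reward += (reward - avg_reward) / t
        advantage = reward - avg_reward

        # Centralized update: each agent's logits move along its own score
        # function, scaled by the common advantage signal.
        for pi, o, a in zip(policies, obs, actions):
            pi.theta += lr * advantage * pi.grad_log(o, a)

        obs = next_obs

    return policies, avg_reward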

Co-authors: Xiaodong Wang, Hongsheng Xi, Falin Liu

First Author: Xiaofeng Jiang (姜晓枫)

Discipline Category: Engineering

Document Type: Journal article

Volume: 62

Issue: 11

Pages: 6032-6038

Translation:

Publication Date: 2017-11-01

Indexed In: SCI, EI