LIU Falin, University of Science and Technology of China (Chinese homepage): Finding Optimal Observation-Based Policies for Constrained POMDPs Under the Expected Average Reward Criterion

LIU Falin (刘发林)

Doctoral supervisor

Master's supervisor

Name: 刘发林

English name: LIU Falin

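Education: Doctoral graduate

Contact: 0551-63601922

Degree: Doctor of Engineering

Title: Research Professor (研究员)

Alma mater: University of Science and Technology of China

School: School of Information Science and Technology

Disciplines: Electronic Science and Technology; Information and Communication Engineering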

Publications

Finding Optimal Observation-Based Policies for Constrained POMDPs Under the Expected Average Reward Criterion

Posted: 2022-07-04

DOI: 10.1109/TAC.2015.2497904

Journal: IEEE Trans. on Automatic Control

Keywords: POMDPs, constraints, performance derivative, simulation-based optimization, observation-based policy

Abstract: In this technical note, constrained partially observable Markov decision processes (POMDPs) with discrete state and action spaces are studied under the average reward criterion from a sensitivity point of view. By analyzing the derivatives of the performance criteria, we develop a simulation-based optimization algorithm that finds the optimal observation-based policy from a single sample path. The algorithm does not require overly strict assumptions and applies to general ergodic Markov systems with imperfect state information. The performance is proved to converge to the optimum with probability 1. A numerical example illustrates the applicability of the algorithm.
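
Illustration (not from the paper): the abstract refers to evaluating an observation-based policy on a single sample path. The sketch below only shows what that setting could look like for a toy model: it simulates one long trajectory of a small ergodic POMDP under a randomized policy that depends on the current observation alone, and returns the empirical average reward together with the empirical average of a constrained cost. The transition matrices, observation probabilities, rewards, costs, and the names `simulate` and `sample` are all hypothetical; this is not the paper's derivative-based optimization algorithm.

```python
import random

# Hypothetical toy model: 2 hidden states, 2 observations, 2 actions.
# P[a][s] is the next-state distribution when action a is taken in state s.
# All entries are positive, so the chain is ergodic under every policy.
P = {
    0: [[0.9, 0.1], [0.2, 0.8]],
    1: [[0.5, 0.5], [0.6, 0.4]],
}
# O[s] is the distribution over observations emitted in hidden state s.
O = [[0.8, 0.2], [0.3, 0.7]]
# REWARD[s][a] is the objective; COST[s][a] is the constrained quantity.
REWARD = [[1.0, 0.0], [0.0, 2.0]]
COST = [[0.0, 1.0], [0.0, 1.0]]


def sample(dist, rng):
    """Draw an index from a discrete probability distribution."""
    return rng.choices(range(len(dist)), weights=dist)[0]


def simulate(policy, steps, seed=0):
    """Estimate (average reward, average cost) from one sample path.

    policy[o] is the probability of choosing action 1 after seeing
    observation o; the hidden state is never shown to the policy.
    """
    rng = random.Random(seed)
    state = 0
    total_reward = 0.0
    total_cost = 0.0
    for _ in range(steps):
        obs = sample(O[state], rng)
        action = 1 if rng.random() < policy[obs] else 0
        total_reward += REWARD[state][action]
        total_cost += COST[state][action]
        state = sample(P[action][state], rng)
    return total_reward / steps, total_cost / steps


if __name__ == "__main__":
    avg_reward, avg_cost = simulate([0.2, 0.7], steps=100_000)
    print(f"average reward: {avg_reward:.3f}, average cost: {avg_cost:.3f}")
```

In a constrained problem of this kind, a policy would be acceptable only if its average cost stays below a chosen bound, and the search would look for the acceptable policy with the largest average reward.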

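First author: Xiaofeng Jiang (姜晓枫)

Co-authors: Hongsheng Xi, Xiaodong Wang, Falin Liu

Paper type: Journal article

Discipline category: Engineering

Document type: J

Volume: 61

Issue: 10

Pages: 3070-3075

Translated work: No

Publication date: 2016-10-01

Indexed in: SCI, EI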
