Falin Liu (刘发林)
DOI: 10.1109/TAC.2015.2497904
Journal: IEEE Transactions on Automatic Control
Keywords: POMDPs, constraints, performance derivative, simulation-based optimization, observation-based policy
Abstract: In this technical note, constrained partially observable Markov decision processes with discrete state and action spaces under the average reward criterion are studied from a sensitivity point of view. By analyzing the derivatives of the performance criteria, we develop a simulation-based optimization algorithm that finds the optimal observation-based policy from a single sample path. The algorithm does not require overly strict assumptions and can be applied to general ergodic Markov systems with imperfect state information. The performance is proved to converge to the optimum with probability 1. A numerical example is provided to illustrate the applicability of the algorithm.
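Co-authors: Hongsheng Xi, Xiaodong Wang, Falin Liu
First author: Xiaofeng Jiang (姜晓枫)
Paper type: Journal article
Discipline: Engineering
Document type: J
Volume: 61
Issue: 10
Pages: 3070-3075
Translated work: No
Publication date: 2016-10-01
Indexed in: SCI, EI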