Finding Optimal Observation-Based Policies for Constrained POMDPs Under the Expected Average Reward Criterion
DOI number:10.1109/TAC.2015.2497904
Journal:IEEE Trans. on Automatic Control
Key Words:POMDPs, Constraints, Performance derivative, Simulation-based optimization, Observation-based policy.
Abstract:In this technical note, constrained partially observable Markov decision processes (POMDPs) with discrete state and action spaces under the average reward criterion are studied from a sensitivity point of view. By analyzing the derivatives of the performance criteria, we develop a simulation-based optimization algorithm that finds the optimal observation-based policy on the basis of a single sample path. The algorithm does not require overly strict assumptions and can be applied to general ergodic Markov systems with imperfect state information. The performance is proved to converge to the optimum with probability 1. A numerical example is provided to illustrate the applicability of the algorithm.
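The abstract names the ingredients (performance derivatives, a single sample path, observation-based policies, an average-cost constraint) but not the estimator itself. The Python sketch below is NOT the authors' algorithm: it swaps in a GPOMDP-style likelihood-ratio gradient estimator combined with a Lagrangian primal-dual update for the constraint, run on a hypothetical two-state, two-action, two-observation POMDP. The model numbers, the softmax policy parametrization, the function names estimate and constrained_ascent, and all step sizes are illustrative assumptions. The trace discount beta makes the gradient estimate biased, and no convergence guarantee is claimed for this sketch.

```python
import math
import random

# Hypothetical POMDP (illustrative numbers, not from the paper).
# P[s][a][s2]: transition probability; Q[s][o]: probability of observing o in state s.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.1, 0.9]]]
Q = [[0.8, 0.2], [0.3, 0.7]]
REWARD = [[1.0, 0.0], [0.0, 2.0]]  # r(s, a)
COST = [[0.0, 1.0], [0.0, 1.0]]    # c(s, a): action 1 is "expensive"
COST_BOUND = 0.4                   # assumed constraint: average cost <= COST_BOUND


def policy(theta, o):
    """Softmax observation-based policy: pi(a | o) depends only on the observation."""
    m = max(theta[o])
    w = [math.exp(t - m) for t in theta[o]]
    z = sum(w)
    return [x / z for x in w]


def sample(probs, rng):
    """Draw an index from a discrete distribution."""
    u, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if u < acc:
            return i
    return len(probs) - 1


def estimate(theta, steps, beta, rng, s=0):
    """Simulate `steps` transitions from state s; return average reward/cost,
    GPOMDP-style gradient estimates for both, and the final state."""
    n_obs, n_act = len(theta), len(theta[0])
    z = [[0.0] * n_act for _ in range(n_obs)]  # eligibility trace of grad log pi
    g_r = [[0.0] * n_act for _ in range(n_obs)]
    g_c = [[0.0] * n_act for _ in range(n_obs)]
    avg_r = avg_c = 0.0
    for t in range(steps):
        o = sample(Q[s], rng)       # the controller sees only o, not s
        pi = policy(theta, o)
        a = sample(pi, rng)
        # Decay the trace, then add grad log pi(a|o) = e_a - pi (softmax row o).
        for oo in range(n_obs):
            for aa in range(n_act):
                z[oo][aa] *= beta
        for aa in range(n_act):
            z[o][aa] += (1.0 if aa == a else 0.0) - pi[aa]
        r, c = REWARD[s][a], COST[s][a]
        # Running averages of reward, cost, and reward/cost times trace.
        avg_r += (r - avg_r) / (t + 1)
        avg_c += (c - avg_c) / (t + 1)
        for oo in range(n_obs):
            for aa in range(n_act):
                g_r[oo][aa] += (r * z[oo][aa] - g_r[oo][aa]) / (t + 1)
                g_c[oo][aa] += (c * z[oo][aa] - g_c[oo][aa]) / (t + 1)
        s = sample(P[s][a], rng)
    return avg_r, avg_c, g_r, g_c, s


def constrained_ascent(iters=100, steps=1000, beta=0.9, lr=0.5, lr_dual=0.5, seed=0):
    """Primal-dual ascent on L = eta_r - lam * (eta_c - COST_BOUND),
    consuming one continuous sample path across all iterations."""
    rng = random.Random(seed)
    theta = [[0.0, 0.0], [0.0, 0.0]]
    lam, s = 0.0, 0
    avg_r = avg_c = 0.0
    for _ in range(iters):
        avg_r, avg_c, g_r, g_c, s = estimate(theta, steps, beta, rng, s)
        for o in range(len(theta)):
            for a in range(len(theta[o])):
                theta[o][a] += lr * (g_r[o][a] - lam * g_c[o][a])
        # Projected dual step keeps the multiplier nonnegative.
        lam = max(0.0, lam + lr_dual * (avg_c - COST_BOUND))
    return theta, lam, avg_r, avg_c


if __name__ == "__main__":
    theta, lam, avg_r, avg_c = constrained_ascent()
    for o in range(2):
        print("pi(.|o=%d) =" % o, [round(p, 3) for p in policy(theta, o)])
    print("lambda", round(lam, 3), "avg reward", round(avg_r, 3), "avg cost", round(avg_c, 3))
```

Each call to estimate resumes from the last state of the previous batch, so every parameter and multiplier update is driven by one continuous sample path, loosely mirroring the single-path setting the abstract describes.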
First Author:Xiaofeng Jiang (姜晓枫)
Co-author:Hongsheng Xi,Xiaodong Wang,Falin Liu
Indexed by:Journal paper
Discipline:Engineering
Document Type:J
Volume:61
Issue:10
Page Number:3070-3075
Translation or Not:no
Date of Publication:2016-10-01
Included Journals:SCI, EI