Finding Optimal Observation-Based Policies for Constrained POMDPs Under the Expected Average Reward Criterion

  • DOI number:10.1109/TAC.2015.2497904

  • Journal:IEEE Trans. on Automatic Control

  • Key Words:POMDPs, Constraints, Performance derivative, Simulation-based optimization, Observation-based policy.

  • Abstract:In this technical note, constrained partially observable Markov
    decision processes with discrete state and action spaces under the average
    reward criterion are studied from a sensitivity point of view. By analyzing
    the derivatives of the performance criteria, we develop a simulation-based
    optimization algorithm that finds the optimal observation-based policy from
    a single sample path. The algorithm does not require any overly strict
    assumptions and applies to general ergodic Markov systems with imperfect
    state information. The performance is proved to converge to the optimum
    with probability 1, and a numerical example is provided to illustrate the
    applicability of the algorithm. (An illustrative sketch of this class of
    single-sample-path policy updates is given after this list.)

  • First Author:Xiaofeng Jiang (姜晓枫)

  • Co-authors:Hongsheng Xi, Xiaodong Wang, Falin Liu

  • Indexed by:Journal paper

  • Discipline:Engineering

  • Document Type:J

  • Volume:61

  • Issue:10

  • Page Number:3070-3075

  • Date of Publication:2016-10-01

  • Included Journals:SCI, EI

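An illustrative sketch of the kind of method the abstract describes: a
simulation-based, single-sample-path update of an observation-based policy for
a constrained POMDP under the average-reward criterion. This is not the paper's
performance-derivative algorithm; it is a generic, OLPOMDP-style likelihood-ratio
policy-gradient update over a softmax observation-based (memoryless) policy, with
the single constraint folded into the reward through a fixed Lagrange multiplier.
The toy POMDP, all parameter values, and every name in the code are assumptions
made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy constrained POMDP: 3 states, 2 actions, 2 observations.
# P[a, s, s'] are transition probabilities, O[s, o] observation probabilities,
# r[s, a] rewards, and c[s, a] the single constraint cost (all values made up).
P = np.array([
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]],
    [[0.3, 0.4, 0.3], [0.2, 0.3, 0.5], [0.5, 0.3, 0.2]],
])
O = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
r = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
c = np.array([[0.0, 1.0], [1.0, 0.0], [0.2, 0.2]])

theta = np.zeros((2, 2))   # observation-based policy parameters: theta[o, a]
lam = 0.5                  # fixed Lagrange multiplier for the constraint
alpha = 0.005              # step size
beta = 0.99                # eligibility-trace discount (bias/variance trade-off)
T = 200_000                # length of the single simulated sample path

def policy(o):
    """Softmax action distribution of the memoryless policy at observation o."""
    z = np.exp(theta[o] - theta[o].max())
    return z / z.sum()

s = 0
trace = np.zeros_like(theta)   # eligibility trace over policy parameters
avg = 0.0                      # running average of the Lagrangian reward
for t in range(1, T + 1):
    o = rng.choice(2, p=O[s])
    pi = policy(o)
    a = rng.choice(2, p=pi)
    lagr = r[s, a] - lam * c[s, a]     # constraint folded into the reward
    avg += (lagr - avg) / t            # baseline to reduce estimator variance
    # Score function (gradient of log pi) for the chosen action.
    score = np.zeros_like(theta)
    score[o] = -pi
    score[o, a] += 1.0
    trace = beta * trace + score
    theta = theta + alpha * (lagr - avg) * trace   # online gradient-ascent step
    s = rng.choice(3, p=P[a, s])

print("learned action probabilities per observation:")
print(np.vstack([policy(0), policy(1)]))
```

Running the script drives the per-observation action probabilities toward a
policy that trades reward against the constraint cost as weighted by the fixed
multiplier; the paper's algorithm instead derives its update from the
performance-derivative analysis and proves convergence to the optimum with
probability 1.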