
Centralized Optimization for Dec-POMDPs Under the Expected Average Reward Criterion

  • DOI number:10.1109/TAC.2017.2702203

  • Journal:IEEE Trans. on Automatic Control

  • Key Words:Centralized optimization, decentralized partially observable Markov decision process (Dec-POMDP), large-scale system, sensitivity analysis, stochastic approximation.

  • Abstract:In this paper, decentralized partially observable Markov decision process (Dec-POMDP) systems with discrete state and action spaces are studied from a gradient point of view. Dec-POMDPs have recently emerged as a promising approach to optimizing multiagent decision making in partially observable stochastic environments. However, the decentralized nature of the Dec-POMDP framework results in the lack of a shared belief state, which makes it impossible for a decision maker to estimate the system state from local information alone. In contrast to belief-based policies, this paper focuses on optimizing the decentralized observation-based policy, which is easy to apply and avoids the belief-sharing problem. By analyzing the gradient of the objective function, we develop a centralized stochastic gradient policy iteration algorithm that finds the optimal policy on the basis of gradient estimates obtained from a single sample path. The algorithm does not require any restrictive assumptions and can be applied to most practical Dec-POMDP problems. A numerical example is provided to demonstrate the effectiveness of the algorithm. (An illustrative sketch of a single-sample-path policy-gradient update is given after this record.)

  • First Author:Xiaofeng Jiang (姜晓枫)

  • Co-author:Xiaodong Wang, Hongsheng Xi, Falin Liu

  • Discipline:Engineering

  • Document Type:J

  • Volume:62

  • Issue:11

  • Page Number:6032-6038

  • Translation or Not:no

  • Date of Publication:2017-11-01

  • Included Journals:SCI, EI


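The abstract describes the algorithm only at a high level. The sketch below is not the paper's method; it is a minimal, hedged illustration of the general idea the abstract names: decentralized observation-based softmax policies updated centrally by a stochastic-gradient rule (eligibility trace with a running average-reward baseline) estimated along a single sample path, under the expected average reward criterion. The toy two-agent Dec-POMDP, all model tables, sizes, and step sizes are hypothetical and chosen only for illustration.

# Illustrative sketch (hypothetical model; not the paper's exact algorithm):
# centralized single-sample-path policy-gradient update of decentralized
# observation-based policies under the average-reward criterion.
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical toy Dec-POMDP: 3 states, 2 agents, 2 actions and 2 observations each ---
N_STATES, N_AGENTS, N_ACTIONS, N_OBS = 3, 2, 2, 2

# P[s, a1, a2, s'] : joint-action transition kernel (rows normalized)
P = rng.random((N_STATES, N_ACTIONS, N_ACTIONS, N_STATES))
P /= P.sum(axis=-1, keepdims=True)

# O[i, s, o] : per-agent observation kernel (each agent sees only its own observation)
O = rng.random((N_AGENTS, N_STATES, N_OBS))
O /= O.sum(axis=-1, keepdims=True)

# R[s, a1, a2] : bounded team reward shared by both agents
R = rng.random((N_STATES, N_ACTIONS, N_ACTIONS))

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def act(theta, obs):
    """Sample each agent's action from its observation-based softmax policy;
    also return the score function grad log pi_i(a | o_i)."""
    actions, grads = [], []
    for i in range(N_AGENTS):
        probs = softmax(theta[i, obs[i]])
        a = rng.choice(N_ACTIONS, p=probs)
        g = -probs          # d/dtheta[i, o_i, :] of log pi_i(a | o_i)
        g[a] += 1.0
        actions.append(a)
        grads.append(g)
    return actions, grads

def run(theta, T=200_000, alpha=0.01, beta=0.005, lam=0.9):
    """Centralized stochastic-gradient update along one sample path:
    eligibility-trace actor with a running average-reward baseline rho."""
    s = 0
    rho = 0.0                    # running estimate of the average reward
    z = np.zeros_like(theta)     # eligibility trace of score functions
    avg = 0.0
    for t in range(1, T + 1):
        obs = [rng.choice(N_OBS, p=O[i, s]) for i in range(N_AGENTS)]
        actions, grads = act(theta, obs)
        r = R[s, actions[0], actions[1]]
        z *= lam
        for i in range(N_AGENTS):
            z[i, obs[i]] += grads[i]
        theta += alpha * (r - rho) * z     # centralized policy update
        rho += beta * (r - rho)            # baseline tracks the average reward
        avg += (r - avg) / t
        s = rng.choice(N_STATES, p=P[s, actions[0], actions[1]])
    return theta, avg

theta = np.zeros((N_AGENTS, N_OBS, N_ACTIONS))   # decentralized observation-based policies
theta, avg_reward = run(theta)
print(f"estimated average reward after training: {avg_reward:.4f}")

The per-agent policies depend only on each agent's local observation, while the parameter update itself is computed centrally from the single shared trajectory, which is the division of labor the title refers to.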
