Centralized Optimization for Dec-POMDPs Under the Expected Average Reward Criterion
DOI number:10.1109/TAC.2017.2702203
Journal:IEEE Trans. on Automatic Control
Key Words:Centralized optimization, decentralized partially observable Markov decision process (Dec-POMDP), large-scale system, sensitivity analysis, stochastic approximation.
Abstract:In this paper, decentralized partially observable Markov decision process (Dec-POMDP) systems with discrete state and action spaces are studied from a gradient point of view. Dec-POMDPs have recently emerged as a promising approach to optimizing multiagent decision making in partially observable stochastic environments. However, the decentralized nature of the Dec-POMDP framework results in a lack of a shared belief state, which makes it impossible for a decision maker to estimate the system state from local information. In contrast to belief-based policies, this paper focuses on optimizing the decentralized observation-based policy, which is easy to apply and does not have the sharing problem. By analyzing the gradient of the objective function, we have developed a centralized stochastic gradient policy iteration algorithm to find the optimal policy on the basis of gradient estimates from a single sample path. This algorithm does not need any specific assumption and can be applied to most practical Dec-POMDP problems. One numerical example is provided to demonstrate the effectiveness of the algorithm.
First Author:Xiaofeng Jiang (姜晓枫)
Co-author:Xiaodong Wang, Hongsheng Xi, Falin Liu
Discipline:Engineering
Document Type:J
Volume:62
Issue:11
Page Number:6032-6038
Translation or Not:no
Date of Publication:2017-11-01
Included Journals:SCI, EI