Title: Hidden-mode Markov decision processes
Authors: Choi, Samuel P. M.; Zhang, Nevin Lianwen
Keywords: Reinforcement learning; Hidden-mode Markov decision processes; Markov decision processes; Partially observable Markov decision processes
Issue Date: 1999
Citation: Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI-99), Workshop on Neural, Symbolic, and Reinforcement Methods for Sequence Learning, Stockholm, Sweden, 1999, p. 9-14
Abstract: Traditional reinforcement learning (RL) assumes that environment dynamics do not change over time (i.e., that they are stationary). This assumption, however, is not realistic in many real-world applications. In this paper, a formal model for an interesting subclass of nonstationary environments is proposed. The environment model, called the hidden-mode Markov decision process (HM-MDP), assumes that environmental changes are always confined to a small number of hidden modes. A mode basically indexes a Markov decision process (MDP) and evolves with time according to a Markov chain.
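Concretely, one can write an HM-MDP as a tuple (the notation here is illustrative rather than the paper's exact formulation):

\[
\langle Q, S, A, X, \{P_q\}_{q \in Q}, \{R_q\}_{q \in Q} \rangle,
\qquad
X(q' \mid q) = \Pr(q_{t+1} = q' \mid q_t = q),
\]

where Q is a small set of hidden modes, S and A are the state and action spaces shared by all modes, X is the Markov chain governing mode evolution, and each mode q carries its own transition dynamics P_q(s' | s, a) and reward function R_q(s, a). The agent observes states and rewards but never the current mode.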
HM-MDP is a special case of partially observable Markov decision processes (POMDPs). Nevertheless, modeling an HM-MDP environment via the more general POMDP model unnecessarily increases the problem complexity. In this paper, the conversion from the former to the latter is discussed.
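Under one natural convention (assumed here: the mode at time t governs the dynamics at time t), the conversion makes the mode-state pair the POMDP's hidden state and leaves only the state component observable:

\[
S' = Q \times S, \qquad
\Pr\big((q', s') \mid (q, s), a\big) = X(q' \mid q)\, P_q(s' \mid s, a),
\]

with an observation function that deterministically reveals s' and nothing about q'. The hidden-state space thus grows from |S| observable states to |Q||S| pairs, which is why handing the problem to a generic POMDP solver forfeits the structure that the HM-MDP makes explicit.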
Learning a model of the HM-MDP is the first of two steps required for nonstationary model-based RL to take place. This paper shows how model learning can be achieved by using a variant of the Baum-Welch algorithm. Empirical results reveal that, compared with the POMDP approach, the HM-MDP approach significantly reduces both the computation time and the amount of data required.
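To make the Baum-Welch connection concrete, below is a minimal sketch (in Python, with illustrative names; the paper's exact variant may differ) of the E-step: modes play the role of HMM hidden states, and each observed state transition (s_t, a_t, s_{t+1}) acts as an "emission" whose likelihood under mode q is P_q(s_{t+1} | s_t, a_t).

    import numpy as np

    def mode_posteriors(X, P, states, actions, prior):
        """Forward-backward over hidden modes for one trajectory.

        X[i, j]        -- P(mode j at t+1 | mode i at t)      shape (n, n)
        P[q, s, a, s2] -- P(s2 | s, a) under mode q           shape (n, S, A, S)
        states, actions -- observed trajectory, lengths T+1 and T
        prior          -- initial mode distribution           shape (n,)
        Returns gamma[t, q] = P(mode q at step t | trajectory).
        """
        T, n = len(actions), X.shape[0]
        # Likelihood of each observed transition under each mode
        lik = np.array([P[:, states[t], actions[t], states[t + 1]]
                        for t in range(T)])                   # (T, n)
        alpha = np.zeros((T, n))
        alpha[0] = prior * lik[0]
        alpha[0] /= alpha[0].sum()
        for t in range(1, T):                                 # forward pass
            alpha[t] = (alpha[t - 1] @ X) * lik[t]
            alpha[t] /= alpha[t].sum()                        # rescale to avoid underflow
        beta = np.ones((T, n))
        for t in range(T - 2, -1, -1):                        # backward pass
            beta[t] = X @ (lik[t + 1] * beta[t + 1])
            beta[t] /= beta[t].sum()                          # rescale
        gamma = alpha * beta
        return gamma / gamma.sum(axis=1, keepdims=True)

An M-step would then re-estimate X from expected mode-transition counts and each P_q from observed state transitions weighted by gamma, exactly as in standard Baum-Welch; because the emissions here are whole state transitions rather than raw symbols, the per-mode dynamics are learned jointly with the mode chain.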