Hidden-mode Markov decision processes

Authors Choi, Samuel P.M.
Yeung, Dit-Yan
Zhang, Nevin Lianwen
Issue Date 1999
Source Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI-99), Workshop on Neural, Symbolic, and Reinforcement Methods for Sequence Learning, Stockholm, Sweden, 1999, pp. 9-14
Summary Traditional reinforcement learning (RL) assumes that environment dynamics do not change over time (i.e., that the environment is stationary). This assumption, however, is not realistic in many real-world applications. In this paper, a formal model for an interesting subclass of nonstationary environments is proposed. The environment model, called the hidden-mode Markov decision process (HM-MDP), assumes that environmental changes are always confined to a small number of hidden modes. Each mode indexes a Markov decision process (MDP) and evolves with time according to a Markov chain. The HM-MDP is a special case of the partially observable Markov decision process (POMDP). Nevertheless, modeling an HM-MDP environment via the more general POMDP model unnecessarily increases the problem complexity. In this paper, the conversion from the former to the latter is discussed. Learning a model of an HM-MDP is the first of two steps required for nonstationary model-based RL. This paper shows how model learning can be achieved by using a variant of the Baum-Welch algorithm. Compared with the POMDP approach, empirical results reveal that the HM-MDP approach significantly reduces computational time as well as the required data.
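The structure described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `HiddenModeMDP` and `to_pomdp_transition`, the nested-list layout of the parameters, and the timing convention (the current mode governs the state transition, after which the mode itself moves) are all assumptions made for illustration. The sketch shows a hidden mode selecting which MDP is active, and one plausible way to flatten the model into a POMDP whose hidden state is the (mode, state) pair and whose observation is the state alone.

```python
import random


class HiddenModeMDP:
    """Toy HM-MDP (hypothetical layout): a hidden mode selects the active MDP."""

    def __init__(self, mode_trans, state_trans, rewards, seed=0):
        # mode_trans[m][m2]: probability that the hidden mode moves from m to m2
        # state_trans[m][s][a][s2]: transition probabilities of the MDP indexed by mode m
        # rewards[m][s][a]: reward function of the MDP indexed by mode m
        self.mode_trans = mode_trans
        self.state_trans = state_trans
        self.rewards = rewards
        self.rng = random.Random(seed)
        self.mode = 0   # hidden from the agent
        self.state = 0  # observed by the agent

    def _sample(self, probs):
        # Draw an index according to the given probability vector.
        return self.rng.choices(range(len(probs)), weights=probs, k=1)[0]

    def step(self, action):
        # Assumed convention: the current mode governs reward and state
        # transition; the mode then evolves by its own Markov chain.
        reward = self.rewards[self.mode][self.state][action]
        self.state = self._sample(self.state_trans[self.mode][self.state][action])
        self.mode = self._sample(self.mode_trans[self.mode])
        return self.state, reward


def to_pomdp_transition(mode_trans, state_trans):
    """Joint transition over (mode, state) pairs under the same assumed convention."""
    n_modes = len(mode_trans)
    n_states = len(state_trans[0])
    n_actions = len(state_trans[0][0])
    joint = {}
    for m in range(n_modes):
        for s in range(n_states):
            for a in range(n_actions):
                # P((m2, s2) | (m, s), a) = P(m2 | m) * P(s2 | s, a, m)
                joint[(m, s, a)] = {
                    (m2, s2): mode_trans[m][m2] * state_trans[m][s][a][s2]
                    for m2 in range(n_modes)
                    for s2 in range(n_states)
                }
    return joint


# Two modes, two states, one action: mode 0 keeps the state, mode 1 flips it.
mode_trans = [[0.9, 0.1], [0.2, 0.8]]
state_trans = [
    [[[1.0, 0.0]], [[0.0, 1.0]]],
    [[[0.0, 1.0]], [[1.0, 0.0]]],
]
rewards = [[[0.0], [1.0]], [[1.0], [0.0]]]
joint = to_pomdp_transition(mode_trans, state_trans)
```

The flattened model has one hidden state per (mode, state) pair, so its transition table grows with the product of the two counts even though the state component is directly observed. This is one way to see why the summary describes the general POMDP formulation as adding unnecessary complexity.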
Language English
Format Conference paper
Files in this item:
File Description Size Format
ijcaiwksp99.pdf 235889 B Adobe PDF