On-line Q-learner Using Moving Prototypes


One of the most important breakthroughs in reinforcement learning has been the development of an off-policy control algorithm known as Q-learning. Unfortunately, despite its advantages, the method is practical for only a small class of problems. One reason is that a large number of training iterations is needed to find a near-optimal policy, even for modest-sized problems. The other is that the memory required by the method often grows too large.

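For context, the textbook tabular form of the Q-learning update is sketched below. This is the standard algorithm, not the Moving Prototypes method developed in the thesis; the environment interface (reset/step/actions) and the hyperparameter values are illustrative assumptions.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Textbook tabular Q-learning. `env` is assumed to expose
    reset() -> state, step(action) -> (next_state, reward, done),
    and a list of discrete actions in env.actions."""
    Q = defaultdict(float)  # Q[(state, action)], zero-initialized

    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Epsilon-greedy behavior policy; the update target below is
            # greedy, which is what makes Q-learning off-policy.
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q[(s, act)])
            s2, r, done = env.step(a)
            # Bootstrap from the best successor action's value.
            best_next = 0.0 if done else max(Q[(s2, a2)] for a2 in env.actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q
```

Every visited (state, action) pair occupies its own table entry, which is precisely the memory cost the next paragraph is concerned with.
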
At the heart of Q-learning is the so-called Q-function, which estimates, for each state-action pair, the expected cumulative reward of taking that action in that state and behaving optimally thereafter. Modeling this function consumes most of the memory used by the method. Several methods have been devised to tackle Q-learning's shortcomings, with relatively good success. However, even the most promising of them do a poor job of distributing the available memory across the model of the Q-function, which in turn limits the range of problems Q-learning can solve. A new method called Moving Prototypes is proposed to alleviate this problem.

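To make the memory issue concrete, the short calculation below (an illustration, not an example from the thesis) counts the entries of a dense tabular Q-function for a coarsely discretized continuous problem; the dimensions, resolution, and action count are all assumed:

```python
# Illustrative only: memory footprint of a dense tabular Q-function
# for a continuous state space discretized into a uniform grid.
state_dims = 4          # e.g., position, velocity, angle, angular velocity
bins_per_dim = 50       # resolution of the discretization (assumed)
num_actions = 10

table_entries = (bins_per_dim ** state_dims) * num_actions
bytes_needed = table_entries * 8  # one 64-bit float per entry

print(f"{table_entries:,} entries ~ {bytes_needed / 2**30:.1f} GiB")
# 62,500,000 entries ~ 0.5 GiB, and doubling the resolution of each
# dimension multiplies the table size by 2**4 = 16.
```

Function approximation reduces this cost, but distributing a fixed memory budget sensibly across the state-action space remains the hard part; that allocation problem is what Moving Prototypes targets.
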
Interested in reading the entire thesis? (152 pages, 566,318 bytes, PDF)
Interested in viewing the PowerPoint presentation?

