# On-line Q-learner Using Moving Prototypes

## Abstract

One of the most important breakthroughs in reinforcement learning has been the
development of an off-policy control algorithm known as *Q-learning*.
Unfortunately, in spite of its advantages, this method is practical for only a
small class of problems. One reason is that a large number of training iterations
is required to find a near-optimal policy, even for modest-sized problems. The other
is that the memory resources required by the method often become too large.
At the heart of Q-learning lies a function called the *Q-function*; modeling
this function accounts for most of the memory the method uses.
Several methods have been devised to tackle Q-learning's shortcomings, with
relatively good success. However, even the most promising of them do a poor job of
distributing the available memory resources when modeling the Q-function, which,
in turn, limits the range of problems that Q-learning can solve. A new method called
*Moving Prototypes* is proposed to alleviate this problem.
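For context, a minimal sketch of standard *tabular* Q-learning is given below; it is
not the Moving Prototypes method itself, and the `env` interface (`reset`/`step`) is
an assumption made for illustration. It makes the abstract's memory concern concrete:
the Q-function is stored as one table entry per state-action pair, so memory grows
with the size of the state-action space.

```python
import numpy as np

def q_learning(env, n_states, n_actions,
               episodes=1000, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning sketch; env.reset()/env.step() are assumed."""
    # One cell per (state, action) pair -- this table is the memory
    # bottleneck the abstract describes.
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection.
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            # One-step Q-learning update:
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
            s = s_next
    return Q
```

Because the table assigns equal memory to every state-action pair, methods that
approximate the Q-function must decide where to spend their limited representation
budget; Moving Prototypes addresses exactly that allocation problem.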
