PhD thesis, by Rémi Coulom
This thesis is a study of practical methods to estimate value functions with feedforward neural networks in model-based reinforcement learning. Focus is placed on problems in continuous time and space, such as motor-control tasks. In this work, the continuous TD(lambda) algorithm is refined to handle situations with discontinuous states and controls, and the vario-eta algorithm is proposed as a simple but efficient method to perform gradient descent. The main contributions of this thesis are experimental successes that clearly indicate the potential of feedforward neural networks to estimate high-dimensional value functions. Linear function approximators have often been preferred in reinforcement learning, but their success is restricted to relatively simple mechanical systems, or requires a lot of prior knowledge. The method presented in this thesis was tested successfully on an original task: a simulated articulated robot learning to swim, with 4 control variables and 12 independent state variables.
(only the first pages are in French, the rest is in English):
For those who cannot run the Win32 demos below, some AVI movies demonstrating the movements of swimmers (DivX codec required):
A few interactive (Win32) swimmer demos (click in the window to change swimming direction):
Source code of the swimmer simulator:
RARS demo:
@phdthesis{Coulom-2002a,
  author = "R\'emi Coulom",
  title  = "Reinforcement Learning Using Neural Networks, with Applications to Motor Control",
  school = "Institut National Polytechnique de Grenoble",
  year   = 2002
}