Abstract

Learning to Fly by Combining Reinforcement Learning with Behavioural Cloning
Eduardo Morales - Tec de Monterrey - Campus Cuernavaca Claude Sammut - University of New South Wales
Reinforcement learning deals with learning optimal or near optimal policies while interacting with the environment. Application domains with many continuous variables are difficult to solve with existing reinforcement learning methods due to the large search space. In this paper, we use a relational representation to define powerful abstractions that allow us to incorporate domain knowledge and re-use previously learned policies in other similar problems. We also describe how to learn useful actions from human traces using a behavioural cloning approach combined with an exploration phase. Since several conflicting actions may be induced for the same abstract state, reinforcement learning is used to learn an optimal policy over this reduced space. It is shown experimentally how a combination of behavioural cloning and reinforcement learning using a relational representation is powerful enough to learn how to fly an aircraft through different points in space and different turbulence conditions.

Learning to Fly by Combining Reinforcement Learning with Behavioural Cloning

Eduardo Morales - Tec de Monterrey - Campus Cuernavaca
Claude Sammut - University of New South Wales

Reinforcement learning deals with learning optimal or near optimal policies while interacting with the environment. Application domains with many continuous variables are difficult to solve with existing reinforcement learning methods due to the large search space. In this paper, we use a relational representation to define powerful abstractions that allow us to incorporate domain knowledge and re-use previously learned policies in other similar problems. We also describe how to learn useful actions from human traces using a behavioural cloning approach combined with an exploration phase. Since several conflicting actions may be induced for the same abstract state, reinforcement learning is used to learn an optimal policy over this reduced space. It is shown experimentally how a combination of behavioural cloning and reinforcement learning using a relational representation is powerful enough to learn how to fly an aircraft through different points in space and different turbulence conditions.