

Poster

Mimicking Better by Matching the Approximate Action Distribution

Joao A. Candido Ramos · Lionel Blondé · Naoya Takeishi · Alexandros Kalousis

Hall C 4-9 #1209
Thu 25 Jul 2:30 a.m. PDT — 4 a.m. PDT

Abstract:

In this paper, we introduce MAAD, a novel, sample-efficient on-policy algorithm for Imitation Learning from Observations. MAAD utilizes a surrogate reward signal, which can be derived from various sources such as adversarial games, trajectory matching objectives, or optimal transport criteria. To compensate for the unavailability of expert actions, we rely on an inverse dynamics model that infers a plausible action distribution given the expert’s state-state transitions; we regularize the imitator’s policy by aligning it to the inferred action distribution. MAAD leads to significantly improved sample efficiency and stability. We demonstrate its effectiveness in a number of MuJoCo environments, both in the OpenAI Gym and the DeepMind Control Suite. We show that it requires considerably fewer interactions to achieve expert performance, outperforming current state-of-the-art on-policy methods. Remarkably, MAAD often stands out as the sole method capable of attaining expert performance levels, underscoring its simplicity and efficacy.
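The sketch below illustrates the core idea described in the abstract: an inverse dynamics model infers an action distribution from expert state-to-state transitions, and the imitator's policy is regularized toward that distribution on top of its surrogate-reward objective. This is not the authors' implementation; all module architectures, dimensions, and the weight `beta` are illustrative assumptions.

```python
# Minimal sketch of the MAAD idea from the abstract (assumed details throughout):
# a policy trained with an on-policy surrogate reward is additionally aligned to the
# action distribution an inverse dynamics model (IDM) infers from expert (s, s') pairs.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 17, 6  # assumed sizes, e.g. a MuJoCo locomotion task


def mean_net(in_dim, out_dim):
    # Small MLP producing the mean of a diagonal Gaussian.
    return nn.Sequential(nn.Linear(in_dim, 64), nn.Tanh(), nn.Linear(64, out_dim))


class Policy(nn.Module):
    def __init__(self):
        super().__init__()
        self.mu = mean_net(STATE_DIM, ACTION_DIM)
        self.log_std = nn.Parameter(torch.zeros(ACTION_DIM))

    def dist(self, s):
        return torch.distributions.Normal(self.mu(s), self.log_std.exp())


class InverseDynamics(nn.Module):
    # Predicts a distribution over actions from (s_t, s_{t+1}) pairs.
    def __init__(self):
        super().__init__()
        self.mu = mean_net(2 * STATE_DIM, ACTION_DIM)
        self.log_std = nn.Parameter(torch.zeros(ACTION_DIM))

    def dist(self, s, s_next):
        x = torch.cat([s, s_next], dim=-1)
        return torch.distributions.Normal(self.mu(x), self.log_std.exp())


policy, idm = Policy(), InverseDynamics()
opt = torch.optim.Adam(list(policy.parameters()) + list(idm.parameters()), lr=3e-4)
beta = 0.1  # weight of the action-distribution matching term (assumed)

# Dummy stand-ins for one on-policy batch (with known actions and advantages from the
# surrogate reward) and a batch of expert state transitions (actions unavailable).
s, s_next = torch.randn(32, STATE_DIM), torch.randn(32, STATE_DIM)
a, adv = torch.randn(32, ACTION_DIM), torch.randn(32)
s_exp, s_exp_next = torch.randn(32, STATE_DIM), torch.randn(32, STATE_DIM)

# 1) Fit the IDM on the imitator's own transitions, where the actions are known.
idm_loss = -idm.dist(s, s_next).log_prob(a).sum(-1).mean()

# 2) Policy-gradient term driven by the surrogate reward (via its advantages `adv`).
pg_loss = -(policy.dist(s).log_prob(a).sum(-1) * adv).mean()

# 3) Align the policy on expert states with the IDM's inferred action distribution.
with torch.no_grad():
    target = idm.dist(s_exp, s_exp_next)
match_loss = torch.distributions.kl_divergence(target, policy.dist(s_exp)).sum(-1).mean()

loss = idm_loss + pg_loss + beta * match_loss
opt.zero_grad()
loss.backward()
opt.step()
```

In practice the policy-gradient term would come from a full on-policy algorithm (e.g. PPO-style updates on the surrogate reward) rather than the single vanilla step shown here; the sketch only isolates how the inferred action distribution enters the objective.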
