Poster in Workshop: Multi-modal Foundation Model meets Embodied AI (MFM-EAI)
DPO-Finetuned Large Multi-Modal Planner with Retrieval-Augmented Generation @ EgoPlan Challenge ICML 2024
Kwanghyeon Lee · Mina Kang · Hyungho Na · HeeSun Bae · Byeonghu Na · Doyun Kwon · Seungjae Shin · Yeongmin Kim · Taewoo Kim · Seungmin Yun · Il-Chul Moon
This paper presents the technical details of our approach to a multi-modal task, EgoPlan-Bench. Our model adopts Direct Preference Optimization (DPO), originally developed for single-modal tasks, and adapts it to a multi-modal setting. This DPO adaptation improves prediction accuracy by favoring positive answers over negative choices. Additionally, we apply Retrieval-Augmented Generation (RAG) to further enhance the generation performance of Multi-modal Large Language Models (MLLMs). In our setting, however, RAG does not yield a performance improvement because few sufficiently similar tasks can be retrieved. Our DPO-based model achieves 53.98% test accuracy, compared to 41.35% for the baseline. Our code is available at https://github.com/aailabkaist/EgoPlanChallengeTeam_AAILab.
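As a rough illustration of the DPO adaptation described above, the sketch below implements the standard DPO objective over a (positive answer, negative choice) pair, assuming per-sequence log-probabilities from the fine-tuned MLLM and a frozen reference model are already computed; all function and variable names (dpo_loss, policy_logp_pos, beta, etc.) are illustrative assumptions, not taken from the released code.

```python
# Minimal sketch of the DPO objective, assuming per-sequence
# log-probabilities are precomputed. Names are hypothetical,
# not from the authors' repository.
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_pos: torch.Tensor,  # log pi_theta(positive answer | video, question)
             policy_logp_neg: torch.Tensor,  # log pi_theta(negative choice | video, question)
             ref_logp_pos: torch.Tensor,     # log pi_ref(positive answer | ...), frozen reference
             ref_logp_neg: torch.Tensor,     # log pi_ref(negative choice | ...)
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO loss: widen the margin of the positive answer over the
    negative choice, measured relative to the frozen reference model."""
    pos_reward = beta * (policy_logp_pos - ref_logp_pos)
    neg_reward = beta * (policy_logp_neg - ref_logp_neg)
    # Maximize sigma(pos_reward - neg_reward), i.e. prefer the positive answer.
    return -F.logsigmoid(pos_reward - neg_reward).mean()
```

In a multi-modal setting the conditioning input simply includes the visual observation (e.g., egocentric video frames) alongside the text prompt; the preference objective itself is unchanged from the single-modal formulation.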