Poster
in
Workshop: Neural Conversational AI Workshop - What’s left to TEACH (Trustworthy, Enhanced, Adaptable, Capable and Human-centric) chatbots?

Large Language Models can Share Images, Too!

Young-Jun Lee · Jonghwan Hyeon · Ho-Jin Choi


Abstract:

This paper explores the image-sharing capability of Large Language Models (LLMs), such as InstructGPT, ChatGPT, and GPT-4, in a zero-shot setting, without the help of visual foundation models. Inspired by the two-stage process of image-sharing in human dialogues, we propose a two-stage framework that allows LLMs to predict potential image-sharing turns and to generate related image descriptions using our effective restriction-based prompt template. With extensive experiments, we unlock the image-sharing capability of LLMs in zero-shot prompting, with GPT-4 achieving state-of-the-art performance. Additionally, we uncover the emergent image-sharing ability in zero-shot prompting, demonstrating the effectiveness of restriction-based prompts in both stages of our framework. Based on this framework, we augment the PhotoChat dataset with images generated by Stable Diffusion at predicted turns, which we name PhotoChat++. To our knowledge, this is the first study to assess the image-sharing ability of LLMs in a zero-shot setting without visual foundation models. The source code and the dataset will be released after publication.
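The two-stage framework described above can be sketched as follows. This is an illustrative outline only: the prompt templates, function names, and the `call_llm` interface are placeholders, not the paper's actual restriction-based templates or API, and a real run would route `call_llm` to an LLM such as GPT-4.

```python
# Hypothetical sketch of a two-stage, zero-shot image-sharing pipeline:
# Stage 1 predicts whether the next turn should share an image;
# Stage 2 generates a description of the image to share.
# The "Restriction:" lines stand in for the paper's restriction-based
# prompts, which constrain the output format at each stage.

STAGE1_TEMPLATE = (
    "You are given a dialogue between two speakers.\n"
    "Restriction: answer with exactly one word, 'yes' or 'no'.\n"
    "Question: should the next turn share an image?\n\n"
    "Dialogue:\n{dialogue}\nAnswer:"
)

STAGE2_TEMPLATE = (
    "You are given a dialogue in which the next turn shares an image.\n"
    "Restriction: output only a short, literal description of the image, "
    "with no extra commentary.\n\n"
    "Dialogue:\n{dialogue}\nImage description:"
)


def build_stage1_prompt(dialogue: str) -> str:
    return STAGE1_TEMPLATE.format(dialogue=dialogue)


def build_stage2_prompt(dialogue: str) -> str:
    return STAGE2_TEMPLATE.format(dialogue=dialogue)


def two_stage_image_sharing(dialogue: str, call_llm):
    """Return an image description if the model predicts an
    image-sharing turn, else None. `call_llm` maps a prompt
    string to a completion string (e.g. a GPT-4 API wrapper)."""
    decision = call_llm(build_stage1_prompt(dialogue)).strip().lower()
    if not decision.startswith("yes"):
        return None  # Stage 1 says no image-sharing turn here
    return call_llm(build_stage2_prompt(dialogue)).strip()
```

A generated description could then be passed to a text-to-image model such as Stable Diffusion to produce the shared image, as in the PhotoChat++ construction.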
