

Poster
in
Workshop: The Many Facets of Preference-Based Learning

Optimizing Chatbot Fallback Intent Selections with Reinforcement Learning

Jeremy Curuksu


Abstract: Large language models such as those used in GPT-4 and Alexa are limited in their ability to assess the validity of their own answers, i.e., to fall back on a clarification intent when needed. Reinforcement learning can be used specifically to address this fallback selection problem, by adapting to the semantic pitfalls of a given language model in a given environment. This is demonstrated in a simplified environment where the chatbot learns when it is best to ask for clarification. After training, the chatbot identifies correct intents in fewer than 2 interactions on average in over 99% of dialogues. In multi-agent simulations where the user cooperates, the chatbot identifies correct intents in 1.3 interactions on average in 100% of dialogues.
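The fallback selection problem described above can be illustrated with a minimal sketch. The environment, intent set, noise level, reward values, and the use of tabular Q-learning are all illustrative assumptions, not the authors' actual setup: a user has a hidden intent, the chatbot receives a noisy intent signal each turn, and at every step it chooses either to commit to an intent or to fall back on a clarification action that costs a small penalty but yields another signal.

```python
import random
from collections import defaultdict

random.seed(0)

# Hypothetical environment parameters (not from the paper).
N_INTENTS = 3        # number of possible user intents
NOISE = 0.7          # prob. the observed signal matches the true intent
MAX_ASKS = 3         # clarification budget per dialogue
ASK = N_INTENTS      # action index for the clarification (fallback) intent

def observe(true_intent):
    """Noisy intent signal: correct with prob NOISE, otherwise a wrong intent."""
    if random.random() < NOISE:
        return true_intent
    return random.choice([i for i in range(N_INTENTS) if i != true_intent])

def episode(Q, eps=0.0, alpha=0.2, gamma=0.95, learn=True):
    """One dialogue: returns (guessed correctly?, number of interactions)."""
    true_intent = random.randrange(N_INTENTS)
    counts = [0] * N_INTENTS          # signals observed so far, per intent
    counts[observe(true_intent)] += 1
    asks, turns = 0, 1
    while True:
        state = (tuple(counts), asks)
        actions = list(range(N_INTENTS)) + ([ASK] if asks < MAX_ASKS else [])
        if learn and random.random() < eps:
            a = random.choice(actions)                     # explore
        else:
            a = max(actions, key=lambda x: Q[(state, x)])  # exploit
        if a == ASK:
            # Fall back: small cost, but gather one more noisy signal.
            reward, done = -0.1, False
            counts[observe(true_intent)] += 1
            asks += 1
            turns += 1
        else:
            # Commit to an intent: episode ends.
            reward, done = (1.0 if a == true_intent else -1.0), True
        if learn:
            if done:
                target = reward
            else:
                ns = (tuple(counts), asks)
                na = list(range(N_INTENTS)) + ([ASK] if asks < MAX_ASKS else [])
                target = reward + gamma * max(Q[(ns, x)] for x in na)
            Q[(state, a)] += alpha * (target - Q[(state, a)])
        if done:
            return (a == true_intent), turns

# Train with a decaying epsilon-greedy schedule.
Q = defaultdict(float)
for ep in range(30000):
    episode(Q, eps=max(0.05, 1.0 - ep / 20000))

# Evaluate the greedy policy.
results = [episode(Q, learn=False) for _ in range(2000)]
accuracy = sum(r for r, _ in results) / len(results)
mean_turns = sum(t for _, t in results) / len(results)
```

The learned policy asks for clarification when the observed signals are ambiguous (e.g., tied counts) and commits once one intent dominates, trading a small per-turn cost against the risk of a wrong guess, which is the trade-off the abstract attributes to the fallback intent.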
