

Afternoon Poster in Workshop: Artificial Intelligence & Human Computer Interaction

LeetPrompt: Leveraging Collective Human Intelligence to Study LLMs

Sebastin Santy · Ayana Bharadwaj · Sahith Dambekodi · Alex Albert · Cathy Yuan · Ranjay Krishna


Abstract:

Writing effective instructions (or prompts) is rapidly evolving into a dark art, spawning websites dedicated to collecting, sharing, and even selling instructions. Yet, research efforts evaluating large language models (LLMs) either limit instructions to a predefined set or, worse, make anecdotal claims without rigorously testing a sufficient range of instructions. In reaction to this cottage industry of instruction design, we introduce LeetPrompt: a platform where people can interactively explore the space of instructions to solve problems. LeetPrompt automatically evaluates human-LLM interactions to provide insights about both LLMs and human interaction behavior. With LeetPrompt, we conduct a within-subjects user study (N=20) across 10 problems from 5 domains: biology, physics, math, programming, and general knowledge. By analyzing the 1,178 instructions participants used to invoke GPT-4, we present the following findings. First, participants are able to design instructions that solve every task, including those that problem setters deemed unlikely to be solved. Second, automatic baselines fail to generate instructions that solve all tasks. Third, the lexical diversity of instructions is significantly correlated with whether people were able to solve a problem, highlighting the need for diverse instructions when evaluating LLMs. Fourth, many instruction strategies are unsuccessful, highlighting a misalignment between participants' conceptual models of the LLM and its actual functionality. Fifth, participants with prior prompting experience and with math backgrounds spend significantly more time on LeetPrompt. Sixth, people use more diverse instruction strategies than the automatic baselines. Finally, LeetPrompt facilitates a learning effect: participants self-reported improving as they solved each subsequent problem.
