Poster in Affinity Event: The 6th Muslims in ML (MusIML) Workshop

LLMs Struggle to Rank Products Robustly

Kumail Alhamoud ⋅ Charikleia Moraitaki ⋅ Carlos Hinojosa ⋅ Jennifer Zhou ⋅ Yuexing Hao ⋅ Phil Torr ⋅ Adel Bibi ⋅ Marzyeh Ghassemi

Abstract

People are using LLM agents to compare products (e.g., by asking "what is the best magnesium supplement?"), and those agents retrieve documents from the web to generate their answers. These answers rely on third-party comparison articles, which use editorial framing techniques designed to influence human product decisions. This paper asks whether these same influence techniques determine which product an LLM agent recommends. We introduce FramingBench, which measures how 19 influence techniques, drawn from communication and advertising research, shift LLM product rankings across 10 consumer domains and 7 LLMs. All LLMs we test, including frontier models such as GPT-5.4, suffer from framing susceptibility: their product rankings are not invariant to transformations of the input document that preserve the underlying product specifications. The strongest technique places a chosen product at rank 1 in 76% of cases, demonstrating that the human persuasion playbook transfers reliably to LLM rankers.
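
To make the two quantities in the abstract concrete, the following is a minimal sketch of how ranking invariance and a rank-1 placement rate could be computed. It is an illustration under stated assumptions, not FramingBench's actual implementation: the function names (`is_invariant`, `top1_rate`), the representation of a ranking as an ordered list of product names, and the toy data are all hypothetical.

```python
from typing import Sequence


def is_invariant(baseline: Sequence[str], framed: Sequence[str]) -> bool:
    """True if the ranking is unchanged after the document is reframed.

    Assumption: a ranking is an ordered list of product names, best first.
    """
    return list(baseline) == list(framed)


def top1_rate(framed_rankings: Sequence[Sequence[str]],
              targets: Sequence[str]) -> float:
    """Fraction of trials in which the chosen (target) product is ranked first."""
    if len(framed_rankings) != len(targets):
        raise ValueError("need exactly one target product per ranking")
    if not targets:
        raise ValueError("need at least one trial")
    hits = sum(
        1
        for ranking, target in zip(framed_rankings, targets)
        if ranking and ranking[0] == target
    )
    return hits / len(targets)


# Hypothetical toy data: three products and four reframed documents.
# These values are invented for illustration and are not results from the paper.
baseline_ranking = ["A", "B", "C"]
framed_rankings = [
    ["B", "A", "C"],  # framing favoured B, and B moved to rank 1
    ["A", "B", "C"],  # framing favoured B, but the ranking did not change
    ["C", "A", "B"],  # framing favoured C, and C moved to rank 1
    ["A", "C", "B"],  # framing favoured C, but C only reached rank 2
]
targets = ["B", "B", "C", "C"]

rate = top1_rate(framed_rankings, targets)
unchanged = [is_invariant(baseline_ranking, r) for r in framed_rankings]
```

In this toy setup, a model with no framing susceptibility would return the baseline ranking for every reframed document, because each transformation leaves the product specifications untouched.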