Poster in Affinity Event: LatinX in AI Workshop

ClinAuth: A Multi-Agent Benchmark and Dataset for Quantifying Omission Bias and Authority Deference in Hierarchical Medical Large Language Models

Erik Gutierrez ⋅ Advait Shrikhande ⋅ Grace Araya ⋅ Olivia Kong ⋅ Kiran Nijjer

Abstract

Large Language Models (LLMs) are increasingly used in clinical decision support and electronic health record (EHR) systems, where they achieve strong performance on medical benchmarks and instruction-following tasks. However, such performance does not ensure safe behavior in hierarchical, high-stakes settings. Alignment methods such as Direct Preference Optimization (DPO) and Group Relative Policy Optimization (GRPO) can encourage sycophantic behavior, leading to a failure mode we term clinical omission bias, in which a model stays silent and lets a harmful decision proceed in order to avoid conflict or penalties. Conversely, Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning from AI Feedback (RLAIF) lead to unnecessary and false interventions that block proper treatments, inducing a clinical commission bias. We introduce the Clinical Authority Family, a multi-agent framework for evaluating LLM behavior under authority pressure that includes the ClinAuth Benchmark and the ClinAuth-800 dataset. We simulate a hospital setting in which a model evaluates treatment decisions proposed by an attending physician and controls EHR access. By varying physician tone, system-level incentives, and weak-to-strong oversight, we study the effects of alignment methods across frontier models. In 20,000 simulations, frontier models falsely refuse correct treatments in 17.08% of control cases and allow fatal clinical errors in 37.33% of harmful cases, resulting in an overall harmful EHR interaction rate of 27.17% across all cases.
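The three headline rates can be related with a short calculation. The sketch below assumes, which the abstract does not state, that the overall harmful EHR interaction rate is the case-weighted average of the false-refusal rate on control cases and the fatal-error rate on harmful cases. Under that assumption it backs out the implied share of harmful cases. The function names are hypothetical and the split between control and harmful cases is inferred, not reported.

```python
def overall_harmful_rate(false_refusal_rate, fatal_allow_rate, harmful_share):
    """Case-weighted mix of the two error rates (assumed relationship)."""
    if not 0.0 <= harmful_share <= 1.0:
        raise ValueError("harmful_share must lie in [0, 1]")
    control_share = 1.0 - harmful_share
    return control_share * false_refusal_rate + harmful_share * fatal_allow_rate


def implied_harmful_share(false_refusal_rate, fatal_allow_rate, overall_rate):
    """Solve the weighted-average relation for the share of harmful cases."""
    spread = fatal_allow_rate - false_refusal_rate
    if spread == 0:
        raise ValueError("rates are equal; the share cannot be identified")
    return (overall_rate - false_refusal_rate) / spread


# Rates reported in the abstract, written as fractions.
FALSE_REFUSAL = 0.1708  # correct treatments refused, control cases
FATAL_ALLOW = 0.3733    # fatal errors allowed, harmful cases
OVERALL = 0.2717        # harmful EHR interactions, all cases

# Inferred under the weighted-average assumption; not a reported figure.
share = implied_harmful_share(FALSE_REFUSAL, FATAL_ALLOW, OVERALL)
print(f"implied share of harmful cases: {share:.4f}")

# Sanity check: plugging the share back in recovers the overall rate.
print(f"reconstructed overall rate: "
      f"{overall_harmful_rate(FALSE_REFUSAL, FATAL_ALLOW, share):.4f}")
```

If the assumption holds, the implied share comes out slightly below one half, which would be consistent with a roughly balanced mix of control and harmful cases; an exactly even split would instead give an overall rate of about 27.2%.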