Poster
in
Workshop: Humans, Algorithmic Decision-Making and Society: Modeling Interactions and Impact
Towards Safe Large Language Models for Medicine
Tessa Han · Aounon Kumar · Chirag Agarwal · Himabindu Lakkaraju
As large language models (LLMs) develop ever-improving capabilities and are applied in real-world settings, their safety is critical. While initial steps have been taken to evaluate the safety of general-knowledge LLMs, exposing some weaknesses, the safety of medical LLMs has not been evaluated, despite the high risks they pose to personal health and safety, public health and safety, patient rights, and human rights. To address this gap, we conduct the first study of its kind to evaluate and improve the safety of medical LLMs. We find that 1) current medical LLMs do not meet standards of general or medical safety, as they readily comply with harmful requests, and 2) fine-tuning medical LLMs on safety demonstrations significantly improves their safety. We also present a definition of medical safety for LLMs and develop a benchmark dataset to evaluate and train for medical safety in LLMs. This work sheds light on the status quo of medical LLM safety and motivates future work to mitigate the risks of harm from LLMs in medicine.