SafeSci: Safety Evaluation of Large Language Models in Science Domains and Beyond
Abstract
The success of large language models (LLMs) in scientific domains has heightened safety concerns, prompting numerous benchmarks to evaluate their scientific safety. Existing benchmarks often suffer from limited risk coverage and a reliance on subjective evaluation. To address these problems, we introduce \textbf{SafeSci}, a comprehensive framework for safety evaluation and enhancement in scientific contexts. SafeSci comprises \textbf{SafeSciBench}, a multi-disciplinary benchmark with 0.25M samples, and \textbf{SafeSciTrain}, a large-scale dataset containing 1.5M samples for safety enhancement. SafeSciBench distinguishes between safety knowledge and safety risks to achieve broad coverage, and employs objective metrics such as deterministically answerable questions to mitigate evaluation bias. We evaluate 21 advanced LLMs, revealing critical vulnerabilities in current models. Finally, we demonstrate that fine-tuning on SafeSciTrain significantly improves the safety alignment of models. Our work provides both a diagnostic tool and a practical resource for building safer scientific AI systems.