PLaID++: A Preference Aligned Language Model for Targeted Inorganic Materials Design
Andy Xu ⋅ Rohan Desai ⋅ Larry Wang ⋅ Ethan Ritz ⋅ Gabriel Hope
Abstract
Reinforcement Learning from Verifiable Rewards (RLVR) has emerged as a promising approach to improve correctness in LLMs. In many scientific problems, however, the objective is not to produce the single correct answer, but to produce a diverse array of candidates that satisfy a set of constraints. We study this challenge in the context of materials generation. To this end, we introduce PLaID++, an LLM post-trained for stable and property-guided crystal generation. We find that applying naive preference optimization to a coordinate-based crystal representation leads to mode collapse. Hence, we introduce a compact, symmetry-informed Wyckoff text representation which improves computational efficiency and encourages generalization from physical priors. By encoding symmetry constraints directly into text and guiding model outputs towards desirable chemical space, PLaID++ generates structures that are thermodynamically stable, unique, and novel at a >50% greater rate than prior methods. We further demonstrate that conditional and unconditional tasks mutually benefit from unified training in data-sparse regimes. Our work demonstrates the potential of adapting post-training techniques from natural language processing to materials design, paving the way for targeted and efficient discovery of novel materials.
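To make the abstract's "symmetry-informed Wyckoff text representation" concrete, the following is a minimal sketch of one plausible such encoding, not the authors' actual format: a crystal is serialized as its space group plus one entry per symmetry-equivalent Wyckoff orbit (element, Wyckoff symbol, representative fractional coordinates), rather than one entry per atom. The pymatgen-based implementation, the wyckoff_text function name, and the token layout are all illustrative assumptions.

```python
# Sketch of a Wyckoff-orbit text encoding for crystals (illustrative only;
# the exact format used by PLaID++ may differ).
from pymatgen.core import Lattice, Structure
from pymatgen.symmetry.analyzer import SpacegroupAnalyzer


def wyckoff_text(structure: Structure, symprec: float = 0.01) -> str:
    """Encode a crystal as a compact string of Wyckoff orbits.

    Each orbit contributes a single representative site, so high-symmetry
    structures compress to far fewer tokens than a full coordinate listing.
    """
    sga = SpacegroupAnalyzer(structure, symprec=symprec)
    sym = sga.get_symmetrized_structure()
    tokens = [f"sg:{sga.get_space_group_number()}"]
    for sites, wyckoff in zip(sym.equivalent_sites, sym.wyckoff_symbols):
        rep = sites[0]  # one representative site stands in for the whole orbit
        x, y, z = rep.frac_coords
        tokens.append(f"{rep.specie}:{wyckoff}:{x:.3f},{y:.3f},{z:.3f}")
    return " | ".join(tokens)


# Example: rock-salt NaCl. The 8-atom conventional cell collapses to just
# two Wyckoff orbits under space group Fm-3m (no. 225).
nacl = Structure.from_spacegroup(
    "Fm-3m", Lattice.cubic(5.64), ["Na", "Cl"], [[0, 0, 0], [0.5, 0.5, 0.5]]
)
print(wyckoff_text(nacl))
# roughly: "sg:225 | Na:4a:0.000,0.000,0.000 | Cl:4b:0.500,0.500,0.500"
```

A representation along these lines suggests why symmetry-aware text helps: the model predicts a handful of discrete, physically meaningful tokens per orbit instead of free-floating coordinates for every atom, which shortens sequences and bakes crystallographic constraints into the output space.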