VisualScore: Learning Holistic Visual Quality Scores via Multi-Task Reasoning
Abstract
Image quality assessment (IQA) is inherently multi-dimensional, yet existing reward models are typically limited to a single task and become unstable when extended to multi-task settings. In particular, heterogeneous reward scales and variances across tasks can lead to conflicting optimization signals during reinforcement learning. We propose VisualScore, a unified visual evaluation framework that formulates multi-task IQA as structured, task-aware reasoning followed by continuous reward optimization. VisualScore produces interpretable rationales together with scalar quality scores under explicit evaluation principles. We construct a reasoning-enhanced reward modeling dataset via rejection sampling and initialize the model through supervised fine-tuning. VisualScore is then optimized with Group Relative Policy Optimization (GRPO) using a Gaussian-based continuous reward. To address multi-task reward conflicts and stabilize training, we introduce standard deviation filtering and entropy gating to normalize task-wise reward signals and suppress noisy updates. Experiments on technical quality, aesthetic quality, and text–image alignment show that VisualScore improves robustness, generalization, and interpretability, and can effectively guide text-to-image generation at test time without retraining.
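The abstract mentions a Gaussian-based continuous reward for GRPO. A minimal sketch of what such a reward might look like, assuming the reward peaks at 1 when the predicted score matches the reference score and decays smoothly with the squared error (the function name, `sigma` parameter, and exact form are illustrative assumptions, not the paper's stated formulation):

```python
import math

def gaussian_reward(pred_score: float, target_score: float, sigma: float = 0.5) -> float:
    """Continuous reward in (0, 1]: 1.0 at an exact match, smoothly
    decaying as the predicted score drifts from the reference score.
    sigma controls how sharply the reward penalizes deviations."""
    return math.exp(-((pred_score - target_score) ** 2) / (2.0 * sigma ** 2))

# Closer predictions earn strictly higher reward, giving a dense
# training signal instead of a binary match/no-match outcome.
print(gaussian_reward(3.0, 3.0))  # exact match -> 1.0
print(gaussian_reward(2.5, 3.0) > gaussian_reward(2.0, 3.0))  # True
```

A smooth reward like this avoids the sparse-gradient problem of exact-match rewards, which is one plausible motivation for a continuous formulation in score regression via RL.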