TraCeS: Learning Per-Timestep Constraint-Violation Credit from Sparse Trajectory-Level Labels
Abstract
Ensuring safe behavior in reinforcement learning (RL) is challenging when safety constraints are implicit and cannot be densely measured. In many settings, supervision is limited to coarse approvals or rejections of whole trajectories (e.g., whether a rollout remained within an unknown safety threshold). We propose TraCeS (Trajectory-based Constraint Estimation for Safety), a method for learning per-timestep violation credit from such sparse trajectory-level labels. TraCeS trains a sequential violation estimator whose per-step credits factorize the predicted probability that a trajectory has not yet violated the constraint, and integrates this learned signal into constrained policy optimization. The method requires neither a known cost function nor a known threshold, and remains compatible with standard continuous-control algorithms. We provide a theoretical analysis of the approximation gap introduced by the learning objective, and demonstrate empirically that TraCeS improves constraint satisfaction and feedback efficiency over baselines across multiple continuous-control benchmarks, including long-horizon tasks and settings with noisy or inconsistent labels.
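To make the factorization idea concrete, the sketch below shows one plausible form of the training objective: a per-step estimator outputs a probability in (0, 1) that no violation occurs at that step, the product of these credits gives the predicted probability that the whole trajectory stayed safe, and that product is fit to the binary trajectory-level label. This is a minimal illustrative sketch, not the paper's implementation: the GRU architecture, the names (StepSafetyEstimator, trajectory_loss), and all hyperparameters are assumptions; only the product factorization and the trajectory-level supervision come from the description above.

```python
# Illustrative sketch of a trajectory-level objective for per-step
# violation credits. Architecture and names are assumptions, not the
# paper's actual design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StepSafetyEstimator(nn.Module):
    """Outputs, for each timestep, a probability in (0, 1) that no
    violation occurs at that step, conditioned on history via a GRU
    (one way to realize a "sequential" estimator)."""
    def __init__(self, obs_dim, act_dim, hidden_dim=128):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + act_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, obs, act):
        # obs: (B, T, obs_dim), act: (B, T, act_dim)
        h, _ = self.rnn(torch.cat([obs, act], dim=-1))
        return torch.sigmoid(self.head(h)).squeeze(-1)  # (B, T)

def trajectory_loss(model, obs, act, label):
    """label: (B,) float, 1 = trajectory stayed safe, 0 = it violated.
    The per-step credits factorize the trajectory-level prediction:
        P(trajectory safe) = prod_t g_theta(s_t, a_t),
    so the log of that probability is the sum of per-step log-credits."""
    step_probs = model(obs, act).clamp(1e-6, 1 - 1e-6)
    p_safe = step_probs.log().sum(dim=-1).exp().clamp(1e-6, 1 - 1e-6)
    return F.binary_cross_entropy(p_safe, label)

# Toy usage with random tensors (shapes only, not meaningful training).
model = StepSafetyEstimator(obs_dim=8, act_dim=2)
obs = torch.randn(4, 50, 8)               # 4 trajectories, 50 steps each
act = torch.randn(4, 50, 2)
label = torch.tensor([1.0, 0.0, 1.0, 1.0])
loss = trajectory_loss(model, obs, act, label)
loss.backward()
```

Under this sketch, a per-step quantity such as 1 - g or -log g could then serve as the learned violation credit fed to a constrained policy optimizer; the exact transformation used in TraCeS is not specified here and is left as an assumption.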