Learning to Rank by Directly Optimizing Full-Order Probabilities
Yongxiang Tang ⋅ Chao Wang ⋅ Jincheng Lu ⋅ Yanhua Cheng ⋅ Xialong Liu ⋅ Peng Jiang
Abstract
Learning to rank can be cast as a probabilistic modeling problem over permutations, where the goal is to estimate the likelihood of an observed total ordering of items. This formulation naturally involves full-order probabilities of the form $\mathbb{P}(\mathrm{z}_1 < \cdots < \mathrm{z}_n)$, whose exact computation and optimization are intractable due to the factorial growth of the permutation space with respect to list size. We introduce the *Full-Order Bound* (FOB), a tractable lower bound on the probability of an observed ordering, constructed from a subset of ordering constraints that factorizes across items while avoiding low-dimensional surrogate objectives and preserving order-reversal invariance. FOB induces a convex inner tightening problem over latent cut points, which we solve efficiently during training using a *safe-region gradient ascent* (SRGA) procedure. Experiments on synthetic ranking tasks and large-scale learning-to-rank benchmarks show that FOB consistently improves performance over pairwise and listwise surrogates, Plackett–Luce-style sequential factorization models, and differentiable sorting baselines. Code is available at [https://anonymous.4open.science/r/FOB_2026-46C4](https://anonymous.4open.science/r/FOB_2026-46C4).
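The abstract does not state the bound itself; the sketch below illustrates one standard cut-point construction that matches the description above, not necessarily the paper's exact FOB. Assuming independent Gaussian scores $z_i \sim \mathcal{N}(\mu_i, 1)$ and hypothetical increasing cut points $t_1 \le \cdots \le t_{n-1}$, confining each $z_i$ to its own slab $(t_{i-1}, t_i]$ implies the full ordering, so $\mathbb{P}(z_1 < \cdots < z_n) \ge \prod_i \mathbb{P}(t_{i-1} < z_i \le t_i)$, which factorizes across items and is trivially tractable; tightening it over the cut points plays the role of the inner maximization the paper solves with SRGA.

```python
# Illustrative sketch only (not the paper's exact FOB/SRGA). Assumptions:
# independent Gaussian scores z_i ~ N(mu_i, 1) and hypothetical increasing
# cut points t. If each z_i lies in its own slab (t_{i-1}, t_i], then
# z_1 < ... < z_n, so the product of per-item slab probabilities is a
# lower bound on the full-order probability.
import numpy as np
from scipy.stats import norm

def factorized_lower_bound(mu, t):
    """Product of per-item slab probabilities for increasing cut points t."""
    lo = np.concatenate(([-np.inf], t))  # lower slab edge for each item
    hi = np.concatenate((t, [np.inf]))   # upper slab edge for each item
    return np.prod(norm.cdf(hi - mu) - norm.cdf(lo - mu))

mu = np.array([0.0, 1.0, 2.5])           # hypothetical item scores
t = np.array([0.5, 1.75])                # hypothetical cut points (increasing)

# Sanity check: compare the bound against a Monte Carlo estimate of the
# true ordering probability P(z_1 < z_2 < z_3).
rng = np.random.default_rng(0)
z = rng.normal(mu, 1.0, size=(1_000_000, len(mu)))
mc = np.mean(np.all(np.diff(z, axis=1) > 0, axis=1))

print(f"lower bound: {factorized_lower_bound(mu, t):.4f}  MC estimate: {mc:.4f}")
```

In this construction the bound is a product of one-dimensional interval probabilities, so maximizing its logarithm over the cut points (subject to keeping them increasing, the "safe region") is a smooth constrained problem, consistent with the convex inner tightening described in the abstract.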