Exploring 3D Dataset Pruning
Abstract
Dataset pruning remains underexplored for 3D modalities, where inherent class imbalance persists across both training and test sets. This imbalance creates a divergence in evaluation: overall accuracy favors the natural class frequency, reflecting practical usage, while mean accuracy demands balanced generalization. Rather than forcing a premature trade-off, we advocate for base principles that remain robust and beneficial across diverse priors. We cast pruning as a quadrature approximation of the population risk and decompose the error bound into representation error (fidelity to the underlying manifold) and prior-mismatch bias (distribution shift), clarifying what can be improved jointly across priors. To address prior-mismatch bias, we decouple the likelihood from the prior in the posterior and transfer the structural likelihood via distillation with a calibrated teacher and geometry-preserving constraints. To reduce representation error, we audit common pruning signals and adopt geometric embeddings, which are more robust given the strong inductive bias of 3D models. We also enforce a safety floor before selection, capturing high-reward regions that are beneficial across priors. Finally, because no single subset optimally satisfies divergent evaluation priors, we augment these principles with a steering wrapper that interpolates between stratified seeding and global selection. Empirical results demonstrate that our framework raises the performance floor while offering flexibility across prior preferences.
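The quadrature view admits a simple sketch. Under one plausible formalization (the symbols p, q, S, w_i, and \ell below are our own notation, not taken from the abstract), a pruned subset S with weights w_i approximates the population risk, and the triangle inequality splits the approximation error into exactly the two terms named above:

```latex
\left| \,\mathbb{E}_{x \sim p}\!\left[\ell(x)\right] - \sum_{i \in S} w_i\, \ell(x_i) \right|
\;\le\;
\underbrace{\left| \,\mathbb{E}_{x \sim q}\!\left[\ell(x)\right] - \sum_{i \in S} w_i\, \ell(x_i) \right|}_{\text{representation error}}
\;+\;
\underbrace{\left| \,\mathbb{E}_{x \sim p}\!\left[\ell(x)\right] - \mathbb{E}_{x \sim q}\!\left[\ell(x)\right] \right|}_{\text{prior-mismatch bias}}
```

Here p denotes the evaluation (test-time) distribution, q the training distribution from which the subset is drawn, and \ell the per-example loss. The first term measures how well the weighted subset covers q, the manifold-fidelity notion, while the second isolates the distribution shift between q and p, which the distillation step targets.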