When Specialization Loses: A Decision Boundary Study of Sarvam-1 vs. Translate-Test for Hindi NLP
Abstract
The recent release of Sarvam-1, a 2B-parameter Indic-specialized language model, has been positioned as a foundational answer to Hindi NLP. We empirically test this positioning against a simple, zero-cost alternative: a translate-test pipeline (IndicTrans2 → English LLM → IndicTrans2). Using a 60-question Hindi diagnostic suite stratified across factoid, reasoning, Indian-cultural, and universal-cultural question types, we find that the translate-test pipeline (powered by Qwen2.5-7B-Instruct in 4-bit quantization) outperforms native Sarvam-1 inference on every non-trivial category. The gap reaches 26.7 percentage points on multi-step reasoning and 16.7 percentage points on Indian-cultural questions, the very domain where specialization should help most. We provide a practitioner's decision tree and a qualitative error analysis, and we discuss implications for the development of small specialized Indic models. Our results suggest that 2B-parameter Indic-specialized models, in their current form, do not yet justify their use over translate-test pipelines for Hindi practitioners.
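The three-stage translate-test pipeline described above can be sketched as a composition of three functions. This is a minimal illustration, not the authors' implementation: the names `translate_test`, `stub_hi_to_en`, `stub_answer_en`, and `stub_en_to_hi` are hypothetical, and the stubs are dictionary lookups standing in for IndicTrans2 and the English LLM, whose actual APIs are not shown here.

```python
from typing import Callable

# Each stage is an injected callable, so real models can be swapped in later.
Translator = Callable[[str], str]
Answerer = Callable[[str], str]


def translate_test(
    question_hi: str,
    hi_to_en: Translator,
    answer_en: Answerer,
    en_to_hi: Translator,
) -> str:
    """Answer a Hindi question by routing it through an English-only model."""
    question_en = hi_to_en(question_hi)   # Stage 1: Hindi -> English
    answer = answer_en(question_en)       # Stage 2: English LLM answers
    return en_to_hi(answer)               # Stage 3: English -> Hindi


# Hypothetical stand-ins (assumptions): fixed lookups instead of real models.
_HI_EN = {"भारत की राजधानी क्या है?": "What is the capital of India?"}
_QA_EN = {"What is the capital of India?": "New Delhi"}
_EN_HI = {"New Delhi": "नई दिल्ली"}


def stub_hi_to_en(text: str) -> str:
    return _HI_EN[text]


def stub_answer_en(text: str) -> str:
    return _QA_EN[text]


def stub_en_to_hi(text: str) -> str:
    return _EN_HI[text]


if __name__ == "__main__":
    print(translate_test(
        "भारत की राजधानी क्या है?",
        stub_hi_to_en, stub_answer_en, stub_en_to_hi,
    ))
```

Passing the stages in as callables keeps the sketch independent of any particular model library. In a real setup, each stub would presumably be replaced by a wrapper around the corresponding translation or generation model.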