Evaluating Indic Literary Creativity in LLMs through the Lens of the Shastras
Abstract
Evaluating creativity in Large Language Models (LLMs) requires culturally grounded frameworks, especially for non-English traditions. We present a methodology to assess LLM creativity in Hindi and Bengali by adapting the Torrance Tests of Creative Thinking (TTCT) to include \textit{Alankara} (classical figures of speech). Measuring Originality, Fluency, Flexibility, and Elaboration via six canonical rhetorical devices, we evaluate eight LLMs against human literature using an ICC-validated LLM-as-a-judge pipeline. In Hindi, human texts demonstrate superior originality. In Bengali, several models numerically outperform human baselines due to an \textit{Alankara} over-stuffing bias: models generate unnaturally high densities of rhetorical devices, exploiting evaluation heuristics to inflate scores. Our findings highlight vulnerabilities in density-based metrics and emphasize the need for robust, culturally specific evaluation in multilingual NLP.