Words Towards Explainability: Caption Label-Free Learning via Dual Loop Agentic Time Series Captioning
Abstract
Explainability is essential for applying time series analysis in high-stakes domains. While Time Series Captioning (TSC) offers a pathway to enhance temporal explainability, reliable caption generation usually requires high-quality textual annotations. However, because interpreting abstract temporal dynamics demands specialized domain knowledge, acquiring such caption annotations is challenging, which impedes the advancement of TSC. To address this challenge, we introduce a novel Caption Label-Free Learning (CLFL) paradigm. Departing from the supervised-learning tradition of imitating human annotations, CLFL formulates captioning as an agentic exploration task optimized by feedback from a proxy reward. Specifically, we propose a Dual Loop Agentic Captioning (DLAC) framework to realize this exploration-feedback mechanism. In the inner loop, a Time Series Captioning LLM Agent (TSCAgent) reflectively explores candidate semantic captions; in turn, the outer loop evaluates these captions via downstream reasoning to derive a proxy reward, which feeds back to optimize the TSCAgent. Empirical results validate the effectiveness of CLFL, demonstrating that the exploration-feedback mechanism suffices to learn complex temporal semantics and to generate captions autonomously, without any caption-label supervision. Furthermore, we release TFTSC, an industrial expert-level time series caption dataset, available at: \url{https://anonymous.4open.science/r/TFTSC-05ED/}.
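The dual-loop mechanism described above can be illustrated with a minimal, self-contained sketch. All function names and the rule-based "reader" below are hypothetical stand-ins: `generate_caption` plays the role of the TSCAgent's LLM call, and `proxy_reward` plays the role of the downstream reasoning task that scores a caption without any caption labels.

```python
def generate_caption(series, feedback):
    # Hypothetical stand-in for the TSCAgent's LLM call: describe the
    # overall trend, adjusting wording based on outer-loop feedback.
    trend = "rising" if series[-1] > series[0] else "falling"
    detail = "steadily " if feedback.get("needs_detail") else ""
    return f"The series is {detail}{trend}."

def proxy_reward(series, caption):
    # Hypothetical downstream reasoning task: can a rule-based "reader"
    # recover the true trend direction from the caption alone?
    true_trend = "rising" if series[-1] > series[0] else "falling"
    base = 1.0 if true_trend in caption else 0.0
    bonus = 0.2 if "steadily" in caption else 0.0  # richer captions score higher
    return base + bonus

def dual_loop_captioning(series, outer_steps=3, inner_steps=2):
    feedback, best_caption, best_reward = {}, "", float("-inf")
    for _ in range(outer_steps):              # outer loop: evaluate and feed back
        caption = generate_caption(series, feedback)
        for _ in range(inner_steps):          # inner loop: reflective exploration
            candidate = generate_caption(series, feedback)
            if proxy_reward(series, candidate) > proxy_reward(series, caption):
                caption = candidate
        reward = proxy_reward(series, caption)
        if reward > best_reward:
            best_caption, best_reward = caption, reward
        feedback["needs_detail"] = reward < 1.2  # proxy reward optimizes the agent
    return best_caption, best_reward
```

In this toy version, the inner loop explores candidate captions while the outer loop converts downstream-task performance into a scalar reward and feedback signal; in DLAC these roles are filled by the TSCAgent and the downstream reasoning evaluation, respectively.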