OpenTSLM: Time-Series Language Models for Reasoning over Multivariate Medical Text- and Time-Series Data
Abstract
Large Language Models (LLMs) have shown strong capabilities in interpreting multimodal data but remain limited in their ability to handle time-series data natively. Addressing this limitation could enable the translation of longitudinal and wearable sensing data into actionable insights and patient-facing digital health applications. We propose OpenTSLM, a family of Time-Series Language Models that integrate time series as a native modality into pretrained LLMs, enabling natural-language prompting and reasoning over multiple time series. We implement two OpenTSLM variants, one based on soft prompting (OpenTSLM-SP) and one on cross-attention (OpenTSLM-Flamingo). To conduct comprehensive experiments on reasoning over medical text and time series, we introduce three chain-of-thought (CoT) datasets: HAR-CoT (human activity recognition), Sleep-CoT (sleep staging), and ECG-QA-CoT (electrocardiogram question answering). Across tasks, OpenTSLM models consistently outperform baselines. OpenTSLM variants with time-series encoders trained from scratch achieve 69.88% on sleep staging and 65.44% on HAR, while OpenTSLM combined with time-series foundation models (TSFMs) achieves 68.33% and 67.64%, respectively, compared to 9.05% and 60.44% for fine-tuned text-only baselines. In addition, we conduct expert evaluations with cardiologists, which show that OpenTSLM models exhibit strong reasoning capabilities and temporal understanding of raw sensor data on ECG-QA. We further show that OpenTSLM-Flamingo scales better in memory as the number and length of input time series increase. To facilitate further research, we release all code, datasets, and models as open-source resources.
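The soft-prompting variant described above (OpenTSLM-SP) can be illustrated with a minimal sketch: an encoder maps each raw time series to a fixed number of continuous "soft tokens" in the LLM's embedding space, which are prepended to the text-token embeddings before decoding. All names, dimensions, and the toy encoder below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Minimal sketch of soft prompting (assumption: toy projection,
# not the actual OpenTSLM-SP encoder).
rng = np.random.default_rng(0)
D_MODEL = 16   # hypothetical LLM embedding width
N_SOFT = 4     # hypothetical number of soft tokens per time series

# Hypothetical learned projection from a pooled series statistic
# to N_SOFT embedding vectors.
W = rng.normal(scale=0.1, size=(1, N_SOFT * D_MODEL))

def encode_series(ts: np.ndarray) -> np.ndarray:
    """Map a 1-D time series to N_SOFT soft-token embeddings."""
    pooled = ts.mean(keepdims=True)          # shape (1,): toy summary
    return (pooled @ W).reshape(N_SOFT, D_MODEL)

def build_inputs(series_list, text_embeddings):
    """Prepend soft tokens for each series before the text embeddings,
    yielding the sequence the frozen LLM would attend over."""
    soft = [encode_series(ts) for ts in series_list]
    return np.concatenate(soft + [text_embeddings], axis=0)

series = [rng.normal(size=300), rng.normal(size=300)]  # two input series
text = rng.normal(size=(10, D_MODEL))                  # 10 text tokens
inputs = build_inputs(series, text)
print(inputs.shape)  # (2 * N_SOFT + 10, D_MODEL) = (18, 16)
```

Note the design trade-off this sketch makes visible: with soft prompting, every additional series adds tokens to the LLM's input sequence, so attention cost grows with the number and length of time series; a cross-attention design such as OpenTSLM-Flamingo keeps the text sequence length fixed, which is consistent with the abstract's observation that it scales better in memory.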