Time-Conditioned Foreseeing: An EHR-Specific Foundation Model for Irregular Dynamics and Calendrical Time
Abstract
Electronic Health Records (EHRs) have characteristics that distinguish them from natural language, yet existing EHR foundation models often rely on approaches borrowed from NLP that are suboptimal for this setting. We propose a pretraining method tailored to these characteristics. First, we introduce Pathology-Focused Binning, a density-based quantization strategy that prioritizes clinically significant numerical ranges over typical values. Second, to capture both the exact timing of clinical events and the relative intervals between them, we propose Dual-Calendar Rotary Positional Embedding (RoPE), which encodes absolute and relative temporal signals. Third, we introduce the Time-Conditioned Foreseeing (TCF) objective, which mirrors clinical treatment planning by forecasting events across multiple temporal horizons while explicitly modeling event timing. Together, these components yield a temporal generative EHR model that outperforms existing foundation models on nine diverse downstream tasks, with up to a 48% improvement in AUPRC, and that enables the generation of realistic, temporally consistent patient trajectories.
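To make the relative-interval property of rotary embeddings concrete, the following is a minimal sketch and not the Dual-Calendar RoPE formulation itself, which the abstract does not specify. It assumes that a continuous event timestamp (for example, hours since admission) replaces the usual integer token index, and it uses the standard RoPE frequency schedule. The names `rope_rotate` and `attention_score`, the `base` value, and the choice of time unit are all hypothetical. The sketch shows only that the query-key score then depends on the time gap between two events rather than on their absolute timestamps. How an absolute calendrical signal is combined with this is left open here.

```python
import numpy as np

def rope_rotate(x, t, base=10000.0):
    """Rotate consecutive feature pairs of a 1-D vector x by angles t * freq.

    Assumption: t is a continuous timestamp (e.g. hours), used in place of
    the integer token position of standard RoPE.
    """
    x = np.asarray(x, dtype=float)
    d = x.shape[-1]
    if d % 2 != 0:
        raise ValueError("feature dimension must be even")
    # Standard RoPE frequency schedule: one frequency per feature pair.
    freqs = base ** (-np.arange(0, d, 2) / d)
    angles = t * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x_even * cos - x_odd * sin
    out[1::2] = x_even * sin + x_odd * cos
    return out

def attention_score(q, k, t_q, t_k):
    """Unnormalized attention score between a query at t_q and a key at t_k."""
    return float(rope_rotate(q, t_q) @ rope_rotate(k, t_k))

rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)

# Same 3-hour gap at two different absolute times.
s_early = attention_score(q, k, t_q=5.0, t_k=2.0)
s_late = attention_score(q, k, t_q=105.0, t_k=102.0)
print(np.isclose(s_early, s_late))  # the two scores agree up to float error
```

Because each feature pair undergoes a pure rotation, vector norms are preserved and the dot product depends only on the difference of the two rotation angles. This is why a rotary scheme is a natural fit for irregularly spaced events.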