Reinforced Neural Processes: Memory-Efficient Time-Series Forecasting with a World-Feedback-Trained Memory Policy
Abstract
Neural Processes (NPs) provide a lightweight framework for uncertainty-aware regression in settings such as meta-regression, Bayesian optimization, and spatiotemporal prediction: each prediction is conditioned on a compact context set of observed input-output examples. In continuous learning settings, however, context selection becomes an online memory problem: as new observations arrive, which examples should be retained? Since retaining every observation is intractable, bounded-memory implementations rely on fixed heuristics such as sliding windows, reservoir sampling, or surprise thresholds, each of which encodes a static memory prior. We introduce Reinforced Neural Processes (RNP), a backbone-agnostic memory framework that pairs a tiered context buffer with a gated two-branch encoder and learns an insertion/eviction policy from world feedback, defined as the downstream predictive log-likelihood induced by each memory action relative to its counterfactual alternative. We instantiate RNP on attention-based (R-ANP) and convolutional (R-ConvCNP) backbones and evaluate it on four streaming benchmarks (delay-differential systems, regime-switching streams, abrupt-MNIST, and a wearable energy-expenditure dataset) across varying memory budgets. The best RNP variant attains the highest likelihood on 27 of 32 streams.
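To make the contrast between fixed heuristics and a feedback-driven signal concrete, the following is a minimal sketch, not the RNP implementation. It shows two of the static baselines named above (a sliding window and reservoir sampling) next to a toy counterfactual reward that compares the log-likelihood of a future target after inserting a new observation against the log-likelihood after skipping it. All function names are hypothetical, and `toy_log_lik` is an assumed stand-in predictor (a unit-variance Gaussian centred on the mean context output), not a Neural Process.

```python
import math
import random
from collections import deque


def sliding_window_insert(buffer: deque, item) -> None:
    """Sliding window: a deque built with maxlen=k drops its oldest item on overflow."""
    buffer.append(item)


def reservoir_insert(buffer: list, item, n_seen: int, capacity: int,
                     rng: random.Random) -> None:
    """Reservoir sampling (Algorithm R); n_seen counts items so far, including this one."""
    if len(buffer) < capacity:
        buffer.append(item)
        return
    j = rng.randrange(n_seen)      # uniform over 0 .. n_seen - 1
    if j < capacity:               # happens with probability capacity / n_seen
        buffer[j] = item


def toy_log_lik(context, y_target: float, std: float = 1.0) -> float:
    """Hypothetical stand-in for an NP: Gaussian centred on the mean context output."""
    mean = sum(y for _, y in context) / len(context)
    return -0.5 * math.log(2 * math.pi * std ** 2) - 0.5 * ((y_target - mean) / std) ** 2


def counterfactual_reward(context, new_obs, y_target: float) -> float:
    """Log-likelihood after inserting new_obs minus log-likelihood after skipping it."""
    return toy_log_lik(context + [new_obs], y_target) - toy_log_lik(context, y_target)


if __name__ == "__main__":
    window = deque(maxlen=3)
    for t in range(5):
        sliding_window_insert(window, t)
    print(list(window))            # the three most recent items

    rng = random.Random(0)
    reservoir = []
    for n, t in enumerate(range(100), start=1):
        reservoir_insert(reservoir, t, n, capacity=10, rng=rng)
    print(len(reservoir))          # never exceeds the capacity

    context = [(0.0, 0.0), (1.0, 0.0)]
    print(counterfactual_reward(context, (2.0, 3.0), y_target=2.0))
```

In this toy setting a positive reward means that inserting the observation improved the prediction of the later target relative to the skip alternative. The two heuristics never consult such a signal, which is the sense in which they encode a static memory prior.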