

IBM

Expo Talk Panel

vLLM-Hook: Live Programming of Model Internals on vLLM

Ching-Yun Ko ⋅ Pin-Yu Chen ⋅ Kenney Ng

AUDITORIUM
Mon 6 Jul 11:30 a.m. KST — 12:30 p.m. KST

Abstract:

vLLM-Hook is a modular plug-in library for vLLM that lets developers and researchers inspect, analyze, and intervene on internal model states during inference. The talk will present the core design of vLLM-Hook, including its configuration-driven hook interface, its support for both passive programming (observing internal states) and active programming (modifying them), and its compatibility with practical deployment workflows. We will show how the system exposes internal signals such as attention weights, individual attention heads, and activations, and how these signals can be used for real-time monitoring and controlled intervention without retraining the model. The session will highlight three concrete use cases from the project: prompt-injection detection through in-model monitoring, retrieval enhancement through selective retrieval and reranking signals, and activation steering for controlled generation. The goal of the talk is to give practitioners a clear view of how model-internal programming can become a practical capability in modern LLM serving stacks built on vLLM.
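
To make the passive/active distinction concrete, here is a minimal sketch in plain Python. It does not use vLLM or the vLLM-Hook API: `ToyModel`, `register_hook`, `make_recorder`, and `make_steering` are hypothetical names invented for illustration, and the "layers" are simple list transforms standing in for real model layers. A passive hook only records an activation; an active hook returns a replacement, here by adding a scaled steering direction.

```python
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]
# A hook receives (layer_name, activation). Returning None leaves the
# activation untouched (passive); returning a vector replaces it (active).
Hook = Callable[[str, Vector], Optional[Vector]]


class ToyModel:
    """Hypothetical stand-in for a model: a named sequence of layer functions."""

    def __init__(self, layers: List[Tuple[str, Callable[[Vector], Vector]]]):
        self.layers = layers
        self.hooks: Dict[str, List[Hook]] = {}

    def register_hook(self, layer_name: str, hook: Hook) -> None:
        self.hooks.setdefault(layer_name, []).append(hook)

    def forward(self, x: Vector) -> Vector:
        for name, fn in self.layers:
            x = fn(x)
            # Run hooks in registration order on this layer's output.
            for hook in self.hooks.get(name, []):
                replacement = hook(name, x)
                if replacement is not None:
                    x = replacement
        return x


def make_recorder(store: Dict[str, Vector]) -> Hook:
    """Passive hook: copy the activation into `store` without changing it."""
    def hook(name: str, act: Vector) -> Optional[Vector]:
        store[name] = list(act)
        return None
    return hook


def make_steering(direction: Vector, alpha: float) -> Hook:
    """Active hook: shift the activation by alpha * direction."""
    def hook(name: str, act: Vector) -> Optional[Vector]:
        return [a + alpha * d for a, d in zip(act, direction)]
    return hook


model = ToyModel([
    ("layer0", lambda v: [2.0 * a for a in v]),
    ("layer1", lambda v: [a + 1.0 for a in v]),
])
recorded: Dict[str, Vector] = {}
model.register_hook("layer0", make_recorder(recorded))               # monitor
model.register_hook("layer0", make_steering([1.0, 0.0], alpha=0.5))  # intervene
output = model.forward([1.0, 2.0])
```

Because the recorder is registered before the steering hook, it captures the unmodified `layer0` output, while the downstream layer receives the steered value. In a real serving stack the same ordering question arises: whether a monitor should see activations before or after an intervention is a design choice.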
