Workshop
Actionable Interpretability
Tal Haklay · Hadas Orgad · Anja Reusch · Marius Mosbach · Sarah Wiegreffe · Ian Tenney · Mor Geva
Interpretability research has advanced considerably in uncovering the inner mechanisms of artificial intelligence (AI) systems and has become a crucial subfield within AI. However, translating interpretability findings into actionable improvements in model design, training, and deployment remains a challenge, and such insights have rarely influenced real-world AI development. This workshop addresses a key yet underexplored question: How can interpretability research drive tangible advancements in AI systems? By fostering discussion of the practical applications of interpretability, we aim to bridge this gap and highlight work that moves beyond analysis to achieve concrete improvements in model alignment, robustness, and domain-specific performance. Our goal is to refocus interpretability research on actionable impact, so that its insights lead to meaningful advancements in AI.