Workshop
Verifiable Feature Attributions: A Bridge between Post Hoc Explainability and Inherent Interpretability
Usha Bhalla · Suraj Srinivas · Himabindu Lakkaraju

Workshop
Is Task-Agnostic Explainable AI a Myth?
Alicja Chaszczewicz

Workshop
FACADE: A Framework for Adversarial Circuit Anomaly Detection and Evaluation

Workshop
Why do universal adversarial attacks work on large language models?: Geometry might be the answer

Workshop
Deceptive Alignment Monitoring

Workshop
Don't trust your eyes: on the (un)reliability of feature visualizations

Workshop
A Pipeline for Interpretable Clinical Subtyping with Deep Metric Learning
Haoran Zhang · Qixuan Jin · Thomas Hartvigsen · Miriam Udler · Marzyeh Ghassemi

Workshop
Implicit Interpretation of Importance Weight Aware Updates
Keyi Chen · Francesco Orabona

Workshop
Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task
Maya Okawa · Ekdeep Singh Lubana · Robert Dick · Hidenori Tanaka

Workshop
Adversarial Attacks and Defenses in Explainable Artificial Intelligence: A Survey
Hubert Baniecki · Przemyslaw Biecek

Workshop
SAP-sLDA: An Interpretable Interface for Exploring Unstructured Text
Charumathi Badrinath · Weiwei Pan · Finale Doshi-Velez