

Poster in Workshop: 2nd Workshop on Formal Verification of Machine Learning

Interpreting Robustness Proofs of Deep Neural Networks

Debangshu Banerjee · Avaljot Singh · Gagandeep Singh


Abstract:

Numerous methods have emerged to verify the robustness of deep neural networks (DNNs). While effective in providing theoretical guarantees, the proofs generated by these techniques often lack human interpretability. Our paper bridges this gap by introducing new concepts, algorithms, and representations that generate human-understandable interpretations of the proofs. Using our approach, we discover that proofs for standard DNNs rely more on irrelevant input features than proofs for provably robust DNNs do. Provably robust DNNs filter out spurious input features, though sometimes at the cost of semantically meaningful ones. DNNs that combine adversarial and provably robust training strike a balance between the two. Overall, our work enhances human comprehension of proofs and sheds light on their reliance on different types of input features.
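To make the idea of a proof "relying" on input features concrete, the following is a minimal sketch, not the paper's method: it certifies robustness of a single affine classifier with interval bound propagation (IBP), then scores each input coordinate by how much the certified margin improves when that coordinate's perturbation is removed. The names `ibp_bounds`, `verified_margin`, and `feature_reliance` are illustrative assumptions, not functions from the paper.

```python
import numpy as np

def ibp_bounds(W, b, lo, hi):
    """Interval bound propagation through an affine layer y = W x + b."""
    mid = (lo + hi) / 2.0
    rad = (hi - lo) / 2.0
    mid_out = W @ mid + b
    rad_out = np.abs(W) @ rad  # radius grows with |W|
    return mid_out - rad_out, mid_out + rad_out

def verified_margin(W, b, x, eps, true_class):
    """Worst-case margin of the true logit over all others,
    for inputs within an elementwise eps-box around x.
    A positive value certifies robustness."""
    lo, hi = ibp_bounds(W, b, x - eps, x + eps)
    others_upper = np.delete(hi, true_class)
    return lo[true_class] - others_upper.max()

def feature_reliance(W, b, x, eps, true_class):
    """Hypothetical proxy for how much the proof leans on each feature:
    margin gained by fixing one coordinate (zero perturbation there)."""
    base = verified_margin(W, b, x, eps, true_class)
    scores = np.empty_like(x)
    for i in range(len(x)):
        e = np.full_like(x, eps)
        e[i] = 0.0  # feature i is no longer perturbed
        scores[i] = verified_margin(W, b, x, e, true_class) - base
    return scores
```

A coordinate whose score is large contributes heavily to the certificate; under this proxy, the paper's observation would correspond to standard networks assigning large scores to semantically irrelevant coordinates.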
