Poster in Workshop: 2nd Workshop on Formal Verification of Machine Learning
Interpreting Robustness Proofs of Deep Neural Networks
Debangshu Banerjee · Avaljot Singh · Gagandeep Singh
Numerous methods have emerged to verify the robustness of deep neural networks (DNNs). While effective in providing theoretical guarantees, the proofs generated by these techniques often lack human interpretability. Our paper bridges this gap by introducing new concepts, algorithms, and representations that generate human-understandable interpretations of the proofs. Using our approach, we find that proofs for standard DNNs rely on more irrelevant input features than proofs for provably robust DNNs. Provably robust DNNs filter out spurious input features, though sometimes at the cost of semantically meaningful ones. DNNs that combine adversarial and provably robust training strike a balance between the two. Overall, our work enhances human comprehension of robustness proofs and sheds light on their reliance on different types of input features.
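The paper's own algorithms are not reproduced here, but the underlying question (which input features does a robustness proof actually rely on?) can be illustrated with a minimal, self-contained sketch. The code below is an illustrative toy, not the authors' method: it certifies a tiny, randomly weighted ReLU network using standard interval bound propagation (IBP), then scores each input feature by how much the certified margin improves when that feature's perturbation interval is collapsed to a point. The network weights, the epsilon value, and the occlusion-style scoring heuristic are all assumptions made for the example.

```python
# Illustrative sketch only: IBP certification of a toy network plus a
# heuristic per-feature attribution of the proof. Not the paper's algorithm.
import numpy as np

def ibp_forward(lo, hi, weights, biases):
    """Propagate elementwise interval bounds [lo, hi] through affine+ReLU layers."""
    for i, (W, b) in enumerate(zip(weights, biases)):
        center = (hi + lo) / 2.0
        radius = (hi - lo) / 2.0
        c = W @ center + b            # interval midpoint after the affine layer
        r = np.abs(W) @ radius        # interval radius after the affine layer
        lo, hi = c - r, c + r
        if i < len(weights) - 1:      # ReLU on hidden layers only
            lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
    return lo, hi

def certified_margin(lo, hi, true_class):
    """Worst-case logit gap; a positive value certifies robustness."""
    rivals = np.delete(hi, true_class)
    return lo[true_class] - rivals.max()

rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 4)), rng.normal(size=(3, 8))]  # toy 4-8-3 net
biases = [np.zeros(8), np.zeros(3)]
x, eps, true_class = rng.normal(size=4), 0.05, 0

base = certified_margin(*ibp_forward(x - eps, x + eps, weights, biases), true_class)

# Heuristic attribution: the margin gained by fixing feature j to its center
# indicates how strongly the IBP proof depends on perturbations of feature j.
for j in range(4):
    lo, hi = x - eps, x + eps
    lo[j] = hi[j] = x[j]              # collapse feature j's interval
    m = certified_margin(*ibp_forward(lo, hi, weights, biases), true_class)
    print(f"feature {j}: margin gain {m - base:+.4f}")
```

Because tightening an input interval can only tighten IBP output bounds, each margin gain is nonnegative; features with large gains are the ones this particular proof leans on, which is the kind of signal the paper's interpretations make human-readable.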