Timezone: »

Atlas Wang
Zhangyang “Atlas” Wang

Bio: Atlas Wang (https://vita-group.github.io/) teaches and researches at UT Austin ECE (primary), CS, and Oden CSEM. He usually declares his research interest as machine learning, but is never too sure what that means concretely. He has won some awards, but is mainly proud of just three things: (1) he has done some (hopefully) thought-invoking and practically meaningful work on sparsity, from inverse problems to deep learning; his recent favorites include “essential sparsity”, “junk DNA hypothesis”, and “heavy-hitter oracle”; (2) he co-founded the Conference on Parsimony and Learning (CPAL), known as the new " conference for sparsity" to its community, and serves as its inaugural program chair; (3) he is fortunate enough to work with a sizable group of world-class students, who are all smarter than himself. He has graduated 10 Ph.D. students that are well placed, including two new assistant professors; and his students have altogether won seven PhD fellowships besides many other honors.

Title: On the Complicate Romance between Sparsity and Robustness

Abstract: Prior arts have observed that appropriate sparsity (or pruning) can improve the empirical robustness of deep neural networks (NNs). In this talk, I will introduce our recent findings extending this line of research. We have firstly demonstrated that sparsity can be injected into adversarial training, either statically or dynamically, to reduce the robust generalization gap besides significantly saving training and inference FLOPs. We then show that pruning can also improve certified robustness for ReLU-based NNs at scale, under the complete verification setting. Lastly, we theoretically characterize the complicated relationship between neural network sparsity and generalization. It is revealed that, as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero and the network exhibits good generalization. Meanwhile, there also exists a large pruning fraction such that while gradient descent is still able to drive the training loss toward zero (by memorizing noise), the generalization performance is no better than random guessing.

Author Information

Zhangyang “Atlas” Wang (University of Texas at Austin)

More from the Same Authors