Poster
Understanding Generalization and Optimization Performance of Deep CNNs
Pan Zhou · Jiashi Feng
This work aims to provide an understanding of the remarkable success of deep convolutional neural networks (CNNs) by theoretically analyzing their generalization performance and establishing optimization guarantees for gradient descent based training algorithms. Specifically, for a CNN model consisting of $l$ convolutional layers and one fully connected layer, we prove that its generalization error is bounded by $\mathcal{O}(\sqrt{\theta\widetilde{\varrho}/n})$, where $n$ is the sample size, $\theta$ denotes the degrees of freedom of the network parameters, and $\widetilde{\varrho}=\mathcal{O}(\log(\prod_{i=1}^{l} b_i (k_i-s_i+1)/p)+\log(b_{l+1}))$ encapsulates architecture parameters, including the kernel size $k_i$, stride $s_i$, pooling size $p$, and parameter magnitude $b_i$ of each layer. To the best of our knowledge, this is the first generalization bound that depends only on $\mathcal{O}(\log(\prod_{i=1}^{l+1} b_i))$, which is tighter than existing ones that all involve an exponential term like $\mathcal{O}(\prod_{i=1}^{l+1} b_i)$. Besides, we prove that for an arbitrary gradient descent algorithm, the approximate stationary point computed by minimizing the empirical risk is also an approximate stationary point of the population risk. This explains why gradient descent based training algorithms usually perform sufficiently well in practice. Furthermore, we prove a one-to-one correspondence and convergence guarantees between the non-degenerate stationary points of the empirical and population risks. This implies that the computed local minimum of the empirical risk is also close to a local minimum of the population risk, ensuring that the optimized CNN model generalizes well to new data.
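As a rough illustration (not from the paper itself), the sketch below plugs hypothetical architecture values into the stated bound and contrasts the log-product factor $\widetilde{\varrho}$ with an exponential-product factor of the kind the paper improves upon. All numbers (kernel sizes, strides, magnitudes, $\theta$, $n$) are made-up assumptions chosen only to show how the two factors scale with depth.

```python
import math

# Hypothetical 3-layer CNN (illustrative values only):
# kernel size k_i, stride s_i, parameter-magnitude bound b_i per conv layer,
# shared pooling size p, and a magnitude bound b_{l+1} for the FC layer.
kernel_sizes = [5, 3, 3]        # k_i
strides      = [1, 1, 1]        # s_i
magnitudes   = [2.0, 2.0, 2.0]  # b_i for the l conv layers
pool_size    = 2                # p
fc_magnitude = 2.0              # b_{l+1}

# Architecture factor from the stated bound:
# rho = log( prod_i b_i * (k_i - s_i + 1) / p ) + log(b_{l+1})
prod = 1.0
for k, s, b in zip(kernel_sizes, strides, magnitudes):
    prod *= b * (k - s + 1) / pool_size
rho = math.log(prod) + math.log(fc_magnitude)

# Contrast: a bound with the exponential term prod_{i=1}^{l+1} b_i grows
# multiplicatively in depth, while the log-product above grows additively.
exp_term = math.prod(magnitudes) * fc_magnitude
print(f"log-product factor rho ~ {rho:.3f}")      # additive in depth
print(f"exponential factor     ~ {exp_term:.3f}") # multiplicative in depth

# Resulting error scale O(sqrt(theta * rho / n)) for hypothetical
# degrees of freedom theta and sample size n:
theta, n = 1e5, 1e6
print(f"bound scale ~ {math.sqrt(theta * rho / n):.4f}")
```

Doubling the depth roughly doubles the log-product factor but squares the exponential one, which is the intuition behind calling the new bound tighter.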
Author Information
Pan Zhou (National University of Singapore)
Jiashi Feng (National University of Singapore)
Related Events (a corresponding poster, oral, or spotlight)
- 2018 Oral: Understanding Generalization and Optimization Performance of Deep CNNs
  Thu. Jul 12th, 2:50--3:00 PM, Room K1
More from the Same Authors
- 2021 Poster: CIFS: Improving Adversarial Robustness of CNNs via Channel-wise Importance-based Feature Selection
  Hanshu YAN · Jingfeng Zhang · Gang Niu · Jiashi Feng · Vincent Tan · Masashi Sugiyama
- 2021 Spotlight: CIFS: Improving Adversarial Robustness of CNNs via Channel-wise Importance-based Feature Selection
  Hanshu YAN · Jingfeng Zhang · Gang Niu · Jiashi Feng · Vincent Tan · Masashi Sugiyama
- 2021 Poster: Towards Better Laplacian Representation in Reinforcement Learning with Generalized Graph Drawing
  Kaixin Wang · Kuangqi Zhou · Qixin Zhang · Jie Shao · Bryan Hooi · Jiashi Feng
- 2021 Spotlight: Towards Better Laplacian Representation in Reinforcement Learning with Generalized Graph Drawing
  Kaixin Wang · Kuangqi Zhou · Qixin Zhang · Jie Shao · Bryan Hooi · Jiashi Feng
- 2020 Poster: Do We Really Need to Access the Source Data? Source Hypothesis Transfer for Unsupervised Domain Adaptation
  Jian Liang · Dapeng Hu · Jiashi Feng
- 2018 Poster: Policy Optimization with Demonstrations
  Bingyi Kang · Zequn Jie · Jiashi Feng
- 2018 Poster: WSNet: Compact and Efficient Networks Through Weight Sampling
  Xiaojie Jin · Yingzhen Yang · Ning Xu · Jianchao Yang · Nebojsa Jojic · Jiashi Feng · Shuicheng Yan
- 2018 Oral: WSNet: Compact and Efficient Networks Through Weight Sampling
  Xiaojie Jin · Yingzhen Yang · Ning Xu · Jianchao Yang · Nebojsa Jojic · Jiashi Feng · Shuicheng Yan
- 2018 Oral: Policy Optimization with Demonstrations
  Bingyi Kang · Zequn Jie · Jiashi Feng