The computation of convolution layers in deep neural networks typically relies on high performance routines that trade space for time, using additional memory (either for packing purposes or as required by the algorithm) to improve performance. The problems with such an approach are two-fold. First, these routines incur additional memory overhead, which reduces the overall size of the network that can fit on embedded devices with limited memory capacity. Second, these high performance routines were not optimized for performing convolution, which means that the performance obtained is usually less than conventionally expected. In this paper, we demonstrate that direct convolution, when implemented correctly, eliminates all memory overhead and yields performance that is between 10% and 400% better than existing high performance implementations of convolution layers on conventional and embedded CPU architectures. We also show that a high performance direct convolution exhibits better scaling performance, i.e., it suffers less performance drop when increasing the number of threads.
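The contrast drawn in the abstract is between routines that pack data into extra buffers (e.g., im2col followed by a GEMM call) and direct convolution, which computes the output with a plain loop nest and allocates nothing beyond the output tensor. As a rough illustration only, not the authors' optimized kernel, a direct 2-D convolution (stride 1, no padding) can be sketched as:

```python
import numpy as np

def direct_conv2d(inp, weights):
    """Direct convolution as a loop nest over filters and filter taps.

    No im2col packing buffer is created, so the only memory used
    beyond the inputs is the output tensor itself.

    inp:     (C_in, H, W) input feature map
    weights: (C_out, C_in, KH, KW) filter bank
    returns: (C_out, H - KH + 1, W - KW + 1) output
    """
    C_out, C_in, KH, KW = weights.shape
    _, H, W = inp.shape
    out = np.zeros((C_out, H - KH + 1, W - KW + 1))
    for co in range(C_out):          # each output channel
        for ci in range(C_in):       # accumulate over input channels
            for kh in range(KH):     # filter rows
                for kw in range(KW): # filter columns
                    # One filter tap scales a shifted window of the input.
                    out[co] += weights[co, ci, kh, kw] * \
                        inp[ci, kh:kh + H - KH + 1, kw:kw + W - KW + 1]
    return out
```

The paper's contribution is in how this loop nest is ordered, blocked, and vectorized for the target CPU; the sketch above shows only the zero-overhead structure, not the performance-tuned implementation.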
Jiyuan Zhang (Carnegie Mellon University)
Franz Franchetti (Carnegie Mellon University)
Tze Meng Low (Carnegie Mellon University)
Related Events (a corresponding poster, oral, or spotlight)
2018 Poster: High Performance Zero-Memory Overhead Direct Convolutions
Thu Jul 12th, 04:15 -- 07:00 PM, Room: Hall B