Dynamic neural networks are becoming increasingly common, and yet it is hard to implement them efficiently. On-the-fly operation batching for such models is sub-optimal and suffers from run-time overheads, while writing manually batched versions can be hard and error-prone. To address this, we extend TensorFlow with pfor, a parallel-for loop optimized using static loop vectorization. With pfor, users can express computation using nested loops and conditional constructs, yet get performance resembling that of a manually batched version. Benchmarks demonstrate speedups of one to two orders of magnitude on a range of tasks, from Jacobian computation to TreeLSTMs.
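The idea behind loop vectorization can be illustrated with a toy sketch in plain Python. This is not TensorFlow's pfor implementation: the names `Batched`, `sequential_for`, and `toy_pfor` are hypothetical, and only addition and multiplication are supported. The sketch assumes a loop body whose iterations are independent, so the body can be evaluated once on a value that carries every iteration index, rather than once per iteration.

```python
class Batched:
    """Toy stand-in for a tensor whose leading axis is the loop dimension."""

    def __init__(self, values):
        self.values = list(values)

    def _lift(self, other):
        # Loop-invariant operands (plain numbers) are broadcast across iterations.
        if isinstance(other, Batched):
            return other.values
        return [other] * len(self.values)

    def __add__(self, other):
        return Batched(a + b for a, b in zip(self.values, self._lift(other)))

    def __mul__(self, other):
        return Batched(a * b for a, b in zip(self.values, self._lift(other)))

    # Addition and multiplication are commutative, so the reflected
    # versions can reuse the same implementations.
    __radd__ = __add__
    __rmul__ = __mul__


def sequential_for(body, n):
    """Reference semantics: run the body once per iteration."""
    return [body(i) for i in range(n)]


def toy_pfor(body, n):
    """Run the body a single time on all iteration indices together."""
    result = body(Batched(range(n)))
    return result.values


calls = []


def body(i):
    calls.append(i)  # record each invocation of the loop body
    return 3 * i * i + 1


looped = sequential_for(body, 4)
n_loop_calls = len(calls)

calls.clear()
batched = toy_pfor(body, 4)
n_pfor_calls = len(calls)
```

Both versions produce the same per-iteration results, but the sequential loop invokes the body four times while the toy vectorized version invokes it once. A real system would rewrite each operation into its batched kernel ahead of time and would also have to handle control flow and stateful operations, which this sketch ignores.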
Ashish Agarwal (Google Brain)
Related Events (a corresponding poster, oral, or spotlight)
2019 Poster: Static Automatic Batching In TensorFlow »
Wed. Jun 12th 01:30 -- 04:00 AM Room Pacific Ballroom #87