
POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging
Shishir G. Patil · Paras Jain · Prabal Dutta · Ion Stoica · Joseph E Gonzalez

Thu Jul 21 12:45 PM -- 12:50 PM (PDT) @ Room 327 - 329

Enabling the training of larger models such as BERT on the edge is important for satisfying privacy constraints as well as for offline operation. However, because training is both memory- and energy-intensive, edge training has historically been limited to relatively small models with simple architectures. In this paper, we present POET, a system that enables training large neural networks on memory-scarce, battery-operated edge devices. Given a memory budget and a run-time constraint, we formulate energy-optimal training as a mixed-integer linear program (MILP) that selectively rematerializes activations or pages them to secondary storage. Our approach enables training significantly larger models on embedded devices while reducing energy consumption, without modifying the mathematical correctness of backpropagation. We demonstrate that it is possible to fine-tune both ResNet-18 and BERT within the memory constraints of a Cortex-M-class, ultra-low-power embedded device while outperforming current edge training methods in energy efficiency.
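The trade-off the MILP navigates can be illustrated with a toy model (all costs, budgets, and variable names below are hypothetical, chosen only for illustration, and are not from the paper): for each activation, the scheduler either keeps it in RAM, rematerializes it during the backward pass (extra compute energy and time), or pages it to secondary storage (extra I/O energy and time), minimizing total energy subject to a memory budget and a runtime constraint. The paper solves this with a MILP solver; the sketch below simply brute-forces a tiny instance to show the structure of the decision.

```python
from itertools import product

# Hypothetical per-activation costs: memory if kept (KB), plus the
# energy (mJ) and time (ms) overhead of rematerializing or paging it.
activations = [
    # (mem_kept, remat_energy, remat_time, page_energy, page_time)
    (64, 2.0, 1.5, 1.0, 4.0),
    (128, 5.0, 3.0, 1.2, 6.0),
    (96, 1.0, 0.8, 1.1, 5.0),
]
MEM_BUDGET = 160   # KB of RAM available for activations
TIME_BUDGET = 8.0  # ms of slack allowed beyond the baseline runtime

def cost(plan):
    """Return (energy, time, mem) for a tuple of per-activation choices:
    0 = keep in RAM, 1 = rematerialize, 2 = page to secondary storage."""
    energy = time = mem = 0.0
    for choice, (m, re_e, re_t, pg_e, pg_t) in zip(plan, activations):
        if choice == 0:      # keep: consumes memory, no extra energy/time
            mem += m
        elif choice == 1:    # rematerialize: recompute in the backward pass
            energy += re_e; time += re_t
        else:                # page: write out and read back from storage
            energy += pg_e; time += pg_t
    return energy, time, mem

best = None
for plan in product((0, 1, 2), repeat=len(activations)):
    e, t, m = cost(plan)
    if m <= MEM_BUDGET and t <= TIME_BUDGET and (best is None or e < best[0]):
        best = (e, plan)

print(best)  # → (1.2, (0, 2, 0)): keep the 1st and 3rd, page the 2nd
```

Keeping all three activations would need 288 KB, over the 160 KB budget, so the search keeps the two cheapest-to-hold activations and pages the large middle one, which here costs less energy than rematerializing it. A real MILP formulation replaces this enumeration with binary decision variables and linear constraints, which is what makes the approach scale to full networks.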

Author Information

Shishir G. Patil (UC Berkeley)
Paras Jain (UC Berkeley)
Prabal Dutta (UC Berkeley)
Ion Stoica (UC Berkeley)
Joseph E Gonzalez (UC Berkeley)
