Invited Talk
in
Workshop: Subset Selection in Machine Learning: From Theory to Applications

Benchmarks and Toolkits for Data Subset Selection in ML through DECILE: Part I

Rishabh Iyer


Abstract:

In this talk (Parts I and II), we will cover the functionalities of DECILE (www.decile.org), which includes modules such as a) SUBMODLIB, b) CORDS, c) TRUST, d) DISTIL, and e) SPEAR. SUBMODLIB is a library for submodular optimization that implements a number of submodular functions (including the submodular mutual information and conditional gain functions) and optimization algorithms. CORDS is a library for data subset selection and coresets for compute-efficient training of deep models. TRUST is a toolkit for targeted subset selection, aimed at personalization and model remediation. DISTIL is an active learning toolkit for deep models, and SPEAR is a library for weak supervision via labeling functions. We will also focus on the state-of-the-art algorithms implemented in these toolkits and the benchmarks they enable.

In Part I, we will cover SUBMODLIB (a toolkit for submodular optimization) and CORDS (a toolkit for data subset selection and coresets for efficient training of deep models).

CORDS: https://github.com/decile-team/cords/
DISTIL: https://github.com/decile-team/distil/
TRUST: https://github.com/decile-team/trust/
SubmodLib: https://github.com/decile-team/submodlib/
SPEAR: https://github.com/decile-team/spear/