Invited Talk
in
Workshop: Subset Selection in Machine Learning: From Theory to Applications

Benchmarks and Toolkits for Data Subset Selection in ML through DECILE: Part II

Ganesh Ramakrishnan


Abstract:

Across Parts I and II of this talk, we will cover the functionality of DECILE (www.decile.org), whose modules include a) SUBMODLIB, b) CORDS, c) TRUST, d) DISTIL, and e) SPEAR. SUBMODLIB is a library for submodular optimization that implements a number of submodular functions (including the submodular mutual information and conditional gain functions) and optimization algorithms. CORDS is a library for data subset selection and coresets for compute-efficient training of deep models. TRUST is a toolkit for targeted subset selection, aimed at personalization and model remediation. DISTIL is an active learning toolkit for deep models, and SPEAR is a library for weak supervision via labeling functions. We will also discuss the state-of-the-art algorithms implemented in these toolkits and the benchmarks they enable.

In Part II, we will cover TRUST, DISTIL, and SPEAR.

CORDS: https://github.com/decile-team/cords/
DISTIL: https://github.com/decile-team/distil/
TRUST: https://github.com/decile-team/trust/
SubmodLib: https://github.com/decile-team/submodlib/
SPEAR: https://github.com/decile-team/spear/