Workshop: Subset Selection in Machine Learning: From Theory to Applications

Interactive Teaching for Imbalanced Data Summarization

Farhad Pourkamali-Anaraki · Walter Bennette


A fundamental problem in machine learning is developing data summarization techniques, also known as coreset construction, that have minimal impact on model accuracy. However, most existing methods lack rigorous strategies for identifying the optimal coreset size. Moreover, these methods are often built around specific machine learning models, which makes it difficult for practitioners to apply them across different applications. To address these problems, this paper presents an interactive teaching method that adaptively constructs a small subset of representative samples. Numerical experiments on three imbalanced data sets indicate the potential and broad applicability of the proposed approach.
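The abstract does not spell out the teaching procedure. As a rough illustration only, the sketch below shows a generic teacher-student loop in which the coreset size is set by a stopping rule rather than fixed in advance. It assumes a 1-nearest-neighbor student, seeds the subset with one example per class so that minority classes are represented, and has the teacher add one misclassified example per class each round. The names `interactive_teaching` and `nearest_label`, the 1-NN student, and the selection rule are all hypothetical choices, not the authors' algorithm.

```python
import math
import random


def nearest_label(x, subset, X, y):
    # Hypothetical 1-NN "student": predict the label of the closest point in the subset.
    best = min(subset, key=lambda i: math.dist(x, X[i]))
    return y[best]


def interactive_teaching(X, y, max_rounds=100, seed=0):
    """Hypothetical sketch of an interactive teaching loop (not the paper's method).

    Returns the indices of the selected subset. Its size is not fixed in advance;
    it grows until the student makes no mistakes on the full data or the round
    budget runs out.
    """
    rng = random.Random(seed)
    # Seed the subset with one random example per class so that minority
    # classes in imbalanced data are represented from the start.
    subset = []
    for label in sorted(set(y)):
        candidates = [i for i, lab in enumerate(y) if lab == label]
        subset.append(rng.choice(candidates))

    for _ in range(max_rounds):
        # The teacher evaluates the student on the full labeled data set.
        mistakes = [i for i in range(len(X))
                    if nearest_label(X[i], subset, X, y) != y[i]]
        if not mistakes:
            break  # Stopping rule: this is where the coreset size gets decided.
        # Class-balanced feedback: add the first misclassified example of each class.
        for label in sorted({y[i] for i in mistakes}):
            subset.append(next(i for i in mistakes if y[i] == label))
    return subset
```

For example, on two well-separated clusters the loop stops with one example per class, whereas on XOR-style data, where every class-0 point is adjacent to a class-1 point, it has to keep all four points. The 1-NN student was chosen only because it is simple to state. The abstract stresses that the proposed method is not tied to a specific model, so any classifier could, in principle, replace it in a loop of this shape.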