Workshop: Subset Selection in Machine Learning: From Theory to Applications
More Information, Less Data
In this talk, we introduce several features of Summary Analytics, a new company that makes summarization-like abilities widely accessible and scalable to petabyte-size datasets. We demonstrate wide applicability across data modalities through a number of case studies. We demonstrate scalability by showing that 100 million records, each with 1,700 features, can be summarized down to 1,000 records in 60 seconds using one CPU thread, or in 14 seconds using four CPU threads, on a 2019-era laptop.