Workshop: Subset Selection in Machine Learning: From Theory to Applications
Introduction to Coresets and Open Problems
When we want to discover patterns in data, we usually use the best available algorithm/software or try to improve it. In recent years we have started exploring a different approach: instead of improving the algorithm, reduce the input data and run the existing algorithm on the reduced data to obtain the desired output much faster. A core-set for a given problem is a semantic compression of its input, in the sense that a solution for the problem with the (small) core-set as input yields an approximate solution to the problem with the original (Big) data. Using existing techniques it can be computed on a streaming input that may be distributed among several machines on device, and in parallel (e.g. clouds or GPUs).