Abstract

SVM-Based Generalized Multiple-Instance Learning via Approximate Box Counting
Qingping Tao - University of Nebraska Stephen Scott - University of Nebraska Vinodchandran Variyam - University of Nebraska Thomas Osugi - University of Nebraska
The multiple-instance learning (MIL) model has been very successful inapplication areas such as drug discovery and content-based image-retrieval.Recently, a generalization of this model and an algorithm for thisgeneralization were introduced, showing significant advantages over theconventional MIL model on certain application areas. Unfortunately, thisalgorithm is inherently inefficient, preventing scaling to high dimensions. Wereformulate this algorithm using a kernel for a support vector machine,reducing its time complexity from exponential to polynomial. Computing the kernel is equivalent to counting the number of axis-parallel boxes in a discrete, bounded space that contain at least one point from each of two multisets P and Q. We show that this problem is #P-complete, but then give a fully polynomial randomized approximation scheme (FPRAS) forit. Finally, we empirically evaluate our kernel.

SVM-Based Generalized Multiple-Instance Learning via Approximate Box Counting

Qingping Tao - University of Nebraska
Stephen Scott - University of Nebraska
Vinodchandran Variyam - University of Nebraska
Thomas Osugi - University of Nebraska

The multiple-instance learning (MIL) model has been very successful inapplication areas such as drug discovery and content-based image-retrieval.Recently, a generalization of this model and an algorithm for thisgeneralization were introduced, showing significant advantages over theconventional MIL model on certain application areas. Unfortunately, thisalgorithm is inherently inefficient, preventing scaling to high dimensions. Wereformulate this algorithm using a kernel for a support vector machine,reducing its time complexity from exponential to polynomial. Computing the kernel is equivalent to counting the number of axis-parallel boxes in a discrete, bounded space that contain at least one point from each of two multisets P and Q. We show that this problem is #P-complete, but then give a fully polynomial randomized approximation scheme (FPRAS) forit. Finally, we empirically evaluate our kernel.