Multiclass Multiple Kernel Learning

Alexander Zien - MPI for Biological Cybernetics and Friedrich Miescher Laboratory, Tübingen, Germany
Cheng Soon Ong - MPI for Biological Cybernetics and Friedrich Miescher Laboratory, Tübingen, Germany

In many applications it is desirable to learn from several kernels. "Multiple kernel learning" (MKL) allows the practitioner to optimize over linear combinations of kernels. By enforcing sparse coefficients, it also generalizes feature selection to kernel selection. We propose MKL for joint feature maps. This provides a convenient and principled way to apply MKL to multiclass problems. In addition, we can exploit the joint feature map to learn kernels on output spaces. We show the equivalence of several different primal formulations, including different regularizers. We present several optimization methods, and compare a convex quadratically constrained quadratic program (QCQP) and two semi-infinite linear programs (SILPs) on toy data, showing that the SILPs are faster than the QCQP. We then demonstrate the utility of our method by applying the SILP to three real-world datasets.
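
As a rough illustration of what "optimizing over linear combinations of kernels" refers to, the following sketch builds a combined kernel matrix K = sum_p beta_p K_p from a few base kernels. It is not the method proposed in the paper: the weights are fixed by hand rather than learned, and the helper names (`linear_kernel`, `rbf_kernel`, `combine_kernels`), the toy data, and the choice of nonnegative weights summing to one are assumptions made for this example. A weight of zero drops the corresponding kernel, which is the sense in which sparse coefficients amount to kernel selection.

```python
import numpy as np

def linear_kernel(X):
    """Gram matrix of the linear kernel k(x, x') = <x, x'>."""
    return X @ X.T

def rbf_kernel(X, gamma):
    """Gram matrix of the Gaussian kernel k(x, x') = exp(-gamma * ||x - x'||^2)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def combine_kernels(kernels, beta):
    """Weighted sum of Gram matrices with weights on the probability simplex.

    Assumption for this sketch: beta_p >= 0 and sum_p beta_p = 1.
    """
    beta = np.asarray(beta, dtype=float)
    if np.any(beta < 0) or not np.isclose(beta.sum(), 1.0):
        raise ValueError("beta must be nonnegative and sum to 1")
    return sum(b * K for b, K in zip(beta, kernels))

# Toy data (hypothetical): 5 points in 3 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

kernels = [linear_kernel(X), rbf_kernel(X, 0.5), rbf_kernel(X, 2.0)]
beta = [0.7, 0.3, 0.0]  # hand-picked sparse weights; the third kernel is deselected
K = combine_kernels(kernels, beta)
```

Because each base Gram matrix is positive semidefinite and the weights are nonnegative, the combined matrix `K` is again a valid kernel matrix and could be passed to any kernel method that accepts a precomputed Gram matrix.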