Sequential Information Bottleneck for Finite Data |
---|

Jaakko Peltonen - Helsinki University of Technology, Neural Networks Research Centre; Janne Sinkkonen - Helsinki University of Technology, Neural Networks Research Centre; Samuel Kaski - Helsinki University of Technology, Neural Networks Research Centre |

The sequential information bottleneck (sIB) algorithm clusters co-occurrence data such as text documents vs. words. We introduce a variant that models sparse co-occurrence data by a generative process. This turns the objective function of sIB, mutual information, into a Bayes factor, while keeping it intact asymptotically, for non-sparse data. Experimental performance of the new algorithm is comparable to the original sIB for large data sets, and better for smaller, sparse sets. |
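To make the mutual information objective concrete, the following is a minimal sketch of how the empirical mutual information between clusters and words could be computed from a document-by-word count matrix and a hard cluster assignment. It illustrates only the plain plug-in estimate, not the Bayes factor variant introduced here, and the function names `cluster_counts` and `mutual_information` and the toy data are assumptions made for illustration.

```python
import math


def cluster_counts(doc_word_counts, assignments, n_clusters):
    """Sum the word counts of all documents assigned to each cluster."""
    n_words = len(doc_word_counts[0])
    table = [[0.0] * n_words for _ in range(n_clusters)]
    for row, cluster in zip(doc_word_counts, assignments):
        for word, count in enumerate(row):
            table[cluster][word] += count
    return table


def mutual_information(table):
    """Plug-in estimate of mutual information (in nats) of a count table."""
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count > 0:  # empty cells contribute nothing
                ratio = count * total / (row_totals[i] * col_totals[j])
                mi += (count / total) * math.log(ratio)
    return mi


# Hypothetical toy data: three documents over a two-word vocabulary.
docs = [[2, 0], [1, 0], [0, 3]]
separated = mutual_information(cluster_counts(docs, [0, 0, 1], 2))
mixed = mutual_information(cluster_counts(docs, [0, 1, 0], 2))
```

In this toy case the assignment that groups documents with the same vocabulary yields the higher value. With so few counts the plug-in estimate is unreliable, which is the sparse-data regime the abstract refers to.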