This workshop aims to address fundamental problems in the young but potentially highly impactful field of machine-learning-based methods for data compression and communication. We invite participants to exchange ideas on fundamental issues in neural compression, such as the role of quantization and stochasticity in communication, the characterization and estimation of information measures, and more resource-efficient models and methods. We aim to address these issues by bringing together researchers from machine learning, information theory, statistics, and computer vision.
Sat 12:00 p.m. - 12:05 p.m.  Daniel Severo (Intro & Welcome)
Sat 12:05 p.m. - 12:35 p.m.  Johannes Ballé (Invited Talk)
Sat 12:35 p.m. - 1:05 p.m.  Tsachy Weissman (Invited Talk)
Sat 1:05 p.m. - 1:20 p.m.  Coffee Break
Sat 1:20 p.m. - 1:50 p.m.  José Miguel Hernández-Lobato (Invited Talk)
Sat 1:50 p.m. - 2:10 p.m.  Contributed Talk 1: Neural Distributed Compressor Does Binning
Sat 2:10 p.m. - 2:55 p.m.  Panel: Ashish Khisti, Ties van Rozendaal, George Toderici, Rashmi Vinayak
Sat 2:55 p.m. - 3:55 p.m.  Lunch Break
Sat 3:55 p.m. - 4:25 p.m.  Hyeji Kim (Invited Talk)
Sat 4:25 p.m. - 4:45 p.m.  Contributed Talk 2: Entropy Coding of Unordered Data Structures
Sat 4:45 p.m. - 5:15 p.m.  Yan Lu (Invited Talk)
Sat 5:15 p.m. - 6:45 p.m.  Poster Session
Sat 6:45 p.m. - 7:15 p.m.  Aaron Wagner (Invited Talk)
Sat 7:15 p.m. - 7:35 p.m.  Contributed Talk 3: Neural Image Compression: Generalization, Robustness, and Spectral Bias
Sat 7:35 p.m. - 7:55 p.m.  Contributed Talk 4: Slicing Mutual Information Generalization Bounds for Neural Networks
Sat 7:55 p.m. - 8:00 p.m.  Berivan Isik (Concluding Remarks)
Why Quantization Improves Generalization: NTK of Binary Weight Neural Network (Poster)
Quantized neural networks have drawn a lot of attention as they reduce the space and computational complexity of inference. Moreover, there is folklore that quantization acts as an implicit regularizer and can therefore improve the generalizability of neural networks, yet no existing work formalizes this interesting folklore. In this paper, we treat the binary weights in a neural network as random variables under stochastic rounding and study the distribution propagation over the layers of the network. We propose a \emph{quasi neural network} to approximate this distribution propagation: a neural network with continuous parameters and a smooth activation function. We derive the neural tangent kernel (NTK) of the quasi neural network and show that its eigenvalues decay at an approximately exponential rate, comparable to that of a Gaussian kernel with randomized scale. Experiments verify that the proposed quasi neural network approximates the binary weight neural network well. Lastly, the binary weight neural network gives a lower generalization gap than its real-valued counterpart.
Kaiqi Zhang · Ming Yin · Yu-Xiang Wang
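The stochastic-rounding view of binary weights used in the abstract above can be made concrete with a small, purely illustrative sketch (not the authors' code): unbiased stochastic rounding of real-valued weights in $[-1, 1]$ to $\{-1, +1\}$, so each binary weight is a random variable whose mean equals the real-valued weight.

```python
import torch

def stochastic_binarize(w: torch.Tensor) -> torch.Tensor:
    """Round weights in [-1, 1] to {-1, +1} stochastically.

    Each entry becomes +1 with probability (1 + w) / 2, so the rounded
    weight is an unbiased estimate of w (its expectation equals w).
    """
    w = w.clamp(-1.0, 1.0)
    p_plus = (1.0 + w) / 2.0                      # P(b = +1)
    return torch.where(torch.rand_like(w) < p_plus,
                       torch.ones_like(w), -torch.ones_like(w))

# The sample mean over many draws approaches the real-valued weight.
w = torch.tensor([0.3, -0.8, 0.0])
samples = torch.stack([stochastic_binarize(w) for _ in range(10_000)])
print(samples.mean(dim=0))  # roughly tensor([ 0.3, -0.8,  0.0])
```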
Diagnostically Lossless Compression of Medical Images (Poster)
Medical images (e.g. X-rays) are often acquired at high resolutions with large dimensions in order to capture fine-grained details. In this work, we address the challenge of compressing medical images while preserving fine-grained features needed for diagnosis, a property known as diagnostic losslessness. To this end, we (1) use over one million medical images to train a domain-specific neural compressor and (2) develop a comprehensive evaluation suite for measuring compressed image quality. Extensive experiments demonstrate that large-scale, domain-specific training of neural compressors improves the diagnostic losslessness of compressed images when compared to prior approaches.
Rogier Van der Sluijs · Maya Varma · Jip Prince · Curtis Langlotz · Akshay Chaudhari
Reconstruction Distortion of Learned Image Compression with Imperceptible Perturbations (Poster)
Yang Sui · Zhuohang Li · Ding Ding · Xiang Pan · Xiaozhong Xu · Shan Liu · Zhenzhong Chen
Transformers are Universal Predictors (Poster)
We find limits to the Transformer architecture for language modeling and show it has a universal prediction property in an information-theoretic sense. We further analyze its performance in non-asymptotic data regimes to understand the role of various components of the Transformer architecture, especially in the context of data-efficient training. We validate our theoretical analysis with experiments on both synthetic and real datasets. Virtual talk: https://drive.google.com/file/d/1wx45om05jQrkFvyVZoWxoGcEr41IeUP/view?usp=drivelink
Sourya Basu · Moulik Choraria · Lav Varshney
Low Complexity Neural Network-Based In-loop Filtering with Decomposed Split Luma-Chroma Model for Video Compression (Poster)
In this paper, a novel low complexity split luma-chroma model is proposed for in-loop filtering in video compression. The basic block of the model adopts a decomposed regular 3x3 convolutional layer, which is replaced by 1x1 pointwise convolutions and 3x1/1x3 separable convolutions via CP decomposition to reduce complexity. It is further proposed to fuse the two adjacent 1x1 convolutional layers into one. To efficiently exploit the dependencies between luma and chroma while modeling the independent characteristics of the luma/chroma components, a novel split luma-chroma architecture within one CNN model is proposed. The input layer and the first hidden layers, serving as the common path, jointly process luma-chroma inputs. The output feature maps are then split into luma and chroma feature maps, which are independently processed using the same basic block as in the common path, i.e., one luma path with 24 channels and one chroma path with 8 channels. Experimental results show that the model achieves a 5.66% BD-Rate luma gain over NNVC-4.0 under RA, while the chroma gains are also greatly improved, at a complexity of 17.7 kMAC/Pixel. The BD-Rate versus kMAC/Pixel plot also shows a superior trade-off between complexity and coding gain compared to state-of-the-art filters, and the subjective results demonstrate improved visual quality. Moreover, the split luma-chroma architecture possesses the flexibility to obtain an arbitrary luma-chroma rate-distortion distribution by adjusting the number of channels in each path.
Tong Shao · Jay Shingala · Ajay Shyam · Peng Yin · Arjun Arora · Sean McCarthy
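As a rough illustration of the decomposition described above (and not the exact NNVC filter), a 3x3 convolution factored into 1x1 pointwise and 3x1/1x3 separable convolutions might be sketched in PyTorch as follows; the channel counts are placeholders.

```python
import torch
import torch.nn as nn

class DecomposedBlock(nn.Module):
    """CP-style factorization of a regular 3x3 convolution into
    1x1 pointwise and 3x1/1x3 separable convolutions."""

    def __init__(self, in_ch: int, mid_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, kernel_size=1),                        # pointwise
            nn.Conv2d(mid_ch, mid_ch, kernel_size=(3, 1), padding=(1, 0)),  # vertical
            nn.Conv2d(mid_ch, mid_ch, kernel_size=(1, 3), padding=(0, 1)),  # horizontal
            nn.Conv2d(mid_ch, out_ch, kernel_size=1),                       # pointwise
            nn.PReLU(out_ch),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)

# Example: a luma-path block with illustrative channel counts.
y = DecomposedBlock(in_ch=24, mid_ch=8, out_ch=24)(torch.randn(1, 24, 64, 64))
print(y.shape)  # torch.Size([1, 24, 64, 64])
```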
Lossy Image Compression with Conditional Diffusion Model (Poster)
This paper outlines an end-to-end optimized lossy image compression framework using diffusion generative models. The approach relies on the transform coding paradigm, where an image is mapped into a latent space for entropy coding and, from there, mapped back to the data space for reconstruction. In contrast to VAE-based neural compression, where the (mean) decoder is a deterministic neural network, our decoder is a conditional diffusion model. Our approach thus introduces an additional "content" latent variable on which the reverse diffusion process is conditioned and uses this variable to store information about the image. The remaining "texture" variables characterizing the diffusion process are synthesized at decoding time. We show that the model's performance can be tuned toward perceptual metrics of interest. Our extensive experiments involving multiple datasets and image quality assessment metrics show that our approach yields stronger reported FID scores than the GAN-based model, while also yielding competitive performance with VAE-based models in several distortion metrics. Furthermore, training the diffusion with $\mathcal{X}$-parameterization enables high-quality reconstructions in only a handful of decoding steps, greatly affecting the model's practicality.
Ruihan Yang · Stephan Mandt
FusionToken: Enhancing Compression and Efficiency in Language Model Tokenization (Poster)
We propose FusionToken, a novel method that substantially enhances the conventional Byte Pair Encoding (BPE) approach in data encoding for language models. FusionToken employs a more aggressive computational strategy compared to BPE, expanding the token groups from bi-grams to 10-grams. Remarkably, with the addition of just 1,000 tokens to the vocabulary, the compression rate significantly surpasses that of a regular BPE tokenizer with a vocabulary of one million. Overall, the FusionToken method leads to noticeable performance improvements due to an increased data scope per compute unit and faster inference times due to fewer tokens per given string. By devoting more compute resources to the tokenizer building process, FusionToken maximizes the potential of language models as efficient data compression engines, enabling more effective language modeling systems.
Robert Kwiatkowski · Zijian Wang · Robert Giaquinto · Varun Kumar · Xiaofei Ma · Anoop Deoras · Bing Xiang · Ben Athiwaratkun
Estimating the Rate-Distortion Function by Wasserstein Gradient Descent (Poster)
In the theory of lossy compression, the rate-distortion function $R(D)$ of a given data source characterizes the fundamental limit of compression performance by any algorithm. We propose a method to estimate $R(D)$ in the continuous setting based on Wasserstein gradient descent. While the classic Blahut--Arimoto algorithm only optimizes probability weights over the support points of its initialization, our method leverages optimal transport theory and learns the support of the optimal reproduction distribution by moving particles. This makes it more suitable for high dimensional continuous problems. Our method complements state-of-the-art neural network-based methods in rate-distortion estimation, achieving comparable or improved results with less tuning and computation effort. In addition, we can derive its convergence and finite-sample properties analytically. Our study also applies to maximum likelihood deconvolution and regularized Kantorovich estimation, as those tasks boil down to mathematically equivalent minimization problems.
Yibo Yang · Stephan Eckstein · Marcel Nutz · Stephan Mandt
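For context, the classical Blahut-Arimoto baseline mentioned in the abstract can be sketched for a discrete source as below; note how the reproduction support (the columns of the distortion matrix) stays fixed throughout, which is exactly the limitation the particle-based Wasserstein gradient descent method removes. This is a generic textbook implementation, not the authors' code.

```python
import numpy as np

def blahut_arimoto(p_x, dist, beta, iters=500):
    """Estimate one point on the discrete rate-distortion curve.

    p_x:  source distribution, shape (nx,)
    dist: distortion matrix d(x, y), shape (nx, ny)
    beta: Lagrange multiplier trading rate against distortion
    Returns (rate in bits, expected distortion).
    """
    nx, ny = dist.shape
    q_y = np.full(ny, 1.0 / ny)                        # reproduction marginal
    for _ in range(iters):
        log_w = np.log(q_y)[None, :] - beta * dist     # unnormalized log q(y|x)
        log_w -= log_w.max(axis=1, keepdims=True)
        q_y_given_x = np.exp(log_w)
        q_y_given_x /= q_y_given_x.sum(axis=1, keepdims=True)
        q_y = p_x @ q_y_given_x                        # update the marginal
    rate_nats = np.sum(p_x[:, None] * q_y_given_x *
                       (np.log(q_y_given_x) - np.log(q_y)[None, :]))
    distortion = np.sum(p_x[:, None] * q_y_given_x * dist)
    return rate_nats / np.log(2), distortion

# Fair binary source with Hamming distortion: R(D) = 1 - H_b(D).
print(blahut_arimoto(np.array([0.5, 0.5]), 1.0 - np.eye(2), beta=3.0))
```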
NNCodec: An Open Source Software Implementation of the Neural Network Coding ISO/IEC Standard (Poster)
This paper presents NNCodec, the first open source and standard-compliant implementation of the Neural Network Coding (NNC) standard (ISO/IEC 15938-17), and describes its software architecture and main coding tools. To this end, the underlying distributions and information content of neural network weight parameters are analyzed and examined with a view toward higher compression gains. At the core of the coding engine is a context-adaptive arithmetic coder that adapts its binary probability models on-the-fly to weight statistics. We show that NNCodec achieves higher compression than the Huffman code commonly used for neural network compression, and that the average codeword length of NNCodec is often below the Shannon entropy bound. By introducing specifically trained local scaling parameters, NNCodec can compensate for quantization errors in the latent weight space to a certain degree, which we show experimentally for ResNets, EfficientNet, and a Vision Transformer network topology. The software and demo are available at https://github.com/[DOUBLE-BLIND-LINK].
Daniel Becking · Paul Haase · Heiner Kirchhoffer · Karsten Müller · Wojciech Samek
Practical Random Tree Generation using Spanning Trees: Entropy and Compression (Poster)
Tree structures make an appearance in many learning-related problems, most importantly in Graph Neural Networks. Modeling and simulating the appearance of these data structures can be done using random tree generators. However, there has been very little study on random models that are able to capture the dynamics of networks. We introduce the random spanning tree model, which is a random tree generator that is based on generating a tree from an already existing network topology. The Shannon entropy of this model is then analysed, and upper bounds to it are found. As compression can be beneficial because of the complexity of large trees, we then introduce a universal approach to compressing trees generated using the spanning tree model. It will be shown that the proposed method of compression introduces a redundancy that tends to zero for larger trees. Virtual talk: https://drive.google.com/file/d/1ZnivpgeQss0ftLPjb2TNtlbAZQH6EmbN/view?usp=drive_link
Amirmohammad Farzaneh
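One simple, concrete instance of "generating a tree from an already existing network topology" is sampling a uniformly random spanning tree with the Aldous-Broder random walk. The sketch below is illustrative only; the paper's random spanning tree model may use a different sampling distribution.

```python
import random
import networkx as nx

def uniform_spanning_tree(G: nx.Graph, seed=None) -> nx.Graph:
    """Aldous-Broder: walk randomly on G and keep the edge used to reach
    each vertex the first time; the result is a uniformly random spanning
    tree of the (connected) input graph."""
    rng = random.Random(seed)
    current = rng.choice(list(G.nodes))
    visited = {current}
    tree = nx.Graph()
    tree.add_nodes_from(G.nodes)
    while len(visited) < G.number_of_nodes():
        nxt = rng.choice(list(G.neighbors(current)))
        if nxt not in visited:
            visited.add(nxt)
            tree.add_edge(current, nxt)
        current = nxt
    return tree

T = uniform_spanning_tree(nx.karate_club_graph(), seed=0)
assert nx.is_tree(T)
```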
Minimal Random Code Learning with Mean-KL Parameterization (Poster)
This paper studies the qualitative behavior and robustness of two variants of Minimal Random Code Learning (MIRACLE) used to compress variational Bayesian neural networks. MIRACLE implements a powerful, conditionally Gaussian variational approximation for the weight posterior $Q_{\mathbf{w}}$ and uses relative entropy coding to compress a weight sample from the posterior using a Gaussian coding distribution $P_{\mathbf{w}}$. To achieve the desired compression rate, $D_{\mathrm{KL}}[Q_{\mathbf{w}} \Vert P_{\mathbf{w}}]$ must be constrained, which requires a computationally expensive annealing procedure under the conventional mean-variance (Mean-Var) parameterization for $Q_{\mathbf{w}}$. Instead, we parameterize $Q_{\mathbf{w}}$ by its mean and KL divergence from $P_{\mathbf{w}}$ to constrain the compression cost to the desired value by construction. We demonstrate that variational training with Mean-KL parameterization converges twice as fast and maintains predictive performance after compression. Furthermore, we show that Mean-KL leads to more meaningful variational distributions with heavier tails and compressed weight samples which are more robust to pruning.
Jihao Andreas Lin · Gergely Flamich · Jose Miguel Hernandez-Lobato
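To make the Mean-KL parameterization concrete: for a single weight with coding distribution $P = \mathcal{N}(\mu_p, \sigma_p^2)$ and variational posterior $Q = \mathcal{N}(\mu_q, \sigma_q^2)$, the per-weight compression cost is the standard Gaussian relative entropy

$$D_{\mathrm{KL}}[Q \Vert P] = \log\frac{\sigma_p}{\sigma_q} + \frac{\sigma_q^2 + (\mu_q - \mu_p)^2}{2\sigma_p^2} - \frac{1}{2}.$$

Fixing $(\mu_q, D_{\mathrm{KL}})$ as the free variables therefore pins down $\sigma_q$ implicitly, so the compression cost sits at the target value by construction; how the paper allocates the KL budget across weights or blocks is a detail not reproduced here.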
Slicing Mutual Information Generalization Bounds for Neural Networks (Poster)
The ability of machine learning (ML) algorithms to generalize well to unseen data has been studied through the lens of information theory, by bounding the generalization error with the input-output mutual information (MI), i.e. the MI between the training data and the learned hypothesis. These bounds have limited empirical use for modern ML applications (e.g. deep learning) since the evaluation of MI is difficult in high-dimensional settings. Motivated by recent reports of significant low-loss compressibility of neural networks, we study the generalization capacity of algorithms which *slice* the parameter space, i.e. train on a random lower-dimensional subspace. We derive information-theoretic bounds on the generalization error in this regime, and discuss an intriguing connection to the $k$-Sliced Mutual Information, an alternative measure of statistical dependence which scales well with dimension. The computational and statistical benefits of our approach allow us to empirically estimate the input-output information of these neural networks and compute their information-theoretic generalization bounds, a task which was previously out of reach.
Kimia Nadjahi · Kristjan Greenewald · Rickard Gabrielsson · Justin Solomon
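The "train on a random lower-dimensional subspace" setting can be sketched directly in PyTorch with `torch.func.functional_call`; this is a generic illustration of slicing the parameter space, not the authors' exact construction or the bound computation.

```python
import torch
import torch.nn as nn
from torch.func import functional_call

class SlicedModel(nn.Module):
    """Train only a random k-dimensional slice of the parameter space:
    theta = theta_0 + A z, where A is a fixed random projection and z is
    the only trainable variable."""

    def __init__(self, base: nn.Module, k: int):
        super().__init__()
        self.base = base
        self.names = [n for n, _ in base.named_parameters()]
        self.shapes = [p.shape for _, p in base.named_parameters()]
        self.sizes = [p.numel() for _, p in base.named_parameters()]
        theta0 = torch.cat([p.detach().flatten() for p in base.parameters()])
        self.register_buffer("theta0", theta0)
        self.register_buffer("A", torch.randn(theta0.numel(), k) / k ** 0.5)
        self.z = nn.Parameter(torch.zeros(k))  # the only trainable variable

    def forward(self, x):
        theta = self.theta0 + self.A @ self.z
        parts = torch.split(theta, self.sizes)
        params = {n: t.view(s) for n, t, s in zip(self.names, parts, self.shapes)}
        return functional_call(self.base, params, (x,))

net = SlicedModel(nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2)), k=50)
loss = net(torch.randn(8, 20)).sum()
loss.backward()  # gradients flow only into net.z
```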
Task-aware Distributed Source Coding under Dynamic Bandwidth (Poster)
Efficient compression of correlated data is essential to minimize communication overhead in multi-sensor networks. Each sensor independently compresses its data and transmits it to a central node due to limited bandwidth. A decoder at the central node decompresses the data and passes it to a pre-trained machine learning-based task to generate the final output. Thus, it is important to compress the features that are relevant to the task. Additionally, the final performance depends heavily on the total available bandwidth. In practice, it is common to encounter varying bandwidth availability, and higher bandwidth results in better task performance. We design a novel distributed compression framework composed of independent encoders and a joint decoder, which we call neural distributed principal component analysis (NDPCA). NDPCA flexibly compresses data from multiple sources to any available bandwidth with a single model, reducing computing and storage overhead. NDPCA achieves this by learning low-rank task representations and efficiently distributing bandwidth among sensors, thus providing a graceful trade-off between performance and bandwidth. Experiments show that NDPCA improves the accuracy of object detection tasks on satellite imagery by 14% compared to an autoencoder with uniform bandwidth allocation.
Po-han Li · Sravan Kumar Ankireddy · Ruihan Zhao · Hossein Nourkhiz Mahjoub · Ehsan Moradi Pari · Ufuk Topcu · Sandeep Chinchali · Hyeji Kim
Text + Sketch: Image Compression at Ultra Low Rates (Poster)
Recent advances in text-to-image generative models provide the ability to generate high-quality images from short text descriptions. These foundation models, when pre-trained on billion-scale datasets, are effective for various downstream tasks with little or no further training. A natural question to ask is how such models may be adapted for image compression. We investigate several techniques in which the pre-trained models can be directly used to implement compression schemes targeting novel low rate regimes. We show how text descriptions can be used in conjunction with side information to generate high-fidelity reconstructions that preserve both semantics and spatial structure of the original. We demonstrate that at very low bit-rates, our method can significantly improve upon learned compressors in terms of perceptual and semantic fidelity, despite no end-to-end training.
Eric Lei · Yigit Berkay Uslu · Hamed Hassani · Shirin Bidokhti
On the Choice of Perception Loss Function for Learned Video Compression (Poster)
We study causal, low-latency, sequential video compression when the output is subjected to both a mean squared-error (MSE) distortion loss as well as a perception loss to target realism. Motivated by prior approaches, we consider two different perception loss functions (PLFs). The first, PLF-JD, considers the joint distribution (JD) of all the video frames up to the current one, while the second metric, PLF-FMD, considers the framewise marginal distributions (FMD) between the source and reconstruction. Using deep-learning based experiments, we demonstrate that the choice of PLF can have a significant effect on the reconstruction, especially at low bit-rates. In particular, while the reconstruction based on PLF-JD can better preserve the temporal correlation across frames, it also imposes a significant penalty in distortion compared to PLF-FMD and further makes it more difficult to recover from errors made in the earlier output frames. We also demonstrate that encoded representations generated by training a system to minimize the MSE (without requiring either PLF) can be transformed into a reconstruction satisfying perfect perceptual quality in the FMD sense while increasing the distortion by at most a factor of two. A similar argument holds for PLF-JD for a class of encoders operating in the low-rate regime. We validate our results using information-theoretic analysis and deep-learning based experiments on moving MNIST and KTH datasets.
Buu Phan · Sadaf Salehkalaibar · Jun Chen · Wei Yu · Ashish Khisti
Designing Discontinuities (Poster)
Discontinuities can be fairly arbitrary but also have a significant impact on outcomes in social systems. Indeed, their arbitrariness is why they have been used to infer causal relationships among variables in numerous settings. Regression discontinuity from econometrics assumes the existence of a discontinuous variable that splits the population into distinct partitions to estimate causal effects. Here we consider the \emph{design} of partitions for a given discontinuous variable to optimize a certain effect. To do so, we propose a quantization-theoretic approach to optimize the effect of interest, first learning the causal effect size of a given discontinuous variable and then applying dynamic programming for optimal quantization design of discontinuities that balance the gain and loss in the effect size. We also develop a computationally-efficient reinforcement learning algorithm for the dynamic programming formulation of optimal quantization. We demonstrate our approach by designing optimal time zone borders for counterfactuals of social capital. Virtual talk: https://drive.google.com/file/d/1009OWQnHKoZk0m3e39hn51uvRX88y671/view?usp=drive_link
Ibtihal Ferwana · Suyong Park · Ting-Yi Wu · Lav Varshney
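The "dynamic programming for optimal quantization design" step can be illustrated with the standard DP for SSE-optimal 1-D quantization (contiguous cells over sorted data). The paper optimizes a learned causal-effect objective rather than squared error, so treat this purely as a sketch of the DP structure.

```python
import numpy as np

def optimal_1d_quantizer(x, k):
    """Partition sorted 1-D data into k contiguous cells minimizing the
    total within-cell sum of squared errors, via dynamic programming."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    pre = np.concatenate([[0.0], np.cumsum(x)])
    pre2 = np.concatenate([[0.0], np.cumsum(x ** 2)])

    def cell_cost(i, j):                     # SSE of one cell covering x[i:j]
        s, s2, m = pre[j] - pre[i], pre2[j] - pre2[i], j - i
        return s2 - s * s / m

    cost = np.full((k + 1, n + 1), np.inf)
    split = np.zeros((k + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                val = cost[c - 1, i] + cell_cost(i, j)
                if val < cost[c, j]:
                    cost[c, j], split[c, j] = val, i

    # Recover cell boundaries by backtracking.
    bounds, j = [], n
    for c in range(k, 0, -1):
        i = split[c, j]
        bounds.append((x[i], x[j - 1]))
        j = i
    return cost[k, n], bounds[::-1]

rng = np.random.default_rng(0)
print(optimal_1d_quantizer(rng.normal(size=200), k=4))
```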
Neural Image Compression: Generalization, Robustness, and Spectral Biases (Poster)
Recent neural image compression (NIC) advances have produced models which are starting to outperform traditional codecs. While this has led to growing excitement about using NIC in real-world applications, the successful adoption of any machine learning system in the wild requires it to generalize (and be robust) to unseen distribution shifts at deployment. Unfortunately, current research lacks comprehensive datasets and informative tools to evaluate and understand NIC performance in real-world settings. To bridge this crucial gap, we provide a comprehensive benchmark suite to evaluate the out-of-distribution (OOD) performance of image compression methods and propose spectrally inspired inspection tools to gain deeper insight into errors introduced by image compression methods as well as their OOD performance. We then carry out a detailed performance comparison of a classical codec with NIC variants, revealing intriguing findings that challenge our current understanding of NIC.
Kelsey Lieberman · James Diffenderfer · Charles Godfrey · Bhavya Kailkhura
Siamese SIREN: Audio Compression with Implicit Neural Representations (Poster)
Implicit Neural Representations (INRs) have emerged as a promising method for representing diverse data modalities, including 3D shapes, images, and audio. While recent research has demonstrated successful applications of INRs in image and 3D shape compression, their potential for audio compression remains unexplored. Motivated by this, we present a preliminary investigation into the use of INRs for audio compression. Our study introduces Siamese SIREN, a novel approach based on the popular SIREN architecture. Our experimental results indicate that Siamese SIREN achieves superior audio reconstruction fidelity while utilizing fewer network parameters compared to previous INR architectures.
Luca Lanzendörfer · Roger Wattenhofer
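For readers unfamiliar with the base architecture, a single SIREN layer (Sitzmann et al., 2020) with its standard initialization looks like the sketch below; the Siamese construction and the audio-specific design are the paper's contribution and are not reproduced here.

```python
import numpy as np
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    """One SIREN layer: y = sin(w0 * (W x + b)), with the uniform
    initialization recommended by Sitzmann et al. (2020)."""

    def __init__(self, in_features: int, out_features: int,
                 w0: float = 30.0, is_first: bool = False):
        super().__init__()
        self.w0 = w0
        self.linear = nn.Linear(in_features, out_features)
        bound = 1.0 / in_features if is_first else np.sqrt(6.0 / in_features) / w0
        nn.init.uniform_(self.linear.weight, -bound, bound)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sin(self.w0 * self.linear(x))

# Map time coordinates in [-1, 1] to audio samples with a tiny SIREN.
siren = nn.Sequential(SineLayer(1, 64, is_first=True), SineLayer(64, 64), nn.Linear(64, 1))
print(siren(torch.linspace(-1, 1, 16).unsqueeze(-1)).shape)  # torch.Size([16, 1])
```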
ICE-Pick: Iterative Cost-Efficient Pruning for DNNs (Poster)
Pruning is one of the main compression methods for Deep Neural Networks (DNNs), where less relevant parameters are removed from the model to reduce its memory footprint. To get better final accuracy, pruning is often performed iteratively, with increasing amounts of parameters being removed in each step, and fine-tuning (i.e., additional training epochs) being applied to the remaining parameters. However, this process can be very time-consuming, since fine-tuning is applied after every pruning step and calculates gradients for the whole model. Motivated by these overheads, in this paper we propose ICE-Pick, a novel threshold-guided fine-tuning method which freezes less sensitive layers and leverages a custom pruning-aware learning rate scheduler. We evaluate our technique using ResNet-110, ResNet-152, and MobileNetV2 (defined for CIFAR-10), and show that ICE-Pick can save up to 87.6% of the pruning time while maintaining accuracy. Virtual talk: https://drive.google.com/file/d/1TmmRBfNXz-5hLSq6UyNqfakAw7YcE7NO/view?usp=drive_link
Wenhao Hu · Perry Gibson · José Cano
Learn From One Specialized Sub-Teacher: One-to-One Mapping for Feature-Based Knowledge Distillation (Poster)
Knowledge Distillation is known as an effective technique to compress over-parameterized language models. In this work, we propose to break down the global feature distillation task into N local sub-tasks. In this new framework, we consider each neuron in the last hidden layer of the teacher network as a specialized sub-teacher, and each neuron in the last hidden layer of the student network as a focused sub-student. We make each focused sub-student learn from its one corresponding specialized sub-teacher and ignore the others. This facilitates the task for the sub-student and keeps it focused. The method is novel and can be combined with other distillation techniques. Empirical results show that our proposed approach outperforms state-of-the-art methods by maintaining higher performance on most benchmark datasets.
Khouloud Saadi · Jelena Mitrović · Michael Granitzer
Entropy Coding of Unordered Data Structures (Poster)
We present shuffle coding, a general method for optimal compression of sequences of unordered objects using bits-back coding. Data structures that can be compressed using shuffle coding include multisets, graphs, hypergraphs, and others. We demonstrate that the method achieves state-of-the-art compression rates on a range of graph datasets including molecular data, and release an implementation that can easily be adapted to different data types and statistical models.
Julius Kunze · Daniel Severo · Giulio Zani · Jan-Willem van de Meent · James Townsend
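The potential rate saving that shuffle coding targets is easy to quantify: a sequence of $n$ distinct objects carries $\log_2 n!$ bits of pure ordering information that an unordered representation does not need (for multisets with repeats the saving is $\log_2\big(n!/\prod_i m_i!\big)$). A tiny sketch of the count:

```python
import math

def multiset_bits_saved(n: int) -> float:
    """Bits of ordering information in a sequence of n distinct objects,
    i.e. the maximum saving an order-invariant (bits-back style) code can
    recoup relative to compressing the objects in a fixed sequence."""
    return math.lgamma(n + 1) / math.log(2)   # log2(n!)

print(round(multiset_bits_saved(1000)))  # about 8530 bits
```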
Autoencoding Implicit Neural Representations for Image Compression (Poster)
Implicit Neural Representations (INRs) are increasingly popular methods for representing a variety of signals (Sitzmann et al., 2020b; Park et al., 2019; Mildenhall et al., 2021). Given their advantages over traditional signal representations, there are strong incentives to leverage them for signal compression. Here we focus on image compression, where recent INR-based approaches learn a base INR network shared across images, and infer/quantize a latent representation for each image in a second stage (Dupont et al., 2022; Schwarz & Teh, 2022; Schwarz et al., 2023). In this work, we view these approaches as special cases of nonlinear transform coding (NTC), and instead propose an end-to-end approach directly optimized for rate-distortion (R-D) performance. We essentially perform NTC with an INR-based decoder, achieving significantly faster training and improved R-D performance, although still falling short of that of state-of-the-art NTC approaches. By viewing an INR base network as a convolutional decoder with 1x1 convolutions, we can also better understand its inferior R-D performance through this inherent architectural constraint.
Tuan Pham · Yibo Yang · Stephan Mandt
Neural Network Optimization with Weight Evolution (Poster)
In contrast to magnitude pruning, which only checks the parameter values at the end of training and removes the insignificant ones, this paper introduces a new approach that estimates the importance of each parameter in a holistic way. The proposed method keeps track of the parameter values from the beginning until the last epoch and calculates a weighted average across the training, giving more weight to the parameter values closer to the completion of training. We have tested this method on popular deep neural networks like AlexNet, VGGNet, ResNet and DenseNet on benchmark datasets like CIFAR10 and Tiny ImageNet. The results show that our approach can achieve higher compression with less loss of accuracy compared to magnitude pruning.
Samir Belhaouari · Ashhadul Islam
EntropyRank: Unsupervised Keyphrase Extraction via Side-Information Optimization for Language Model-based Text Compression (Poster)
We propose an unsupervised method to extract keywords and keyphrases from texts based on a pre-trained language model (LM) and Shannon's information maximization. Specifically, our method extracts phrases having the highest conditional entropy under the LM. The resulting set of keyphrases turns out to solve a relevant information-theoretic problem: if provided as side information, it leads to the expected minimal binary code length in compressing the text using the LM and an entropy encoder. Alternately, the resulting set is an approximation via a causal LM to the set of phrases that minimize the entropy of the text when conditioned upon it. Empirically, the method provides results comparable to the most commonly used methods in various keyphrase extraction benchmark challenges.
Alexander Tsvetkov · Alon Kipnis
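The basic ingredient, per-position predictive entropy under a causal LM, can be computed with standard Hugging Face tooling as sketched below; turning these scores into ranked candidate keyphrases follows the procedure in the paper and is not reproduced here.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def next_token_entropies(text: str):
    """Entropy (in bits) of the LM's next-token distribution at each position."""
    ids = tok(text, return_tensors="pt").input_ids           # (1, T)
    logits = lm(ids).logits                                   # (1, T, vocab)
    logp = torch.log_softmax(logits, dim=-1)
    entropy_bits = -(logp.exp() * logp).sum(dim=-1) / math.log(2)
    tokens = tok.convert_ids_to_tokens(ids[0].tolist())
    return list(zip(tokens, entropy_bits[0].tolist()))

for token, h in next_token_entropies("Neural compression is")[:4]:
    print(f"{token!r}: {h:.2f} bits")
```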
Revisiting Associative Compression: I Can't Believe It's Not Better (Poster)
Typically, unordered image datasets are individually and sequentially compressed in random order. Unfortunately, general set compression methods that improve over the default sequential treatment yield only small rate gains for high-dimensional objects such as images. We propose an approach for compressing image datasets by using an image-to-image conditional generative model on a reordered dataset. Our approach is inspired by Associative Compression Networks (Graves et al., 2018). Even though this variation of variational auto-encoders was primarily developed for representation learning, the authors of the paper show substantial gains in the lossless compression of latent variables. We apply the core idea of the aforementioned work, adapting the generative prior to a previously seen neighbor image, to a commonly used neural compression model, the mean-scale hyperprior model (MSHP) (Ballé et al., 2018; Minnen et al., 2018). However, the architecture changes we propose here are applicable to other methods such as ELIC (He et al., 2022) as well. We train our model on subsets of an ordered version of Imagenet, and report rate-distortion curves on the same dataset. Unfortunately, we only see gains in latent space. Hence we speculate as to the reason why the approach is not leading to more significant improvements.
Winnie Xu · Matthew Muckley · Yann Dubois · Karen Ullrich
Neural Image Compression with Quantization Rectifier (Poster)
Neural image compression has been shown to outperform traditional image codecs in terms of rate-distortion performance. However, quantization introduces errors in the compression process, which can degrade the quality of the compressed image. While existing approaches address the train-test mismatch incurred during quantization, the random impact of quantization on the expressiveness of image features remains unsolved. This paper presents a novel quantization rectifier (QR) method for image compression that leverages image feature correlation to mitigate the impact of quantization. Our method designs a neural network architecture that predicts unquantized features from the quantized ones, preserving feature expressiveness for better image reconstruction quality. We develop a soft-to-predictive training technique to integrate QR into existing neural image codecs. In evaluation, we integrate QR into state-of-the-art neural image codecs and compare the enhanced models and baselines on the widely-used Kodak benchmark. The results show consistent coding efficiency improvements from QR with a negligible increase in running time. Virtual talk: https://drive.google.com/file/d/1vqqKlIk7uMiRBYeaHEIbqxP9-NBR-CrW/view?usp=drive_link
Wei Luo · Bo Chen
Fast Autoregressive Bit Sequence Modeling for Lossless Compression (Poster)
Autoregressive probability estimation of data sequences is a fundamental task in deep neural networks and has been widely used in applications such as lossless data compression. Because it is a sequential, iterative process due to causality, however, it is slow. In this paper, we propose Scale Causal Blocks (SCBs), which are basic components of deep neural networks that aim to significantly reduce the computational and memory cost compared to conventional techniques. Evaluation results show that the proposed method is one order of magnitude faster than a conventional computationally optimized Transformer-based method while maintaining comparable accuracy.
Hiroaki Akutsu · Ko Arai
Lightweighted Sparse Autoencoder based on Explainable Contribution (Poster)
As deep learning models become heavier, developing lightweight models with the least performance degradation is paramount. In this paper, we propose an algorithm, SHAP-SAE (SHapley Additive exPlanations based Sparse AutoEncoder), that can explicitly measure the contribution of units and links and selectively activate only important units and links, leading to a lightweight sparse autoencoder. This allows us to explain how and why the sparse autoencoder is structured. We show that the SHAP-SAE outperforms other algorithms including a dense autoencoder. It is also confirmed that the SHAP-SAE is robust against the harsh sparsity of the autoencoder, as it shows remarkably limited performance degradation even with high sparsity levels.
Joohong Rheey · Hyunggon Park
On the Maximum Mutual Information Capacity of Neural Architectures (Poster)
We derive the closed-form expression of the maximum mutual information - the maximum value of $I(X;Z)$ obtainable via training - for a broad family of neural network architectures. The quantity is essential to several branches of machine learning theory and practice. Quantitatively, we show that the maximum mutual information for these families all stems from generalizations of a single catch-all formula. Qualitatively, we show that the maximum mutual information of an architecture is most strongly influenced by the width of the smallest layer of the network - the ``information bottleneck'' in a different sense of the phrase - and by any statistical invariances captured by the architecture.
Brandon J Foggo · Nanpeng Yu
Neural Distributed Compressor Does Binning (Poster)
We consider lossy compression of an information source when the decoder has lossless access to a correlated one. This setup, also known as the Wyner-Ziv problem in information theory, is a special case of distributed source coding. To this day, real-world applications of this problem have neither been fully developed nor heavily investigated. We find that our neural network-based compression scheme re-discovers some principles of the optimum theoretical solution of the Wyner-Ziv setup, such as binning in the source space as well as linear decoder behavior within each quantization index, for the quadratic-Gaussian case. Binning is a widely used tool in information theoretic proofs and methods, and to our knowledge, this is the first time it has been explicitly observed to emerge from data-driven learning.
Ezgi Ozyilkan · Johannes Ballé · Elza Erkip
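For reference, in the quadratic-Gaussian case the abstract refers to (source $X$ and decoder side information $Y$ jointly Gaussian, squared-error distortion), the Wyner-Ziv rate-distortion function is the classical result

$$R_{\mathrm{WZ}}(D) = \max\!\left\{\tfrac{1}{2}\log_2\frac{\sigma^2_{X\mid Y}}{D},\, 0\right\},$$

where $\sigma^2_{X\mid Y}$ is the conditional variance of $X$ given $Y$; the binning and per-index linear decoding that the learned compressor rediscovers are exactly the ingredients of the scheme achieving this rate.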
Less-Energy-Usage Network with Batch Power Iteration (Poster)
Large scale neural networks are among the mainstream tools of modern big data analytics, but their training and inference phases come with huge energy consumption and carbon footprint. Energy efficiency, running time complexity, and model storage size are three major considerations when using deep neural networks in modern applications. Here we introduce the Less-Energy-Usage Network, or LEAN. Different from classic network compression (e.g. pruning and knowledge distillation), which transforms a pre-trained huge network into a smaller network, our method builds a lean and effective network during the training phase. It is based on spectral theory and batch power iteration learning. This technique can be applied to almost any type of neural network to reduce its size. Preliminary experiment results show that LEAN consumes 30% less energy, while achieving 95% of the baseline accuracy with a 1.5x speed-up and up to 90% fewer parameters compared against the baseline CNN model.
Hao Huang · Tapan Shah · Shinjae Yoo · Scott Evans
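The spectral building block alluded to in the abstract, power iteration for a leading singular pair of a weight matrix, can be sketched generically as below; how LEAN folds this into training and architecture construction is the paper's contribution and is not shown here.

```python
import torch

def power_iteration(W: torch.Tensor, iters: int = 20):
    """Estimate the leading singular value / right-singular vector of W by
    repeatedly applying W^T W and renormalizing."""
    v = torch.randn(W.shape[1])
    v = v / v.norm()
    for _ in range(iters):
        v = W.t() @ (W @ v)
        v = v / v.norm()
    sigma = (W @ v).norm()            # leading singular value
    return sigma, v

sigma, v = power_iteration(torch.randn(128, 64))
print(float(sigma))
```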
MLIC$^{++}$: Linear Complexity Multi-Reference Entropy Modeling for Learned Image Compression (Poster)
Recently, the multi-reference entropy model has been proposed, which captures channel-wise, local spatial, and global spatial correlations. Previous works adopt attention for capturing global correlations; however, its quadratic complexity limits the potential of high-resolution image coding. In this paper, we propose linear-complexity capturing of global correlations via a decomposition of the softmax operation. Based on it, we propose MLIC$^{++}$, a learned image compression model with linear complexity for multi-reference entropy modeling. Our MLIC$^{++}$ is more efficient and reduces BD-rate by $12.44$% on the Kodak dataset compared to VTM-17.0 when measured in PSNR.
Wei Jiang · Ronggang Wang
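A generic form of the softmax decomposition the abstract refers to is the "efficient attention" trick: apply softmax to queries over channels and to keys over positions, then compute $K^{\top}V$ first so the cost is linear in the number of spatial positions. The sketch below illustrates that general idea only; it is not the exact MLIC$^{++}$ entropy-model module.

```python
import torch
import torch.nn.functional as F

def linear_global_attention(q, k, v):
    """Linear-complexity global attention via softmax decomposition.

    q, k, v: (batch, positions, channels). Softmax over channels for q and
    over positions for k lets us form the (channels x channels) context
    K^T V first, avoiding the quadratic positions-by-positions map.
    """
    q = F.softmax(q, dim=-1)            # over channels
    k = F.softmax(k, dim=1)             # over positions
    context = k.transpose(1, 2) @ v     # (batch, channels, channels)
    return q @ context                  # (batch, positions, channels)

b, n, c = 2, 4096, 64                   # n = H * W spatial positions
out = linear_global_attention(torch.randn(b, n, c), torch.randn(b, n, c), torch.randn(b, n, c))
print(out.shape)                        # torch.Size([2, 4096, 64])
```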
Making Text-Image Connection Formal and Practical (Poster)
Text and image feature extraction is at the core of several state-of-the-art artificial intelligence algorithms, including DALLE-2, Stable Diffusion, and Segment Anything. However, models that connect images and texts are usually trained using hundreds of GPUs and millions of data points, making it infeasible for most agents to perform the training from scratch. Furthermore, these groundbreaking works necessitate more formally defined algorithms to enable easier adoption and implementation. To address these issues, this paper elaborates a formal and intuitive algorithm for text-image connections and proposes an alternative way to train CLIP, a neural network model that learns joint representations from text and images, on low computing resources. Our focus is on improving training speed and using a fraction of the data. In our experiments, two models were trained on a third of WKIT-24M, a dataset of text-image pairs, by making use of mixed precision in back-propagation and shrinking both the resolution of input images and the maximum length of the query in comparison to the original CLIP, in a setting constrained to a single GPU. Our results show that it is feasible to train image-text connection models from scratch in a simplified setting and recognize related image concepts. Virtual talk: https://drive.google.com/file/d/1tdjchYTMkeOVnveCiT1d8JVhNBIwU-cz/view?usp=drive_link
Carlos-Gustavo Salas-Flores · Dongmian Zou · Luyao Zhang
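The core training signal for CLIP-style text-image connection is the symmetric contrastive (InfoNCE) loss over a batch of paired embeddings, sketched below; the paper's contribution lies in the formalization and the reduced-resource training recipe, not in this loss itself.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss: matching image/text pairs (the diagonal of
    the similarity matrix) are positives, all other pairs are negatives."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature               # (batch, batch)
    labels = torch.arange(img.shape[0], device=img.device)
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

print(clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512)))
```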
Are Visual Recognition Models Robust to Image Compression? (Poster)
Reducing the data footprint of visual content via image compression is essential to reduce storage requirements, but also to reduce the bandwidth and latency requirements for transmission. In particular, the use of compressed images allows for faster transfer of data, and faster response times for visual recognition in edge devices that rely on cloud-based services. In this paper, we first analyze the impact of image compression using traditional codecs, as well as recent state-of-the-art neural compression approaches, on three visual recognition tasks: image classification, object detection, and semantic segmentation. We consider a wide range of compression levels, ranging from 0.1 to 2 bits-per-pixel (bpp). We find that for all three tasks, the recognition ability is significantly impacted when using strong compression. For example, for segmentation mIoU is reduced from 44.5 to 30.5 mIoU when compressing to 0.1 bpp using the best compression model we evaluated. Second, we test to what extent this performance drop can be ascribed to a loss of relevant information in the compressed image, or to a lack of generalization of visual recognition models to images with compression artefacts. We find that to a large extent the performance loss is due to the latter: by finetuning the recognition models on compressed training images, most of the performance loss is recovered. For example, bringing segmentation accuracy back up to 42 mIoU, i.e. recovering 82% of the original drop in accuracy.
João Maria Janeiro · Stanislav Frolov · Alaaeldin El-Nouby · Jakob Verbeek
Author Information
Berivan Isik (Stanford University)
Yibo Yang (University of California, Irvine)
Daniel Severo (University of Toronto, Vector Institute for AI)
Karen Ullrich (Meta AI)
Robert Bamler (University of Tübingen)
Stephan Mandt (University of California, Irvine)
Stephan Mandt is an Assistant Professor of Computer Science at the University of California, Irvine. From 2016 until 2018, he was a Senior Researcher and head of the statistical machine learning group at Disney Research, first in Pittsburgh and later in Los Angeles. He held previous postdoctoral positions at Columbia University and at Princeton University. Stephan holds a PhD in Theoretical Physics from the University of Cologne. He is a Fellow of the German National Merit Foundation, a Kavli Fellow of the U.S. National Academy of Sciences, and was a visiting researcher at Google Brain. Stephan serves regularly as an Area Chair for NeurIPS, ICML, AAAI, and ICLR, and is a member of the Editorial Board of JMLR. His research is currently supported by NSF, DARPA, IBM, and Qualcomm.
More from the Same Authors
- 2023: Exact Optimality in Communication-Privacy-Utility Tradeoffs (Berivan Isik · Wei-Ning Chen · Ayfer Ozgur · Tsachy Weissman · Albert No)
- 2023: Leveraging Side Information for Communication-Efficient Federated Learning (Berivan Isik · Francesco Pase · Deniz Gunduz · Sanmi Koyejo · Tsachy Weissman · Michele Zorzi)
- 2023: GPT-Zip: Deep Compression of Finetuned Large Language Models (Berivan Isik · Hermann Kumbong · Wanyi Ning · Xiaozhe Yao · Sanmi Koyejo · Ce Zhang)
- 2023: Lossy Image Compression with Conditional Diffusion Model (Ruihan Yang · Stephan Mandt)
- 2023: Estimating the Rate-Distortion Function by Wasserstein Gradient Descent (Yibo Yang · Stephan Eckstein · Marcel Nutz · Stephan Mandt)
- 2023: Autoencoding Implicit Neural Representations for Image Compression (Tuan Pham · Yibo Yang · Stephan Mandt)
- 2023: Invited Talk by Karen Ullrich (Karen Ullrich)
- 2023 Poster: Deep Anomaly Detection under Labeling Budget Constraints (Aodong Li · Chen Qiu · Marius Kloft · Padhraic Smyth · Stephan Mandt · Maja Rudolph)
- 2023 Poster: Fully Bayesian Autoencoders with Latent Sparse Gaussian Processes (Ba-Hien Tran · Babak Shahbaba · Stephan Mandt · Maurizio Filippone)
- 2022 Poster: Structured Stochastic Gradient MCMC (Antonios Alexos · Alex Boyd · Stephan Mandt)
- 2022 Spotlight: Structured Stochastic Gradient MCMC (Antonios Alexos · Alex Boyd · Stephan Mandt)
- 2022 Poster: Latent Outlier Exposure for Anomaly Detection with Contaminated Data (Chen Qiu · Aodong Li · Marius Kloft · Maja Rudolph · Stephan Mandt)
- 2022 Spotlight: Latent Outlier Exposure for Anomaly Detection with Contaminated Data (Chen Qiu · Aodong Li · Marius Kloft · Maja Rudolph · Stephan Mandt)
- 2021 Workshop: Information-Theoretic Methods for Rigorous, Responsible, and Reliable Machine Learning (ITR3) (Ahmad Beirami · Flavio Calmon · Berivan Isik · Haewon Jeong · Matthew Nokleby · Cynthia Rush)
- 2021 Poster: Neural Transformation Learning for Deep Anomaly Detection Beyond Images (Chen Qiu · Timo Pfrommer · Marius Kloft · Stephan Mandt · Maja Rudolph)
- 2021 Spotlight: Neural Transformation Learning for Deep Anomaly Detection Beyond Images (Chen Qiu · Timo Pfrommer · Marius Kloft · Stephan Mandt · Maja Rudolph)
- 2021 Poster: Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding (Yangjun Ruan · Karen Ullrich · Daniel Severo · James Townsend · Ashish Khisti · Arnaud Doucet · Alireza Makhzani · Chris Maddison)
- 2021 Affinity Workshop: Women in Machine Learning (WiML) Un-Workshop (Wenshuo Guo · Beliz Gokkaya · Arushi G K Majha · Vaidheeswaran Archana · Berivan Isik · Olivia Choudhury · Liyue Shen · Hadia Samil · Tatjana Chavdarova)
- 2021 Oral: Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding (Yangjun Ruan · Karen Ullrich · Daniel Severo · James Townsend · Ashish Khisti · Arnaud Doucet · Alireza Makhzani · Chris Maddison)
- 2020 Poster: The k-tied Normal Distribution: A Compact Parameterization of Gaussian Mean Field Posteriors in Bayesian Neural Networks (Jakub Swiatkowski · Kevin Roth · Bastiaan Veeling · Linh Tran · Joshua V Dillon · Jasper Snoek · Stephan Mandt · Tim Salimans · Rodolphe Jenatton · Sebastian Nowozin)
- 2020 Poster: How Good is the Bayes Posterior in Deep Neural Networks Really? (Florian Wenzel · Kevin Roth · Bastiaan Veeling · Jakub Swiatkowski · Linh Tran · Stephan Mandt · Jasper Snoek · Tim Salimans · Rodolphe Jenatton · Sebastian Nowozin)
- 2020 Poster: Variational Bayesian Quantization (Yibo Yang · Robert Bamler · Stephan Mandt)
- 2018 Poster: Iterative Amortized Inference (Joe Marino · Yisong Yue · Stephan Mandt)
- 2018 Poster: Disentangled Sequential Autoencoder (Yingzhen Li · Stephan Mandt)
- 2018 Oral: Disentangled Sequential Autoencoder (Yingzhen Li · Stephan Mandt)
- 2018 Oral: Iterative Amortized Inference (Joe Marino · Yisong Yue · Stephan Mandt)
- 2018 Poster: Quasi-Monte Carlo Variational Inference (Alexander Buchholz · Florian Wenzel · Stephan Mandt)
- 2018 Poster: Improving Optimization in Models With Continuous Symmetry Breaking (Robert Bamler · Stephan Mandt)
- 2018 Oral: Quasi-Monte Carlo Variational Inference (Alexander Buchholz · Florian Wenzel · Stephan Mandt)
- 2018 Oral: Improving Optimization in Models With Continuous Symmetry Breaking (Robert Bamler · Stephan Mandt)
- 2017 Poster: Dynamic Word Embeddings (Robert Bamler · Stephan Mandt)
- 2017 Talk: Dynamic Word Embeddings (Robert Bamler · Stephan Mandt)