# Downloads 2024

Number of events: 2681

- $\bf{\Phi}_\textrm{Flow}$: Differentiable Simulations for PyTorch, TensorFlow and Jax
- $f$-Divergence Based Classification: Beyond the Use of Cross-Entropy
- $H$-Consistency Guarantees for Regression
- $\mathtt{VITS}$ : Variational Inference Thompson Sampling for contextual bandits
- ${\rm E}(3)$-Equivariant Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning
- $S^2$IP-LLM: Semantic Space Informed Prompt Learning with LLM for Time Series Forecasting
- $\texttt{MoE-RBench}$: Towards Building Reliable Language Models with Sparse Mixture-of-Experts
- 1st ICML Workshop on In-Context Learning (ICL @ ICML 2024)
- 2nd Workshop on Advancing Neural Network Training : Computational Efficiency, Scalability, and Resource Optimization (WANT@ICML 2024)
- 2nd Workshop on Generative AI and Law (GenLaw ’24)
- 3D Geometric Shape Assembly via Efficient Point Cloud Matching
- 3D-VLA: A 3D Vision-Language-Action Generative World Model
- A2Q+: Improving Accumulator-Aware Weight Quantization
- A3S: A General Active Clustering Method with Pairwise Constraints
- A Bayesian Approach to Online Planning
- A Bias-Variance-Covariance Decomposition of Kernel Scores for Generative Models
- Absolute Policy Optimization: Enhancing Lower Probability Bound of Performance with High Confidence
- Accelerated Algorithms for Constrained Nonconvex-Nonconcave Min-Max Optimization and Comonotone Inclusion
- Accelerated Policy Gradient for s-rectangular Robust MDPs with Large State Spaces
- Accelerated Policy Gradient: On the Convergence Rates of the Nesterov Momentum for Reinforcement Learning
- Accelerated Speculative Sampling Based on Tree Monte Carlo
- Accelerating Convergence in Bayesian Few-Shot Classification
- Accelerating Convergence of Score-Based Diffusion Models, Provably
- Accelerating Federated Learning with Quick Distributed Mean Estimation
- Accelerating Heterogeneous Federated Learning with Closed-form Classifiers
- Accelerating Iterative Retrieval-augmented Language Model Serving with Speculation
- Accelerating Legacy Numerical Solvers by Non-intrusive Gradient-based Meta-solving
- Accelerating Look-ahead in Bayesian Optimization: Multilevel Monte Carlo is All you Need
- Accelerating Parallel Sampling of Diffusion Models
- Accelerating PDE Data Generation via Differential Operator Action in Solution Space
- Accelerating Transformer Pre-training with 2:4 Sparsity
- Accessible and Efficient Foundation Models for Biological Discovery
- Accurate LoRA-Finetuning Quantization of LLMs via Information Retention
- ACE: Off-Policy Actor-Critic with Causality-Aware Entropy Regularization
- Achieving Lossless Gradient Sparsification via Mapping to Alternative Space in Federated Learning
- Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling
- A Circuit Domain Generalization Framework for Efficient Logic Synthesis in Chip Design
- A Closer Look at the Limitations of Instruction Tuning
- ACM-MILP: Adaptive Constraint Modification via Grouping and Selection for Hardness-Preserving MILP Instance Generation
- A Computational Framework for Solving Wasserstein Lagrangian Flows
- A connection between Tempering and Entropic Mirror Descent
- A Contextual Combinatorial Bandit Approach to Negotiation
- ACPO: A Policy Optimization Algorithm for Average MDPs with Constraints
- Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts
- Acquisition Conditioned Oracle for Nongreedy Active Feature Acquisition
- Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations
- Activation-Descent Regularization for Input Optimization of ReLU Networks
- Active Adaptive Experimental Design for Treatment Effect Estimation with Covariate Choice
- Active Label Correction for Semantic Segmentation with Foundation Models
- Active Preference Learning for Large Language Models
- Active Ranking and Matchmaking, with Perfect Matchings
- Active Statistical Inference
- AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors
- Adapt and Diffuse: Sample-adaptive Reconstruction via Latent Diffusion Models
- Adapting Pretrained ViTs with Convolution Injector for Visuo-Motor Control
- Adapting Static Fairness to Sequential Decision-Making: Bias Mitigation Strategies towards Equal Long-term Benefit Rate
- Adaptive Accompaniment with ReaLchords
- Adaptive Advantage-Guided Policy Regularization for Offline Reinforcement Learning
- Adaptive Conformal Inference by Betting
- Adaptive Feature Selection for No-Reference Image Quality Assessment by Mitigating Semantic Noise Sensitivity
- Adaptive-Gradient Policy Optimization: Enhancing Policy Learning in Non-Smooth Differentiable Simulations
- Adaptive Group Personalization for Federated Mutual Transfer Learning
- Adaptive Hierarchical Certification for Segmentation using Randomized Smoothing
- Adaptive Horizon Actor-Critic for Policy Learning in Contact-Rich Differentiable Simulation
- Adaptive Learning of Density Ratios in RKHS
- Adaptively Learning to Select-Rank in Online Platforms
- Adaptively Perturbed Mirror Descent for Learning in Games
- Adaptive Observation Cost Control for Variational Quantum Eigensolvers
- Adaptive Online Experimental Design for Causal Discovery
- Adaptive Proximal Gradient Methods Are Universal Without Approximation
- Adaptive Robust Learning using Latent Bernoulli Variables
- Adaptive Sampling of k-Space in Magnetic Resonance for Rapid Pathology Prediction
- Adaptive Stabilization Based on Machine Learning for Column Generation
- Adaptive Text Watermark for Large Language Models
- A decoder-only foundation model for time-series forecasting
- A Dense Reward View on Aligning Text-to-Image Diffusion with Preference
- A Differentiable Partially Observable Generalized Linear Model with Forward-Backward Message Passing
- A Diffusion Model Framework for Unsupervised Neural Combinatorial Optimization
- A Distributional Analogue to the Successor Representation
- A Doubly Recursive Stochastic Compositional Gradient Descent Method for Federated Multi-Level Compositional Optimization
- AdsorbDiff: Adsorbate Placement via Conditional Denoising Diffusion
- A Dual-module Framework for Counterfactual Estimation over Time
- Advancing DRL Agents in Commercial Fighting Games: Training, Integration, and Agent-Human Alignment
- Advancing Dynamic Sparse Training by Exploring Optimization Opportunities
- Adversarial Attacks on Combinatorial Multi-Armed Bandits
- Adversarially Robust Deep Multi-View Clustering: A Novel Attack and Defense Framework
- Adversarially Robust Hypothesis Transfer Learning
- Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies
- A Dynamic Algorithm for Weighted Submodular Cover Problem
- A Dynamical Model of Neural Scaling Laws
- AegisFL: Efficient and Flexible Privacy-Preserving Byzantine-Robust Cross-silo Federated Learning
- A fast algorithm to simulate nonlinear resistive networks
- A Federated Stochastic Multi-level Compositional Minimax Algorithm for Deep AUC Maximization
- A Field Guide for Pacing Budget and ROS Constraints
- A Fine-grained Analysis of Fitted Q-evaluation: Beyond Parametric Models
- A Fixed-Point Approach for Causal Generative Modeling
- A Fresh Take on Stale Embeddings: Improving Dense Retriever Training with Corrector Networks
- A General Framework for Learning from Weak Supervision
- A General Framework for Sequential Decision-Making under Adaptivity Constraints
- A General Online Algorithm for Optimizing Complex Performance Metrics
- A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts
- A Generative Approach for Treatment Effect Estimation under Collider Bias: From an Out-of-Distribution Perspective
- Agentic Markets Workshop
- Agent Instructs Large Language Models to be General Zero-Shot Reasoners
- Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
- Agent-Specific Effects: A Causal Effect Propagation Analysis in Multi-Agent MDPs
- A Geometric Decomposition of Finite Games: Convergence vs. Recurrence under Exponential Weights
- A Geometric Explanation of the Likelihood OOD Detection Paradox
- A Global Geometric Analysis of Maximal Coding Rate Reduction
- Agnostic Interactive Imitation Learning: New Theory and Practical Algorithms
- Agnostic Learning of Mixed Linear Regressions with EM and AM Algorithms
- Agnostic Sample Compression Schemes for Regression
- A Graph is Worth $K$ Words: Euclideanizing Graph using Pure Transformer
- A Hierarchical Adaptive Multi-Task Reinforcement Learning Framework for Multiplier Circuit Design
- A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
- AI Alignment with Changing and Influenceable Reward Functions
- AI Control: Improving Safety Despite Intentional Subversion
- AI for Math Workshop
- AI for Science: Scaling in AI for Scientific Discovery
- Ai-sampler: Adversarial Learning of Markov kernels with involutive maps
- A Language Model’s Guide Through Latent Space
- ALERT-Transformer: Bridging Asynchronous and Synchronous Machine Learning for Real-Time Event-based Spatio-Temporal Data
- Algorithm and Hardness for Dynamic Attention Maintenance in Large Language Models
- Algorithmic Stability Unleashed: Generalization Bounds with Unbounded Losses
- Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models
- Aligned Objective for Soft-Pseudo-Label Generation in Supervised Learning
- Aligning Reinforcement Learning Experimentalists and Theorists
- Aligning Transformers with Weisfeiler-Leman
- Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
- A Linear Time and Space Local Point Cloud Geometry Encoder via Vectorized Kernel Mixture (VecKM)
- All-in-one simulation-based inference
- Allocation Requires Prediction Only if Inequality Is Low
- AlphaFold Meets Flow Matching for Generating Protein Ensembles
- AlphaZero-Like Tree-Search can Guide Large Language Model Decoding and Training
- Ambiguity-Aware Abductive Learning
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
- Ameliorate Spurious Correlations in Dataset Condensation
- Amend to Alignment: Decoupled Prompt Tuning for Mitigating Spurious Correlation in Vision-Language Models
- A Minimaximalist Approach to Reinforcement Learning from Human Feedback
- Amortized Equation Discovery in Hybrid Dynamical Systems
- Amortized Variational Deep Kernel Learning
- Amortizing Pragmatic Program Synthesis with Rankings
- AMPA: Adaptive Mixed Precision Allocation for Low-Bit Integer Training
- A Multimodal Automated Interpretability Agent
- Analysis for Abductive Learning and Neural-Symbolic Reasoning Shortcuts
- Analyzing $D^\alpha$ seeding for $k$-means
- An amortized approach to non-linear mixed-effects modeling based on neural posterior estimation
- An Analysis of Linear Time Series Forecasting Models
- AND: Audio Network Dissection for Interpreting Deep Acoustic Models
- A Near-Linear Time Approximation Algorithm for Beyond-Worst-Case Graph Clustering
- A Nearly Optimal Single Loop Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness
- An Effective Dynamic Gradient Calibration Method for Continual Learning
- An Efficient Maximal Ancestral Graph Listing Algorithm
- An Efficient Self-Learning Framework For Interactive Spoken Dialog Systems
- An Embodied Generalist Agent in 3D World
- An Empirical Examination of Balancing Strategy for Counterfactual Estimation on Time Series
- An Empirical Study Into What Matters for Calibrating Vision-Language Models
- An Empirical Study of Realized GNN Expressiveness
- A Neural-Guided Dynamic Symbolic Network for Exploring Mathematical Expressions from Data
- A Neural-Preconditioned Poisson Solver for Mixed Dirichlet and Neumann Boundary Conditions
- A New Branch-and-Bound Pruning Framework for $\ell_0$-Regularized Problems
- A New Computationally Efficient Algorithm to solve Feature Selection for Functional Data Classification in High-dimensional Spaces
- A New Linear Scaling Rule for Private Adaptive Hyperparameter Optimization
- A New Robust Partial p-Wasserstein-Based Metric for Comparing Distributions
- A New Theoretical Perspective on Data Heterogeneity in Federated Optimization
- An Explicit Frame Construction for Normalizing 3D Point Clouds
- An Image is Worth Multiple Words: Discovering Object Level Concepts using Multi-Concept Prompt Learning
- An Improved Finite-time Analysis of Temporal Difference Learning with Deep Neural Networks
- An Independence-promoting Loss for Music Generation with Language Models
- An Infinite-Width Analysis on the Jacobian-Regularised Training of a Neural Network
- An Information-Theoretic Analysis of In-Context Learning
- An Information Theoretic Approach to Interaction-Grounded Learning
- An Interpretable Evaluation of Entropy-based Novelty of Generative Models
- An Intrinsic Vector Heat Network
- An Iterative Min-Min Optimization Method for Sparse Bayesian Learning
- An LLM Compiler for Parallel Function Calling
- An Online Optimization Perspective on First-Order and Zero-Order Decentralized Nonsmooth Nonconvex Stochastic Optimization
- Antibody Design Using a Score-based Diffusion Model Guided by Evolutionary, Physical and Geometric Constraints
- An Unsupervised Approach for Periodic Source Detection in Time Series
- Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
- AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls
- A Persuasive Approach to Combating Misinformation
- Applying language models to algebraic topology: generating simplicial cycles using multi-labeling in Wu's formula
- Approximate Nearest Neighbor Search with Window Filters
- A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs
- A Probabilistic Approach to Learning the Degree of Equivariance in Steerable CNNs
- A Provable Decision Rule for Out-of-Distribution Detection
- A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts
- APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference
- AquaLoRA: Toward White-box Protection for Customized Stable Diffusion Models via Watermark LoRA
- A Rate-Distortion View of Uncertainty Quantification
- ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
- A Resilient and Accessible Distribution-Preserving Watermark for Large Language Models
- Arrows of Time for Large Language Models
- ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations
- A sampling theory perspective on activations for implicit neural representations
- A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models
- A Single-Loop Robust Policy Gradient Method for Robust Markov Decision Processes
- A Sober Look at LLMs for Material Discovery: Are They Actually Good for Bayesian Optimization Over Molecules?
- A Space Group Symmetry Informed Network for O(3) Equivariant Crystal Tensor Prediction
- A Sparsity Principle for Partially Observable Causal Representation Learning
- Assessing Large Language Models on Climate Information
- Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
- A Statistical Framework for Data-dependent Retrieval-Augmented Models
- A Statistical Theory of Regularization-Based Continual Learning
- AST-T5: Structure-Aware Pretraining for Code Generation and Understanding
- A Study of First-Order Methods with a Deterministic Relative-Error Gradient Oracle
- A Subquadratic Time Algorithm for Robust Sparse Mean Estimation
- Asymmetry in Low-Rank Adapters of Foundation Models
- Asymptotically Optimal and Computationally Efficient Average Treatment Effect Estimation in A/B testing
- Asymptotics of feature learning in two-layer networks after one gradient-step
- Asymptotics of Learning with Deep Structured (Random) Features
- A Tale of Tails: Model Collapse as a Change of Scaling Laws
- A Tensor Decomposition Perspective on Second-order RNNs
- A Theoretical Analysis of Backdoor Poisoning Attacks in Convolutional Neural Networks
- A Theory of Fault-Tolerant Learning
- A Theory of Non-Linear Feature Learning with One Gradient Step in Two-Layer Neural Networks
- A Touch, Vision, and Language Dataset for Multimodal Alignment
- ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories
- Attack-free Evaluating and Enhancing Adversarial Robustness on Categorical Data
- Attention Meets Post-hoc Interpretability: A Mathematical Perspective
- AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers
- AttNS: Attention-Inspired Numerical Solving For Limited Data Scenarios
- Attribute Based Interpretable Evaluation Metrics for Generative Models
- Attribution-based Explanations that Provide Recourse Cannot be Robust
- Auctionformer: A Unified Deep Learning Algorithm for Solving Equilibrium Strategies in Auction Games
- Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
- Auditing Private Prediction
- Augmenting Decision with Hypothesis in Reinforcement Learning
- A Unified Adaptive Testing System Enabled by Hierarchical Structure Search
- A Unified Framework for Learning with Nonlinear Model Classes from Arbitrary Linear Samples
- A Unified Linear Programming Framework for Offline Reward Learning from Human Demonstrations and Feedback
- A Unified Recipe for Deriving (Time-Uniform) PAC-Bayes Bounds
- A Unified View of FANOVA: A Comprehensive Bayesian Framework for Component Selection and Estimation
- A Universal Class of Sharpness-Aware Minimization Algorithms
- A Universal Transfer Theorem for Convex Optimization Algorithms Using Inexact First-order Oracles
- Autaptic Synaptic Circuit Enhances Spatio-temporal Predictive Learning of Spiking Neural Networks
- Autoencoding Conditional Neural Processes for Representation Learning
- Auto-Encoding Morph-Tokens for Multimodal LLM
- Autoformalizing Euclidean Geometry
- Auto-Linear Phenomenon in Subsurface Imaging
- Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation
- Automated Loss function Search for Class-imbalanced Node Classification
- Automated Reinforcement Learning: Exploring Meta-Learning, AutoML, and LLMs
- Automated Statistical Model Discovery with Language Models
- Automating the Selection of Proxy Variables of Unmeasured Confounders
- Autonomous Sparse Mean-CVaR Portfolio Optimization
- AutoOS: Make Your OS More Powerful by Exploiting Large Language Models
- Auto-Regressive Next-Token Predictors are Universal Learners
- Averaging $n$-step Returns Reduces Variance in Reinforcement Learning
- BadPart: Unified Black-box Adversarial Patch Attacks against Pixel-wise Regression Tasks
- BAGEL: Bootstrapping Agents by Guiding Exploration with Language
- Bagged Deep Image Prior for Recovering Images in the Presence of Speckle Noise
- Balanced Data, Imbalanced Spectra: Unveiling Class Disparities with Spectral Imbalance
- Balanced Resonate-and-Fire Neurons
- Balancing Feature Similarity and Label Variability for Optimal Size-Aware One-shot Subset Selection
- Balancing Similarity and Complementarity for Federated Learning
- Barrier Algorithms for Constrained Non-Convex Optimization
- Batch and match: black-box variational inference with a score-based divergence
- Batch Singular Value Polarization and Weighted Semantic Augmentation for Universal Domain Adaptation
- BAT: Learning to Reason about Spatial Sounds with Large Language Models
- Bayesian Adaptation of Network Depth and Width for Continual Learning
- Bayesian Design Principles for Offline-to-Online Reinforcement Learning
- Bayesian Exploration Networks
- Bayesian Knowledge Distillation: A Bayesian Perspective of Distillation with Uncertainty Quantification
- Bayesian Optimization of Function Networks with Partial Evaluations
- Bayesian Power Steering: An Effective Approach for Domain Adaptation of Diffusion Models
- Bayesian Program Learning by Decompiling Amortized Knowledge
- Bayesian Regret Minimization in Offline Bandits
- Bayesian Uncertainty for Gradient Aggregation in Multi-Task Learning
- BayOTIDE: Bayesian Online Multivariate Time Series Imputation with Functional Decomposition
- BBox-Adapter: Lightweight Adapting for Black-Box Large Language Models
- BECoTTA: Input-dependent Online Blending of Experts for Continual Test-time Adaptation
- Behavior Generation with Latent Actions
- BeigeMaps: Behavioral Eigenmaps for Reinforcement Learning from Images
- Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT
- Benchmarking Deletion Metrics with the Principled Explanations
- Benign Overfitting in Adversarial Training of Neural Networks
- Benign Overfitting in Two-Layer ReLU Convolutional Neural Networks for XOR Data
- Bespoke Non-Stationary Solvers for Fast Sampling of Diffusion and Flow Models
- Best Arm Identification for Stochastic Rising Bandits
- Best of Both Worlds Guarantees for Smoothed Online Quadratic Optimization
- Better & Faster Large Language Models via Multi-token Prediction
- Better Locally Private Sparse Estimation Given Multiple Samples Per User
- Better Safe than Sorry: Pre-training CLIP against Targeted Data Poisoning and Backdoor Attacks
- BetterV: Controlled Verilog Generation with Discriminative Guidance
- Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
- Beyond ELBOs: A Large-Scale Evaluation of Variational Methods for Sampling
- Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning
- Beyond Individual Input for Deep Anomaly Detection on Tabular Data
- Beyond Point Prediction: Score Matching-based Pseudolikelihood Estimation of Neural Marked Spatio-Temporal Point Process
- Beyond Regular Grids: Fourier-Based Neural Operators on Arbitrary Domains
- Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models
- Beyond the Calibration Point: Mechanism Comparison in Differential Privacy
- Beyond the Federation: Topology-aware Federated Learning for Generalization to Unseen Clients
- Beyond the Norms: Detecting Prediction Errors in Regression Models
- Beyond the ROC Curve: Classification Trees Using Cost-Optimal Curves, with Application to Imbalanced Datasets
- Be Your Own Neighborhood: Detecting Adversarial Examples by the Neighborhood Relations Built on Self-Supervised Learning
- Bias of Stochastic Gradient Descent or the Architecture: Disentangling the Effects of Overparameterization of Neural Networks
- Bidirectional Reciprocative Information Communication for Few-Shot Semantic Segmentation
- BiE: Bi-Exponent Block Floating-Point for Large Language Models Quantization
- Bifurcated Attention for Single-Context Large-Batch Sampling
- Biharmonic Distance of Graphs and its Higher-Order Variants: Theoretical Properties with Applications to Centrality and Clustering
- BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
- Binary Decomposition: A Problem Transformation Perspective for Open-Set Semi-Supervised Learning
- Binning as a Pretext Task: Improving Self-Supervised Learning in Tabular Domains
- Bipartite Matching in Massive Graphs: A Tight Analysis of EDCS
- BiSHop: Bi-Directional Cellular Learning for Tabular Data with Generalized Sparse Modern Hopfield Model
- Bivariate Causal Discovery using Bayesian Model Selection
- Block Acceleration Without Momentum: On Optimal Stepsizes of Block Gradient Descent for Least-Squares
- BLO-SAM: Bi-level Optimization Based Finetuning of the Segment Anything Model for Overfitting-Preventing Semantic Segmentation
- Boosting Offline Optimizers with Surrogate Sensitivity
- Boosting Reinforcement Learning with Strongly Delayed Feedback Through Auxiliary Short Delays
- Bootstrap AutoEncoders With Contrastive Paradigm for Self-supervised Gaze Estimation
- Bootstrapping Fisher Market Equilibrium and First-Price Pacing Equilibrium
- Borda Regret Minimization for Generalized Linear Dueling Bandits
- BOtied: Multi-objective Bayesian optimization with tied multivariate ranks
- Bottleneck-Minimal Indexing for Generative Document Retrieval
- Boundary Exploration for Bayesian Optimization With Unknown Physical Constraints
- Bounded and Uniform Energy-based Out-of-distribution Detection for Graphs
- Bounding the Excess Risk for Linear Models Trained on Marginal-Preserving, Differentially-Private, Synthetic Data
- Box Facets and Cut Facets of Lifted Multicut Polytopes
- Boximator: Generating Rich and Controllable Motions for Video Synthesis
- BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback
- Breadth-First Exploration on Adaptive Grid for Reinforcement Learning
- Breaking the Barrier: Enhanced Utility and Robustness in Smoothed DRL Agents
- Breaking through the learning plateaus of in-context learning in Transformer
- Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
- Bridging Data Gaps in Diffusion Models with Adversarial Noise-Based Transfer Learning
- Bridging discrete and continuous state spaces: Exploring the Ehrenfest process in time-continuous diffusion models
- Bridging Environments and Language with Rendering Functions and Vision-Language Models
- Bridging Mini-Batch and Asymptotic Analysis in Contrastive Learning: From InfoNCE to Kernel-Based Losses
- Bridging Model Heterogeneity in Federated Learning via Uncertainty-based Asymmetrical Reciprocity Learning
- Bringing Motion Taxonomies to Continuous Domains via GPLVM on Hyperbolic manifolds
- Bring Your Own (Non-Robust) Algorithm to Solve Robust MDPs by Estimating The Worst Kernel
- Building Socially-Equitable Public Models
- BWS: Best Window Selection Based on Sample Scores for Data Pruning across Broad Ranges
- ByMI: Byzantine Machine Identification with False Discovery Rate Control
- By Tying Embeddings You Are Assuming the Distributional Hypothesis
- Byzantine Resilient and Fast Federated Few-Shot Learning
- Byzantine-Robust Federated Learning: Impact of Client Subsampling and Local Updates
- Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling
- Calibration Bottleneck: Over-compressed Representations are Less Calibratable
- CaM: Cache Merging for Memory-efficient LLMs Inference
- Can a Few Decide for Many? The Metric Distortion of Sortition
- Can AI Assistants Know What They Don't Know?
- Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data
- Can Gaussian Sketching Converge Faster on a Preconditioned Landscape?
- Can Implicit Bias Imply Adversarial Robustness?
- Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
- Can Machines Learn the True Probabilities?
- Can Mamba Learn How To Learn? A Comparative Study on In-Context Learning Tasks
- Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective
- CaPS: Collaborative and Private Synthetic Data Generation from Distributed Sources
- CarbonNovo: Joint Design of Protein Structure and Sequence Using a Unified Energy-based Model
- Careful with that Scalpel: Improving Gradient Surgery with an EMA
- CaRiNG: Learning Temporal Causal Representation under Non-Invertible Generation Process
- CARTE: Pretraining and Transfer for Tabular Learning
- Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation
- CasCast: Skillful High-resolution Precipitation Nowcasting via Cascaded Modelling
- Case-Based or Rule-Based: How Do Transformers Do the Math?
- Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning
- Category-Aware Active Domain Adaptation
- CATS: Enhancing Multivariate Time Series Forecasting by Constructing Auxiliary Time Series as Exogenous Variables
- CauDiTS: Causal Disentangled Domain Adaptation of Multivariate Time Series
- Causal Action Influence Aware Counterfactual Data Augmentation
- Causal Bandits: The Pareto Optimal Frontier of Adaptivity, a Reduction to Linear Bandits, and Limitations around Unknown Marginals
- Causal Customer Churn Analysis with Low-rank Tensor Block Hazard Model
- Causal Discovery via Conditional Independence Testing with Proxy Variables
- Causal Discovery with Fewer Conditional Independence Tests
- Causal Effect Identification in LiNGAM Models with Latent Confounders
- Causal Inference from Competing Treatments
- Causal Inference out of Control: Estimating Performativity without Treatment Randomization
- Causal-IQA: Towards the Generalization of Image Quality Assessment Based on Causal Inference
- Causality Based Front-door Defense Against Backdoor Attack on Language Models
- Causally Motivated Personalized Federated Invariant Learning with Shortcut-Averse Information-Theoretic Regularization
- Causal Representation Learning from Multiple Distributions: A General Setting
- Causal Representation Learning Made Identifiable by Grouping of Observational Variables
- CCM: Real-Time Controllable Visual Content Creation Using Text-to-Image Consistency Models
- Cell2Sentence: Teaching Large Language Models the Language of Biology
- Centralized Selection with Preferences in the Presence of Biases
- Certifiably Byzantine-Robust Federated Conformal Prediction
- CF-OPT: Counterfactual Explanations for Structured Prediction
- CHAI: Clustered Head Attention for Efficient LLM Inference
- Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
- Chain-of-Thought Predictive Control
- Challenges and Considerations in the Evaluation of Bayesian Causal Discovery
- Challenges in Language Model Evaluations
- Challenges in Training PINNs: A Loss Landscape Perspective
- Characteristic Guidance: Non-linear Correction for Diffusion Model at Large Guidance Scale
- Characterizing Large Language Model Geometry Helps Solve Toxicity Detection and Generation
- Characterizing Overfitting in Kernel Ridgeless Regression Through the Eigenspectrum
- Characterizing ResNet's Universal Approximation Capability
- Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension
- Chasing Convex Functions with Long-term Constraints
- Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
- CHEMREASONER: Heuristic Search over a Large Language Model’s Knowledge Space using Quantum-Chemical Feedback
- CKGConv: General Graph Convolution with Continuous Kernels
- Classification under Nuisance Parameters and Generalized Label Shift in Likelihood-Free Inference
- Classification Under Strategic Self-Selection
- Class-Imbalanced Graph Learning without Class Rebalancing
- CLIF: Complementary Leaky Integrate-and-Fire Neuron for Spiking Neural Networks
- Clifford-Steerable Convolutional Neural Networks
- CLIPZyme: Reaction-Conditioned Virtual Screening of Enzymes
- CLLMs: Consistency Large Language Models
- Closing the Gap: Achieving Global Convergence (Last Iterate) of Actor-Critic under Markovian Sampling with Neural Network Parametrization
- Cluster-Aware Similarity Diffusion for Instance Retrieval
- Clustered Federated Learning via Gradient-based Partitioning
- Coactive Learning for Large Language Models using Implicit User Feedback
- COALA: A Practical and Vision-Centric Federated Learning Platform
- Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models
- Coarse-To-Fine Tensor Trains for Compact Visual Representations
- Code as Reward: Empowering Reinforcement Learning with VLMs
- Codebook Features: Sparse and Discrete Interpretability for Neural Networks
- CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay
- CogBench: a large language model walks into a psychology lab
- CogDPM: Diffusion Probabilistic Models via Cognitive Predictive Coding
- COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis
- Collaborative Learning with Different Labeling Functions
- Collage: Light-Weight Low-Precision Strategy for LLM Training
- Collapse-Aware Triplet Decoupling for Adversarially Robust Image Retrieval
- Collective Certified Robustness against Graph Injection Attacks
- CoLoRA: Continuous low-rank adaptation for reduced implicit neural modeling of parameterized partial differential equations
- Combinatorial Approximations for Cluster Deletion: Simpler, Faster, and Better
- Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond
- Combining Experimental and Historical Data for Policy Evaluation
- Community-Invariant Graph Contrastive Learning
- Compact Optimality Verification for Optimization Proxies
- Comparing Graph Transformers via Positional Encodings
- CompeteAI: Understanding the Competition Dynamics of Large Language Model-based Agents
- Completing Visual Objects via Bridging Generation and Segmentation
- Complexity Matters: Feature Learning in the Presence of Spurious Correlations
- Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
- Compositional Curvature Bounds for Deep Neural Networks
- Compositional Few-Shot Class-Incremental Learning
- Compositional Image Decomposition with Diffusion Models
- Compositional Text-to-Image Generation with Dense Blob Representations
- Compress Clean Signal from Noisy Raw Image: A Self-Supervised Approach
- Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation
- Compressing Large Language Models by Joint Sparsification and Quantization
- Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth
- Compute Better Spent: Replacing Dense Layers with Structured Matrices
- Concentration Inequalities for General Functions of Heavy-Tailed Random Variables
- Conditional Common Entropy for Instrumental Variable Testing and Partial Identification
- Conditional Language Learning with Context
- Conditionally-Conjugate Gaussian Process Factor Analysis for Spike Count Data via Data Augmentation
- Conditional Normalizing Flows for Active Learning of Coarse-Grained Molecular Representations
- Confidence-aware Contrastive Learning for Selective Classification
- Confidence Aware Inverse Constrained Reinforcement Learning
- Configurable Mirror Descent: Towards a Unification of Decision Making
- Conformalized Adaptive Forecasting of Heterogeneous Trajectories
- Conformalized Survival Distributions: A Generic Post-Process to Increase Calibration
- Conformal Prediction for Deep Classifier via Label Ranking
- Conformal prediction for multi-dimensional time series by ellipsoidal sets
- Conformal Prediction Sets Improve Human Decision Making
- Conformal Predictions under Markovian Data
- Conformal Prediction with Learned Features
- Conformal Validity Guarantees Exist for Any Data Distribution (and How to Find Them)
- Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases
- Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models
- Connecting the Dots: Is Mode-Connectedness the Key to Feasible Sample-Based Inference in Bayesian Neural Networks?
- Connect Later: Improving Fine-tuning for Robustness with Targeted Augmentations
- Consistent Adversarially Robust Linear Classification: Non-Parametric Setting
- Consistent Diffusion Meets Tweedie: Training Exact Ambient Diffusion Models with Noisy Data
- Consistent Long-Term Forecasting of Ergodic Dynamical Systems
- Consistent Submodular Maximization
- Constrained Ensemble Exploration for Unsupervised Skill Discovery
- Constrained Exploration via Reflected Replica Exchange Stochastic Gradient Langevin Dynamics
- Constrained Reinforcement Learning Under Model Mismatch
- Contamination-Resilient Anomaly Detection via Adversarial Learning on Partially-Observed Normal and Anomalous Data
- Context-Guided Diffusion for Out-of-Distribution Molecular and Protein Design
- ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models
- Contextual Feature Selection with Conditional Stochastic Gates
- Contextualized Policy Recovery: Modeling and Interpreting Medical Decisions with Adaptive Imitation Learning
- Continuous Treatment Effects with Surrogate Outcomes
- ContPhy: Continuum Physical Concept Learning and Reasoning from Videos
- Contrasting Multiple Representations with the Multi-Marginal Matching Gap
- Contrastive Learning for Clinical Outcome Prediction with Partial Data Sources
- Contrastive Predict-and-Search for Mixed Integer Linear Programs
- Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
- Contrastive Representation for Data Filtering in Cross-Domain Offline Reinforcement Learning
- Controllable Prompt Tuning For Balancing Group Distributional Robustness
- Controlled Decoding from Language Models
- Controlling Behavioral Diversity in Multi-Agent Reinforcement Learning
- Convergence and Complexity Guarantee for Inexact First-order Riemannian Optimization Algorithms
- Convergence and Trade-Offs in Riemannian Gradient Descent and Riemannian Proximal Point
- Convergence Guarantees for the DeepWalk Embedding on Block Models
- Convergence of Online Learning Algorithm for a Mixture of Multiple Linear Regressions
- Convergence of Some Convex Message Passing Algorithms to a Fixed Point
- Converting Transformers to Polynomial Form for Secure Inference Over Homomorphic Encryption
- Convex Analysis at Infinity: An Introduction to Astral Space
- Convex and Bilevel Optimization for Neural-Symbolic Inference and Learning
- Convex Relaxations of ReLU Neural Networks Approximate Global Optima in Polynomial Time
- ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy
- convSeq: Fast and Scalable Method for Detecting Patterns in Spike Data
- Cooperative Graph Neural Networks
- COPAL: Continual Pruning in Large Language Generative Models
- Coprocessor Actor Critic: A Model-Based Reinforcement Learning Approach For Adaptive Brain Stimulation
- Copula-Nested Spectral Kernel Network
- Copyright Traps for Large Language Models
- Coresets for Multiple $\ell_p$ Regression
- Correcting Diffusion-Based Perceptual Image Compression with Privileged End-to-End Decoder
- Correlation-Induced Label Prior for Semi-Supervised Multi-Label Learning
- CosPGD: an efficient white-box adversarial attack for pixel-wise prediction tasks
- Counterfactual Image Editing
- Counterfactual Metarules for Local and Global Recourse
- Counterfactual Reasoning for Multi-Label Image Classification via Patching-Based Training
- Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation
- Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning
- C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models
- Creative Text-to-Audio Generation via Synthesizer Programming
- Criterion Collapse and Loss Distribution Control
- Critical feature learning in deep neural networks
- Critical windows: non-asymptotic theory for feature emergence in diffusion models
- CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD Generalization and Open-Set OOD Detection
- Cross-domain Open-world Discovery
- Cross-Domain Policy Adaptation by Capturing Representation Mismatch
- CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers
- Cross-view Masked Diffusion Transformers for Person Image Synthesis
- CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
- Curated LLM: Synergy of LLMs and Data Curation for tabular augmentation in low-data regimes
- CurBench: Curriculum Learning Benchmark
- CuTS: Customizable Tabular Synthetic Data Generation
- CW Complex Hypothesis for Image Data
- DAG-Based Column Generation for Adversarial Team Games
- Data Attribution at Scale
- Data-centric Machine Learning Research (DMLR): Datasets for Foundation Models
- Data-efficient Large Vision Models through Sequential Autoregression
- Data-Efficient Learning via Clustering-Based Sensitivity Sampling: Foundation Models and Beyond
- Data-Efficient Molecular Generation with Hierarchical Textual Inversion
- Data Engineering for Scaling Language Models to 128K Context
- Data-free Distillation of Diffusion Models with Bootstrapping
- Data-free Neural Representation Compression with Riemannian Neural Dynamics
- DataFreeShield: Defending Adversarial Attacks without Training Data
- Data Poisoning Attacks against Conformal Prediction
- Dealing With Unbounded Gradients in Stochastic Saddle-point Optimization
- Debating with More Persuasive LLMs Leads to More Truthful Answers
- Debiased Distribution Compression
- Debiased Offline Representation Learning for Fast Online Adaptation in Non-stationary Dynamics
- Decentralized Convex Finite-Sum Optimization with Better Dependence on Condition Numbers
- Deciphering RNA Secondary Structure Prediction: A Probabilistic K-Rook Matching Perspective
- DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning
- Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
- Decoding-time Realignment of Language Models
- Decomposable Submodular Maximization in Federated Setting
- Decomposed Linear Dynamical Systems (dLDS) for learning the latent components of neural dynamics
- Decomposing and Editing Predictions by Modeling Model Computation
- Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling
- Deconstructing the Goldilocks Zone of Neural Network Initialization
- DeCoOp: Robust Prompt Tuning with Out-of-Distribution Detection
- DE-COP: Detecting Copyrighted Content in Language Models Training Data
- Decouple then Classify: A Dynamic Multi-view Labeling Strategy with Shared and Specific Information
- Decoupling Feature Extraction and Classification Layers for Calibrated Neural Networks
- Decoupling Learning and Decision-Making: Breaking the $\mathcal{O}(\sqrt{T})$ Barrier in Online Resource Allocation with First-Order Methods
- Deep Demonstration Tracing: Learning Generalizable Imitator Policy for Runtime Imitation from a Single Demonstration
- Deep Equilibrium Models are Almost Equivalent to Not-so-deep Explicit Models for High-dimensional Gaussian Mixtures
- Deeper or Wider: A Perspective from Optimal Generalization Error with Sobolev Loss
- Deep Functional Factor Models: Forecasting High-Dimensional Functional Time Series via Bayesian Nonparametric Factorization
- Deep Fusion: Efficient Network Training via Pre-trained Initializations
- Deep Networks Always Grok and Here is Why
- Deep Neural Room Acoustics Primitive
- DeepPolar: Inventing Nonlinear Large-Kernel Polar Codes via Deep Learning
- Deep Regression Representation Learning with Topology
- Deep Stochastic Mechanics
- Defense against Backdoor Attack on Pre-trained Language Models via Head Pruning and Attention Normalization
- Defense against Model Extraction Attack by Bayesian Active Watermarking
- Defining Neural Network Architecture through Polytope Structures of Datasets
- Degeneration-free Policy Optimization: RL Fine-Tuning for Language Models without Degeneration
- DéjàVu: KV-cache Streaming for Fast, Fault-tolerant Generative LLM Serving
- Delaunay Graph: Addressing Over-Squashing and Over-Smoothing Using Delaunay Triangulation
- Deletion-Anticipative Data Selection with a Limited Budget
- Delving into Differentially Private Transformer
- Delving into the Convergence of Generalized Smooth Minimax Optimization
- Demystifying SGD with Doubly Stochastic Gradients
- Denoising Autoregressive Representation Learning
- Dense Reward for Free in Reinforcement Learning from Human Feedback
- Density Ratio Estimation with Doubly Strong Robustness
- Density-Softmax: Efficient Test-time Model for Uncertainty Estimation and Robustness under Distribution Shifts
- Designing Decision Support Systems using Counterfactual Prediction Sets
- Detecting and Identifying Selection Structure in Sequential Data
- Detecting Any instruction-to-answer interaction relationship:Universal Instruction-to-Answer Navigator for Med-VQA
- Detecting Influence Structures in Multi-Agent Reinforcement Learning
- DetKDS: Knowledge Distillation Search for Object Detectors
- DFA-RAG: Conversational Semantic Router for Large Language Model with Definite Finite Automaton
- DFD: Distilling the Feature Disparity Differently for Detectors
- DFlow: A Generative Model Combining Denoising AutoEncoder and Normalizing Flow for High Fidelity Waveform Generation
- D-Flow: Differentiating through Flows for Controlled Generation
- Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View
- DIDI: Diffusion-Guided Diversity for Offline Behavioral Generation
- DiffAug: Enhance Unsupervised Contrastive Learning with Domain-Knowledge-Free Diffusion-based Data Augmentation
- DiffDA: a Diffusion model for weather-scale Data Assimilation
- Differentiability and Optimization of Multiparameter Persistent Homology
- Differentiable Almost Everything: Differentiable Relaxations, Algorithms, Operators, and Simulators
- Differentiable Annealed Importance Sampling Minimizes The Jensen-Shannon Divergence Between Initial and Target Distribution
- Differentiable Combinatorial Scheduling at Scale
- Differentiable Distributionally Robust Optimization Layers
- Differentiable Mapper for Topological Optimization of Data Representation
- Differentiable Model Scaling using Differentiable Topk
- Differentiable Weightless Neural Networks
- Differentially Private Bias-Term Fine-tuning of Foundation Models
- Differentially Private Decentralized Learning with Random Walks
- Differentially Private Domain Adaptation with Theoretical Guarantees
- Differentially private exact recovery for stochastic block models
- Differentially Private Post-Processing for Fair Regression
- Differentially Private Representation Learning via Image Captioning
- Differentially Private Sum-Product Networks
- Differentially Private Synthetic Data via Foundation Model APIs 2: Text
- Differentially Private Worst-group Risk Minimization
- DiffFPR: Diffusion Prior for Oversampled Fourier Phase Retrieval
- diff History for Neural Language Agents
- DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching
- Diffuse, Sample, Project: Plug-And-Play Controllable Graph Generation
- Diffusion-based Missing-view Generation With the Application on Incomplete Multi-view Clustering
- Diffusion Language Models Are Versatile Protein Learners
- Diffusion Model-Augmented Behavioral Cloning
- Diffusion Models Demand Contrastive Guidance for Adversarial Purification to Advance
- Diffusion Models Encode the Intrinsic Dimension of Data Manifolds
- Diffusion Posterior Sampling is Computationally Intractable
- Diffusion Rejection Sampling
- Diffusion Tempering Improves Parameter Estimation with Probabilistic Integrators for Ordinary Differential Equations
- Diffusive Gibbs Sampling
- DiJiang: Efficient Large Language Models through Compact Kernelization
- DiNADO: Norm-Disentangled Neurally-Decomposed Oracles for Controlling Language Models
- DiracDiffusion: Denoising and Incremental Reconstruction with Assured Data-Consistency
- Directly Denoising Diffusion Models
- Dirichlet Flow Matching with Applications to DNA Sequence Design
- DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents
- Discounted Adaptive Online Learning: Towards Better Regularization
- Discovering Bias in Latent Space: An Unsupervised Debiasing Approach
- Discovering Environments with XRM
- Discovering Features with Synergistic Interactions in Multiple Views
- Discovering Mixtures of Structural Causal Models from Time Series Data
- Discovering Multiple Solutions from a Single Task in Offline Reinforcement Learning
- Discovering Symmetry Breaking in Physical Systems with Relaxed Group Convolution
- Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution
- Discrete Latent Perspective Learning for Segmentation and Detection
- DISCRET: Synthesizing Faithful Explanations For Treatment Effect Estimation
- Disentangled 3D Scene Generation with Layout Learning
- Disentangled Continual Graph Neural Architecture Search with Invariant Modular Supernet
- Disentangled Graph Self-supervised Learning for Out-of-Distribution Generalization
- Disentanglement Learning via Topology
- Disguised Copyright Infringement of Latent Diffusion Models
- Disparate Impact on Group Accuracy of Linearization for Private Inference
- Dissecting Multimodality in VideoQA Transformer Models by Impairing Modality Fusion
- Distilling Morphology-Conditioned Hypernetworks for Efficient Universal Morphology Control
- DistiLLM: Towards Streamlined Distillation for Large Language Models
- Distinguishing the Knowable from the Unknowable with Language Models
- Distributed Bilevel Optimization with Communication Compression
- Distributed High-Dimensional Quantile Regression: Estimation Efficiency and Support Recovery
- Distributional Bellman Operators over Mean Embeddings
- Distribution Alignment Optimization through Neural Collapse for Long-tailed Classification
- Distributionally Robust Data Valuation
- Distribution-Free Predictive Uncertainty Quantification: Strengths and Limits of Conformal Prediction
- DITTO: Diffusion Inference-Time T-Optimization for Music Generation
- Ditto: Quantization-aware Secure Inference of Transformers upon MPC
- Diversified Batch Selection for Training Acceleration
- Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset
- DMTG: One-Shot Differentiable Multi-Task Grouping
- DNA-SE: Towards Deep Neural-Nets Assisted Semiparametric Estimation
- DNCs Require More Planning Steps
- Do Efficient Transformers Really Save Computation?
- Does Label Smoothing Help Deep Partial Label Learning?
- DOGE: Domain Reweighting with Generalization Estimation
- Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners?
- Do Large Code Models Understand Programming Concepts? Counterfactual Analysis for Code Predicates
- Do Large Language Models Perform the Way People Expect? Measuring the Human Generalization Function
- Domain Generalisation via Imprecise Learning
- Domain-wise Data Acquisition to Improve Performance under Distribution Shift
- Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations
- Don't be so Negative! Score-based Generative Modeling with Oracle-assisted Guidance
- Don’t Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget
- Don't trust your eyes: on the (un)reliability of feature visualizations
- DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
- DoRA: Weight-Decomposed Low-Rank Adaptation
- Do Topological Characteristics Help in Knowledge Distillation?
- Do Transformer World Models Give Better Policy Gradients?
- Double Momentum Method for Lower-Level Constrained Bilevel Optimization
- Double-Step Alternating Extragradient with Increasing Timescale Separation for Finding Local Minimax Points: Provable Improvements
- Double Stochasticity Gazes Faster: Snap-Shot Decentralized Stochastic Gradient Tracking Methods
- Double Variance Reduction: A Smoothing Trick for Composite Optimization Problems without First-Order Gradient
- Doubly Robust Causal Effect Estimation under Networked Interference via Targeted Learning
- DPN: Decoupling Partition and Navigation for Neural Solvers of Min-max Vehicle Routing Problems
- DPOT: Auto-Regressive Denoising Operator Transformer for Large-Scale PDE Pre-Training
- DPZero: Private Fine-Tuning of Language Models without Backpropagation
- DRCT: Diffusion Reconstruction Contrastive Training towards Universal Detection of Diffusion Generated Images
- DRED: Zero-Shot Transfer in Reinforcement Learning via Data-Regularised Environment Design
- Dr. Strategy: Model-Based Generalist Agents with Strategic Dreaming
- Drug Discovery with Dynamic Goal-aware Fragments
- DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning
- DSD-DA: Distillation-based Source Debiasing for Domain Adaptive Object Detection
- DsDm: Model-Aware Dataset Selection with Datamodels
- Dual Operating Modes of In-Context Learning
- DUPLEX: Dual GAT for Complex Embedding of Directed Graphs
- Dynamic Anisotropic Smoothing for Noisy Derivative-Free Optimization
- Dynamic Byzantine-Robust Learning: Adapting to Switching Byzantine Workers
- Dynamic Correlation Clustering in Sublinear Update Time
- Dynamic Evaluation of Large Language Models by Meta Probing Agents
- Dynamic Facility Location in High Dimensional Euclidean Spaces
- Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference
- Dynamic Metric Embedding into lp Space
- Dynamic Spectral Clustering with Provable Approximation Guarantee
- Dynamic Survival Analysis with Controlled Latent States
- DynSyn: Dynamical Synergistic Representation for Efficient Learning and Control in Overactuated Embodied Systems
- DySLIM: Dynamics Stable Learning by Invariant Measure for Chaotic Systems
- E$^2$GAN: Efficient Training of Efficient GANs for Image-to-Image Translation
- EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
- Early Time Classification with Accumulated Accuracy Gap Control
- Easing Concept Bleeding in Diffusion via Entity Localization and Anchoring
- eCeLLM: Generalizing Large Language Models for E-commerce from Large-scale, High-quality Instruction Data
- ED-Copilot: Reduce Emergency Department Wait Time with Language Model Diagnostic Assistance
- EDISON: Enhanced Dictionary-Induced Tensorized Incomplete Multi-View Clustering with Gaussian Error Rank Minimization
- Editing Partially Observable Networks via Graph Diffusion Models
- EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
- Effect-Invariant Mechanisms for Policy Generalization
- Effective Federated Graph Matching
- Effects of Exponential Gaussian Distribution on (Double Sampling) Randomized Smoothing
- Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning
- Efficient Algorithms for Empirical Group Distributionally Robust Optimization and Beyond
- Efficient Algorithms for Sum-Of-Minimum Optimization
- Efficient and Effective Time-Series Forecasting with Spiking Neural Networks
- Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior
- Efficient Contextual Bandits with Uninformed Feedback Graphs
- Efficient Contrastive Learning for Fast and Accurate Inference on Graphs
- Efficient Denoising Diffusion via Probabilistic Masking
- Efficient Error Certification for Physics-Informed Neural Networks
- Efficient Exploration for LLMs
- Efficient Exploration in Average-Reward Constrained Reinforcement Learning: Achieving Near-Optimal Regret With Posterior Sampling
- Efficient Low-Rank Matrix Estimation, Experimental Design, and Arm-Set-Dependent Low-Rank Bandits
- Efficient Mixture Learning in Black-Box Variational Inference
- Efficient Non-stationary Online Learning by Wavelets with Applications to Online Distribution Shift Adaptation
- Efficient Online Set-valued Classification with Bandit Feedback
- Efficient PAC Learnability of Dynamical Systems Over Multilayer Networks
- Efficient Pareto Manifold Learning with Low-Rank Structure
- Efficient Policy Evaluation with Offline Data Informed Behavior Policy Design
- Efficient Precision and Recall Metrics for Assessing Generative Models using Hubness-aware Sampling
- Efficient Stochastic Approximation of Minimax Excess Risk Optimization
- Efficient Value Iteration for s-rectangular Robust Markov Decision Processes
- Efficient World Models with Context-Aware Tokenization
- EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data
- EiG-Search: Generating Edge-Induced Subgraphs for GNN Explanation in Linear Time
- ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis
- ELTA: An Enhancer against Long-Tail for Aesthetics-oriented Models
- Eluder-based Regret for Stochastic Contextual MDPs
- Embarrassingly Parallel GFlowNets
- Embodied CoT Distillation From LLM To Off-the-shelf Agents
- EMC$^2$: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence
- Emergence of In-Context Reinforcement Learning from Noise Distillation
- Emergent Equivariance in Deep Ensembles
- Emergent Representations of Program Semantics in Language Models Trained on Programs
- Empowering Graph Invariance Learning with Deep Spurious Infomax
- Enabling Few-Shot Learning with PID Control: A Layer Adaptive Optimizer
- Enabling Uncertainty Estimation in Iterative Neural Networks
- Encodings for Prediction-based Neural Architecture Search
- End-to-End Neuro-Symbolic Reinforcement Learning with Textual Explanations
- Energy-based Backdoor Defense without Task-Specific Samples and Model Retraining
- Energy-Efficient Gaussian Processes Using Low-Precision Arithmetic
- Energy-Guided Diffusion Sampling for Offline-to-Online Reinforcement Learning
- Enforcing Constraints in RNA Secondary Structure Predictions: A Post-Processing Framework Based on the Assignment Problem
- Enhancing Adversarial Robustness in SNNs with Sparse Gradients
- Enhancing Class-Imbalanced Learning with Pre-Trained Guidance through Class-Conditional Knowledge Distillation
- Enhancing Cross-Modal Fine-Tuning with Gradually Intermediate Modality Generation
- Enhancing Implicit Shape Generators Using Topological Regularizations
- Enhancing Size Generalization in Graph Neural Networks through Disentangled Representation Learning
- Enhancing Storage and Computational Efficiency in Federated Multimodal Learning for Large-Scale Models
- Enhancing Sufficient Dimension Reduction via Hellinger Correlation
- Enhancing Trajectory Prediction through Self-Supervised Waypoint Distortion Prediction
- Enhancing Value Function Estimation through First-Order State-Action Dynamics in Offline Reinforcement Learning
- Enhancing Vision Transformer: Amplifying Non-Linearity in Feedforward Network Module
- Ensemble Pruning for Out-of-distribution Generalization
- Entropy-Reinforced Planning with Large Language Models for Drug Discovery
- Environment Design for Inverse Reinforcement Learning
- Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection
- EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning
- Equilibrium of Data Markets with Externality
- EquiPocket: an E(3)-Equivariant Geometric Graph Neural Network for Ligand Binding Site Prediction
- Equivariance via Minimal Frame Averaging for More Symmetries and Efficiency
- Equivariant Deep Weight Space Alignment
- Equivariant Diffusion for Crystal Structure Prediction
- Equivariant Frames and the Impossibility of Continuous Canonicalization
- Equivariant Graph Neural Operator for Modeling 3D Dynamics
- Erasing the Bias: Fine-Tuning Foundation Models for Semi-Supervised Learning
- ERQ: Error Reduction for Post-Training Quantization of Vision Transformers
- Error Feedback Can Accurately Compress Preconditioners
- ES-FoMo II: 2nd Workshop on Efficient Systems for Foundation Models
- ESM All-Atom: Multi-Scale Protein Language Model for Unified Molecular Modeling
- ESNet: Evolution and Succession Network for High-Resolution Salient Object Detection
- Estimating Barycenters of Distributions with Neural Optimal Transport
- Estimating Canopy Height at Scale
- Estimating Distributional Treatment Effects in Randomized Experiments: Machine Learning for Variance Reduction
- Estimating the Permanent by Nesting Importance Sampling
- Estimating Unknown Population Sizes Using the Hypergeometric Distribution
- ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections
- Et Tu Certifications: Robustness Certificates Yield Better Adversarial Examples
- Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems
- Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models
- Evaluating Instrument Validity using the Principle of Independent Mechanisms
- Evaluating Model Bias Requires Characterizing its Mistakes
- Evaluating Quantized Large Language Models
- Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks
- Evaluation of Test-Time Adaptation Under Computational Time Constraints
- Evaluation of Trajectory Distribution Predictions with Energy Score
- EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens
- EvGGS: A Collaborative Learning Framework for Event-based Generalizable Gaussian Splatting
- EvIL: Evolution Strategies for Generalisable Imitation Learning
- EvoluNet: Advancing Dynamic Non-IID Transfer Learning on Graphs
- Evolution-Inspired Loss Functions for Protein Representation Learning
- Evolution of Heuristics: Towards Efficient Automatic Algorithm Design Using Large Language Model
- Evolving Subnetwork Training for Large Language Models
- EvoRainbow: Combining Improvements in Evolutionary Reinforcement Learning for Policy Search
- EvTexture: Event-driven Texture Enhancement for Video Super-Resolution
- Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers
- Exact Soft Analytical Side-Channel Attacks using Tractable Circuits
- ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking
- Executable Code Actions Elicit Better LLM Agents
- Expand-and-Cluster: Parameter Recovery of Neural Networks
- Expert Proximity as Surrogate Rewards for Single Demonstration Imitation Learning
- Experts Don't Cheat: Learning What You Don't Know By Predicting Pairs
- Explaining Graph Neural Networks via Structure-aware Interaction Index
- Explaining Probabilistic Models with Distributional Values
- Explain Temporal Black-Box Models via Functional Decomposition
- Exploiting Code Symmetries for Learning Program Semantics
- Exploiting Human-AI Dependence for Learning to Defer
- Exploiting Negative Samples: A Catalyst for Cohort Discovery in Healthcare Analytics
- Exploration and Anti-Exploration with Distributional Random Network Distillation
- Exploration by Optimization with Hybrid Regularizers: Logarithmic Regret with Adversarial Robustness in Partial Monitoring
- Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization
- Explorations of Self-Repair in Language Models
- Exploring Correlations of Self-Supervised Tasks for Graphs
- Exploring Intrinsic Dimension for Vision-Language Model Pruning
- Exploring the Benefit of Activation Sparsity in Pre-training
- Exploring the Complexity of Deep Neural Networks through Functional Equivalence
- Exploring the Enigma of Neural Dynamics Through A Scattering-Transform Mixer Landscape for Riemannian Manifold
- Exploring the LLM Journey from Cognition to Expression with Linear Representations
- Exploring the Low-Pass Filtering Behavior in Image Super-Resolution
- Exploring Training on Heterogeneous Data with Mixture of Low-rank Adapters
- Exponential Spectral Pursuit: An Effective Initialization Method for Sparse Phase Retrieval
- Expressivity and Generalization: Fragment-Biases for Molecular GNNs
- Extending Adversarial Attacks to Produce Adversarial Class Probability Distributions
- Extending Test-Time Augmentation with Metamorphic Relations for Combinatorial Problems
- Extracting Training Data From Document-Based VQA Models
- Extreme Compression of Large Language Models via Additive Quantization
- Factored-Reward Bandits with Intermediate Observations
- FADAS: Towards Federated Adaptive Asynchronous Optimization
- FAFE: Immune Complex Modeling with Geodesic Distance Loss on Noisy Group Frames
- Failures Are Fated, But Can Be Faded: Characterizing and Mitigating Unwanted Behaviors in Large-Scale Vision and Language Models
- Fair Classification with Partial Feedback: An Exploration-Based Data Collection Approach
- Fair Data Representation for Machine Learning at the Pareto Frontier
- Fair Federated Learning via the Proportional Veto Core
- Fair Off-Policy Learning from Observational Data
- FairProof : Confidential and Certifiable Fairness for Neural Networks
- Fair Resource Allocation in Multi-Task Learning
- Fair Risk Control: A Generalized Framework for Calibrating Multi-group Fairness Risks
- Faithfulness Measurable Masked Language Models
- Fast Adversarial Attacks on Language Models In One GPU Minute
- Fast Algorithms for Hypergraph PageRank with Applications to Semi-Supervised Learning
- Fast and Sample Efficient Multi-Task Representation Learning in Stochastic Contextual Bandits
- Fast Co-Training under Weak Dependence via Stream-Based Active Learning
- Fast Decision Boundary based Out-of-Distribution Detector
- Faster Adaptive Decentralized Learning Algorithms
- Faster Maximum Inner Product Search in High Dimensions
- Faster Sampling via Stochastic Gradient Proximal Sampler
- Faster Streaming and Scalable Algorithms for Finding Directed Dense Subgraphs in Large Graphs
- Fast Peer Adaptation with Context-aware Exploration
- Fast Sampling-Based Sketches for Tensors
- Fast, Scalable, Warm-Start Semidefinite Programming with Spectral Bundling and Sketching
- Fast-Slow Test-Time Adaptation for Online Vision-and-Language Navigation
- Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization
- Fast Timing-Conditioned Latent Audio Diffusion
- Fast White-Box Adversarial Streaming Without a Random Oracle
- Fault Tolerant ML: Efficient Meta-Aggregation and Synchronous Training
- Feasibility Consistent Representation Learning for Safe Reinforcement Learning
- Feasible Reachable Policy Iteration
- Feature Attribution with Necessity and Sufficiency via Dual-stage Perturbation Test for Causal Explanation
- Feature Contamination: Neural Networks Learn Uncorrelated Features and Fail to Generalize
- Feature Distribution on Graph Topology Mediates the Effect of Graph Convolution: Homophily Perspective
- Feature Importance Disparities for Data Bias Investigations
- Feature Reuse and Scaling: Understanding Transfer Learning with Protein Language Models
- FedBAT: Communication-Efficient Federated Learning via Learnable Binarization
- FedBPT: Efficient Federated Black-box Prompt Tuning for Large Language Models
- FedCal: Achieving Local and Global Calibration in Federated Learning via Aggregated Parameterized Scaler
- Federated Combinatorial Multi-Agent Multi-Armed Bandits
- Federated Continual Learning via Prompt-based Dual Knowledge Transfer
- Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes
- Federated Neuro-Symbolic Learning
- Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices
- Federated Optimization with Doubly Regularized Drift Correction
- Federated Representation Learning in the Under-Parameterized Regime
- Federated Self-Explaining GNNs with Anti-shortcut Augmentations
- FedLMT: Tackling System Heterogeneity of Federated Learning via Low-Rank Model Training with Theoretical Guarantees
- FedMBridge: Bridgeable Multimodal Federated Learning
- FedRC: Tackling Diverse Distribution Shifts Challenge in Federated Learning by Robust Clustering
- FedREDefense: Defending against Model Poisoning Attacks for Federated Learning using Model Update Reconstruction Error
- FedSC: Provable Federated Self-supervised Learning with Spectral Contrastive Objective over Non-i.i.d. Data
- Feedback Efficient Online Fine-Tuning of Diffusion Models
- Feedback Loops With Language Models Drive In-Context Reward Hacking
- Feel-Good Thompson Sampling for Contextual Dueling Bandits
- FESSNC: Fast Exponentially Stable and Safe Neural Controller
- Fewer Truncations Improve Language Modeling
- Few-shot Adaptation to Distribution Shifts By Mixing Source and Target Embeddings
- Few-Shot Character Understanding in Movies as an Assessment to Meta-Learning of Theory-of-Mind
- Few-Shot Unsupervised Implicit Neural Shape Representation Learning with Spatial Adversaries
- FightLadder: A Benchmark for Competitive Multi-Agent Reinforcement Learning
- Finding NEM-U: Explaining unsupervised representation learning through neural network generated explanation masks
- Fine-Grained Causal Dynamics Learning with Quantization for Improving Robustness in Reinforcement Learning
- Fine-grained Classes and How to Find Them
- Fine-grained Local Sensitivity Analysis of Standard Dot-Product Self-Attention
- Fine-tuning Reinforcement Learning Models is Secretly a Forgetting Mitigation Problem
- Finite Smoothing Algorithm for High-Dimensional Support Vector Machines and Quantile Regression
- Finite-Time Convergence and Sample Complexity of Actor-Critic Multi-Objective Reinforcement Learning
- Finite Time Logarithmic Regret Bounds for Self-Tuning Regulation
- Finite Volume Features, Global Geometry Representations, and Residual Training for Deep Learning-based CFD Simulation
- First-Order Manifold Data Augmentation for Regression Learning
- FiT: Flexible Vision Transformer for Diffusion Model
- FlashST: A Simple and Universal Prompt-Tuning Framework for Traffic Prediction
- Flexible Residual Binarization for Image Super-Resolution
- Flextron: Many-in-One Flexible Large Language Model
- Floating Anchor Diffusion Model for Multi-motif Scaffolding
- Flora: Low-Rank Adapters Are Secretly Gradient Compressors
- FlowMM: Generating Materials with Riemannian Flow Matching
- Fool Your (Vision and) Language Model with Embarrassingly Simple Permutations
- Forget Sharpness: Perturbed Forgetting of Model Biases Within SAM Dynamics
- Foundation Policies with Hilbert Representations
- Foundations of Data-efficient Machine Learning
- Foundations of Reinforcement Learning and Control: Connections and Perspectives
- Foundations of Testing for Finite-Sample Causal Discovery
- Fourier Controller Networks for Real-Time Decision-Making in Embodied Learning
- FRAG: Frequency Adapting Group for Diffusion Video Editing
- FrameQuant: Flexible Low-Bit Quantization for Transformers
- FRAPPÉ: A Group Fairness Framework for Post-Processing Everything
- FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion
- From Biased Selective Labels to Pseudo-Labels: An Expectation-Maximization Framework for Learning from Biased Decisions
- From Classification Accuracy to Proper Scoring Rules: Elicitability of Probabilistic Top List Predictions
- From Coarse to Fine: Enable Comprehensive Graph Self-supervised Learning with Multi-granular Semantic Ensemble
- From Fourier to Neural ODEs: Flow Matching for Modeling Complex Systems
- From Generalization Analysis to Optimization Designs for State Space Models
- From Geometry to Causality- Ricci Curvature and the Reliability of Causal Inference on Networks
- From Inverse Optimization to Feasibility to ERM
- From Neurons to Neutrons: A Case Study in Interpretability
- From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers
- From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation
- From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems
- From Yes-Men to Truth-Tellers: Addressing Sycophancy in Large Language Models with Pinpoint Tuning
- Full-Atom Peptide Design based on Multi-modal Flow Matching
- Fully-Dynamic Approximate Decision Trees With Worst-Case Update Time Guarantees
- Fundamental Benefit of Alternating Updates in Minimax Optimization
- Fundamental Limitations of Alignment in Large Language Models
- Fundamental Limits of Distributed Covariance Matrix Estimation Under Communication Constraints
- FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning
- GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
- Gambling-Based Confidence Sequences for Bounded Random Vectors
- Gated Linear Attention Transformers with Hardware-Efficient Training
- GATE: How to Keep Out Intrusive Neighbors
- Gaussian Plane-Wave Neural Operator for Electron Density Estimation
- GaussianPro: 3D Gaussian Splatting with Progressive Propagation
- Gaussian Processes on Cellular Complexes
- GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer
- GenCO: Generating Diverse Designs with Combinatorial Constraints
- Generalist Equivariant Transformer Towards 3D Molecular Interaction Learning
- Generalization Analysis for Multi-Label Learning
- Generalization Analysis of Deep Non-linear Matrix Completion
- Generalization Analysis of Stochastic Weight Averaging with General Sampling
- Generalization Bound and New Algorithm for Clean-Label Backdoor Attack
- Generalization Bounds for Causal Regression: Insights, Guarantees and Sensitivity Analysis
- Generalization Bounds for Heavy-Tailed SDEs through the Fractional Fokker-Planck Equation
- Generalization Error of Graph Neural Networks in the Mean-field Regime
- Generalization in Kernel Regression Under Realistic Assumptions
- Generalization to New Sequential Decision Making Tasks with In-Context Learning
- Generalized Neural Collapse for a Large Number of Classes
- Generalized Preference Optimization: A Unified Approach to Offline Alignment
- Generalized Smooth Variational Inequalities: Methods with Adaptive Stepsizes
- Generalized Sobolev Transport for Probability Measures on a Graph
- Generalizing Knowledge Graph Embedding with Universal Orthogonal Parameterization
- Generalizing Orthogonalization for Models with Non-Linearities
- Generating Chain-of-Thoughts with a Pairwise-Comparison Approach to Searching for the Most Promising Intermediate Thought
- Generating In-Distribution Proxy Graphs for Explaining Graph Neural Networks
- Generative Active Learning for Long-tailed Instance Segmentation
- Generative Conditional Distributions by Neural (Entropic) Optimal Transport
- Generative Enzyme Design Guided by Functionally Important Sites and Small-Molecule Substrates
- Generative Flows on Discrete State-Spaces: Enabling Multimodal Flows with Applications to Protein Co-Design
- Generative Marginalization Models
- Generative Modeling on Manifolds Through Mixture of Riemannian Diffusion Processes
- Genie: Generative Interactive Environments
- GeoAB: Towards Realistic Antibody Design and Reliable Affinity Maturation
- Geometric Active Exploration in Markov Decision Processes: the Benefit of Abstraction
- Geometry-Aware Instrumental Variable Regression
- Geometry-Calibrated DRO: Combating Over-Pessimism with Free Energy Implications
- Geometry-grounded Representation Learning and Generative Modeling
- GeoMFormer: A General Architecture for Geometric Molecular Representation Learning
- GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model
- Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
- Getting the most out of your tokenizer for pre-training and domain adaptation
- GFlowNet Training by Policy Gradients
- Gibbs Sampling of Continuous Potentials on a Quantum Computer
- GiLOT: Interpreting Generative Language Models via Optimal Transport
- GistScore: Learning Better Representations for In-Context Example Selection with Gist Bottlenecks
- GliDe with a CaPE: A Low-Hassle Method to Accelerate Speculative Decoding
- Global Reinforcement Learning : Beyond Linear and Convex Rewards via Submodular Semi-gradient Methods
- GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements
- GNNs Also Deserve Editing, and They Need It More Than Once
- Going beyond Compositions, DDPMs Can Produce Zero-Shot Interpolations
- Gondzo - Charting a Path for African Low-Resource Languages: A Multifaceted Approach to Research and Development
- GPT-4V(ision) is a Generalist Web Agent, if Grounded
- GPTSwarm: Language Agents as Optimizable Graphs
- Gradient-based Visual Explanation for Transformer-based CLIP
- Gradient Compressed Sensing: A Query-Efficient Gradient Estimator for High-Dimensional Zeroth-Order Optimization
- Gradual Divergence for Seamless Adaptation: A Novel Domain Incremental Learning Method
- Graph2Tac: Online Representation Learning of Formal Math Concepts
- Graph Adversarial Diffusion Convolution
- Graph As Point Set
- Graph Attention Retrospective
- Graph Automorphism Group Equivariant Neural Networks
- Graph-based Forecasting with Missing Data through Spatiotemporal Downsampling
- Graph-based Time Series Clustering for End-to-End Hierarchical Forecasting
- Graph Distillation with Eigenbasis Matching
- Graph-enhanced Large Language Models in Asynchronous Plan Reasoning
- Graph External Attention Enhanced Transformer
- Graph Generation with Diffusion Mixture
- Graph Geometry-Preserving Autoencoders
- Graph Learning: Principles, Challenges, and Open Directions
- Graph Mixup on Approximate Gromov–Wasserstein Geodesics
- Graph Neural Network Explanations are Fragile
- Graph Neural Networks Use Graphs When They Shouldn't
- Graph Neural Networks with a Distribution of Parametrized Graphs
- Graph Neural PDE Solvers with Conservation and Similarity-Equivariance
- Graph Neural Stochastic Diffusion for Estimating Uncertainty in Node Classification
- Graphon Mean Field Games with a Representative Player: Analysis and Learning Algorithm
- Graph Out-of-Distribution Detection Goes Neighborhood Shaping
- Graph Positional and Structural Encoder
- Graph Structure Extrapolation for Out-of-Distribution Generalization
- Graph-Triggered Rising Bandits
- GRATH: Gradual Self-Truthifying for Large Language Models
- Grokking Group Multiplication with Cosets
- GroupCover: A Secure, Efficient and Scalable Inference Framework for On-device Model Protection based on TEEs
- Guarantees for Nonlinear Representation Learning: Non-identical Covariates, Dependent Data, Fewer Samples
- Guidance with Spherical Gaussian Constraint for Conditional Diffusion
- Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation
- HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding
- HAMLET: Graph Transformer Neural Operator for Partial Differential Equations
- Handling Heterogeneous Curvatures in Bandit LQR Control
- Hard Tasks First: Multi-Task Reinforcement Learning Through Task Scheduling
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
- HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning
- Harmonic Self-Conditioned Flow Matching for joint Multi-Ligand Docking and Binding Site Design
- Harmonizing Generalization and Personalization in Federated Prompt Learning
- HarmonyDream: Task Harmonization Inside World Models
- Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis
- Harnessing Hierarchical Label Distribution Variations in Test Agnostic Long-tail Recognition
- Harnessing Neural Unit Dynamics for Effective and Scalable Class-Incremental Learning
- Harnessing the Power of Neural Operators with Automatically Encoded Conservation Laws
- HelmFluid: Learning Helmholtz Dynamics for Interpretable Fluid Prediction
- Helpful or Harmful Data? Fine-tuning-free Shapley Attribution for Explaining Language Model Predictions
- HexGen: Generative Inference of Large Language Model over Heterogeneous Environment
- HGAP: Boosting Permutation Invariant and Permutation Equivariant in Multi-Agent Reinforcement Learning via Graph Attention Network
- HGCN2SP: Hierarchical Graph Convolutional Network for Two-Stage Stochastic Programming
- Hidden Traveling Waves bind Working Memory Variables in Recurrent Neural Networks
- Hierarchical Integral Probability Metrics: A distance on random probability measures with low sample complexity
- Hierarchical Neural Operator Transformer with Learnable Frequency-aware Loss Prior for Arbitrary-scale Super-resolution
- Hierarchical Novelty Detection via Fine-Grained Evidence Allocation
- Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling
- Hieros: Hierarchical Imagination on Structured State Space Sequence World Models
- High-Dimensional Bayesian Optimization via Semi-Supervised Learning with Optimized Unlabeled Data Sampling
- High-Dimensional Geometric Streaming for Nearly Low Rank Data
- High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization
- High-dimensional Learning Dynamics Workshop: The Emergence of Structure and Reasoning
- High-dimensional Linear Bandits with Knapsacks
- High-Order Contrastive Learning with Fine-grained Comparative Levels for Sparse Ordinal Tensor Completion
- High-Performance Temporal Reversible Spiking Neural Networks with $\mathcal{O}(L)$ Training Memory and $\mathcal{O}(1)$ Inference Cost
- High-Probability Bound for Non-Smooth Non-Convex Stochastic Optimization with Heavy Tails
- High-Probability Convergence for Composite and Distributed Stochastic Minimization and Variational Inequalities with Heavy-Tailed Noise
- Highway Value Iteration Networks
- Homomorphism Counts for Graph Neural Networks: All About That Basis
- How Deep Do We Need: Accelerating Training and Inference of Neural ODEs via Control Perspective
- How Deep Networks Learn Sparse and Hierarchical Data: the Sparse Random Hierarchy Model
- How Does Goal Relabeling Improve Sample Efficiency?
- How do Large Language Models Navigate Conflicts between Honesty and Helpfulness?
- How Do Nonlinear Transformers Learn and Generalize in In-Context Learning?
- How do Transformers Perform In-Context Autoregressive Learning ?
- How Far Can Fairness Constraints Help Recover From Biased Data?
- How Flawed Is ECE? An Analysis via Logit Smoothing
- How Free is Parameter-Free Stochastic Optimization?
- How Graph Neural Networks Learn: Lessons from Training Dynamics
- How Interpretable Are Interpretable Graph Neural Networks?
- How Language Model Hallucinations Can Snowball
- How Learning by Reconstruction Produces Uninformative Features For Perception
- How Private are DP-SGD Implementations?
- How Smooth Is Attention?
- How Spurious Features are Memorized: Precise Analysis for Random and NTK Features
- How to Escape Sharp Minima with Random Perturbations
- How to Explore with Belief: State Entropy Maximization in POMDPs
- How to Leverage Diverse Demonstrations in Offline Imitation Learning
- How to Make the Gradients Small Privately: Improved Rates for Differentially Private Non-Convex Optimization
- How to Trace Latent Generative Model Generated Images without Artificial Watermark?
- How Transformers Learn Causal Structure with Gradient Descent
- How Uniform Random Weights Induce Non-uniform Bias: Typical Interpolating Neural Networks Generalize with Narrow Teachers
- How Universal Polynomial Bases Enhance Spectral Graph Neural Networks: Heterophily, Over-smoothing, and Over-squashing
- How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis
- Human Alignment of Large Language Models through Online Preference Optimisation
- Human-like Category Learning by Injecting Ecological Priors from Large Language Models into Neural Networks
- Humans, Algorithmic Decision-Making and Society: Modeling Interactions and Impact
- HumanTOMATO: Text-aligned Whole-body Motion Generation
- Human vs. Generative AI in Content Creation Competition: Symbiosis or Conflict?
- Hybrid$^2$ Neural ODE Causal Modeling and an Application to Glycemic Response
- Hybrid Inverse Reinforcement Learning
- Hybrid Neural Representations for Spherical Data
- Hybrid Reinforcement Learning from Offline Observation Alone
- Hyperbolic Active Learning for Semantic Segmentation under Domain Shift
- Hyperbolic Geometric Latent Diffusion Model for Graph Generation
- Hyperbolic Optimizer as a Dynamical System
- HyperFields: Towards Zero-Shot Generation of NeRFs from Text
- Hypergraph-enhanced Dual Semi-supervised Graph Classification
- IBD-PSC: Input-level Backdoor Detection via Parameter-oriented Scaling Consistency
- ICML 2024 Workshop on Foundation Models in the Wild
- ICML Workshop on Large Language Models and Cognition
- Identifiability Matters: Revealing the Hidden Recoverable Condition in Unbiased Learning to Rank
- Identification and Estimation for Nonignorable Missing Data: A Data Fusion Approach
- IIANet: An Intra- and Inter-Modality Attention Network for Audio-Visual Speech Separation
- ILILT: Implicit Learning of Inverse Lithography Technologies
- IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation
- Image Clustering with External Guidance
- Image Fusion via Vision-Language Model
- Image Hijacks: Adversarial Images can Control Generative Models at Runtime
- Image Restoration Through Generalized Ornstein-Uhlenbeck Bridge
- Imitation Learning from Purified Demonstrations
- Imitation Learning in Discounted Linear MDPs without exploration assumptions
- Impact of Decentralized Learning on Player Utilities in Stackelberg Games
- Implicit Bias of AdamW: $\ell_\infty$-Norm Constrained Optimization
- Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States
- Implicit Compressibility of Overparametrized Neural Networks Trained with Heavy-Tailed SGD
- Implicit meta-learning may lead language models to trust more reliable sources
- Implicit Regularization in Feedback Alignment Learning Mechanisms for Neural Networks
- Implicit Representations for Constrained Image Segmentation
- Implicit Representations via Operator Learning
- Improved Bounds for Pure Private Agnostic Learning: Item-Level and User-Level Privacy
- Improved Communication-Privacy Trade-offs in $L_2$ Mean Estimation under Streaming Differential Privacy
- Improved Differentially Private and Lazy Online Convex Optimization: Lower Regret without Smoothness Requirements
- Improved Dimensionality Dependence for Zeroth-Order Optimisation over Cross-Polytopes
- Improved Generalization of Weight Space Networks via Augmentations
- Improved Modelling of Federated Datasets using Mixtures-of-Dirichlet-Multinomials
- Improved Operator Learning by Orthogonal Attention
- Improved Stability and Generalization Guarantees of the Decentralized SGD Algorithm
- Improving Accuracy-robustness Trade-off via Pixel Reweighted Adversarial Training
- Improving Adversarial Energy-Based Model via Diffusion Process
- Improving Antibody Humanness Prediction using Patent Data
- Improving Computational Complexity in Statistical Models with Local Curvature Information
- Improving Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning
- Improving Diffusion Models for Inverse Problems Using Optimal Posterior Covariance
- Improving Equivariant Graph Neural Networks on Large Geometric Graphs via Virtual Nodes Learning
- Improving Factuality and Reasoning in Language Models through Multiagent Debate
- Improving fine-grained understanding in image-text pre-training
- Improving Generalization in Offline Reinforcement Learning via Adversarial Data Splitting
- Improving Gradient-Guided Nested Sampling for Posterior Inference
- Improving Group Robustness on Spurious Correlation Requires Preciser Group Inference
- Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation
- Improving Interpretation Faithfulness for Vision Transformers
- Improving Neural Additive Models with Bayesian Principles
- Improving Neural Logic Machines via Failure Reflection
- Improving Open-Ended Text Generation via Adaptive Decoding
- Improving Prototypical Visual Explanations with Reward Reweighing, Reselection, and Retraining
- Improving Robustness to Multiple Spurious Correlations by Multi-Objective Optimization
- Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games
- Improving SAM Requires Rethinking its Optimization Formulation
- Improving Sharpness-Aware Minimization by Lookahead
- Improving Token-Based World Models with Parallel Observation Prediction
- Improving Transformers with Dynamically Composable Multi-Head Attention
- IM-Unpack: Training and Inference with Arbitrarily Low Precision Integers
- Incentivized Learning in Principal-Agent Bandit Games
- In-context Convergence of Transformers
- In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought
- In-Context Freeze-Thaw Bayesian Optimization for Hyperparameter Optimization
- In-Context Language Learning: Architectures and Algorithms
- In-Context Learning Agents Are Asymmetric Belief Updaters
- In-context Learning on Function Classes Unveiled for Transformers
- In-Context Principle Learning from Mistakes
- In-Context Reinforcement Learning for Variable Action Spaces
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation
- In-Context Unlearning: Language Models as Few-Shot Unlearners
- In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
- Incorporating Information into Shapley Values: Reweighting via a Maximum Entropy Approach
- Incorporating probabilistic domain knowledge into deep multiple instance learning
- Incremental Topological Ordering and Cycle Detection with Predictions
- Indirectly Parameterized Concrete Autoencoders
- Individual Contributions as Intrinsic Exploration Scaffolds for Multi-agent Reinforcement Learning
- Individual Fairness in Graph Decomposition
- Individualized Privacy Accounting via Subsampling with Applications in Combinatorial Optimization
- Inexact Newton-type Methods for Optimisation with Nonnegativity Constraints
- InferCept: Efficient Intercept Support for Augmented Large Language Model Inference
- Inferring Change Points in High-Dimensional Linear Regression via Approximate Message Passing
- Inferring Dynamic Networks from Marginals with Iterative Proportional Fitting
- Inferring the Long-Term Causal Effects of Long-Term Treatments from Short-Term Experiments
- InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks
- Infinite-Horizon Distributionally Robust Regret-Optimal Control
- InfoNet: Neural Estimation of Mutual Information without Test-Time Optimization
- Information Complexity of Stochastic Convex Optimization: Applications to Generalization, Memorization, and Tracing
- Information-Directed Pessimism for Offline Reinforcement Learning
- Information Flow in Self-Supervised Learning
- Inherent Trade-Offs between Diversity and Stability in Multi-Task Benchmarks
- Initial Guessing Bias: How Untrained Networks Favor Some Classes
- Instruction Tuning for Secure Code Generation
- InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
- InstructSpeech: Following Speech Editing Instructions via Large Language Models
- InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models
- Integrated Hardware Architecture and Device Placement Search
- Integrating Global Context Contrast and Local Sensitivity for Blind Image Quality Assessment
- Integrating Multimodal Data for Joint Generative Modeling of Complex Dynamics
- Interacting Diffusion Processes for Event Sequence Forecasting
- Interaction-based Retrieval-augmented Diffusion Models for Protein-specific 3D Molecule Generation
- InterLUDE: Interactions between Labeled and Unlabeled Data to Enhance Semi-Supervised Learning
- Interplay of ROC and Precision-Recall AUCs: Theoretical Limits and Practical Implications in Binary Classification
- Interpretability Illusions in the Generalization of Simplified Models
- Interpretable Deep Clustering for Tabular Data
- InterpreTabNet: Distilling Predictive Signals from Tabular Data by Salient Feature Interpretation
- Interpreting and Improving Diffusion Models from an Optimization Perspective
- Interpreting and Improving Large Language Models in Arithmetic Calculation
- Interpreting Equivariant Representations
- Intersecting-Boundary-Sensitive Fingerprinting for Tampering Detection of DNN Models
- Intersectional Unfairness Discovery
- In value-based deep reinforcement learning, a pruned network is a good network
- Invariant Risk Minimization Is A Total Variation Model
- Inverse-Variance Weighting for Estimation of Heterogeneous Treatment Effects
- Investigating Pre-Training Objectives for Generalization in Vision-Based Reinforcement Learning
- INViT: A Generalizable Routing Problem Solver with Invariant Nested View Transformer
- I/O Complexity of Attention, or How Optimal is FlashAttention?
- IOI: Invisible One-Iteration Adversarial Attack on No-Reference Image- and Video-Quality Metrics
- Irregular Multivariate Time Series Forecasting: A Transformable Patching Graph Neural Networks Approach
- Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
- Is Epistemic Uncertainty Faithfully Represented by Evidential Deep Learning Methods?
- Is In-Context Learning in Large Language Models Bayesian? A Martingale Perspective
- Is Inverse Reinforcement Learning Harder than Standard Reinforcement Learning? A Theoretical Perspective
- Is Kernel Prediction More Powerful than Gating in Convolutional Neural Networks?
- Isometric Representation Learning for Disentangled Latent Space of Diffusion Models
- Is Temperature Sample Efficient for Softmax Gaussian Mixture of Experts?
- Iterated Denoising Energy Matching for Sampling from Boltzmann Densities
- Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
- Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint
- Iterative Regularized Policy Optimization with Imperfect Demonstrations
- Iterative Search Attribution for Deep Neural Networks
- IW-GAE: Importance weighted group accuracy estimation for improved calibration and model selection in unsupervised domain adaptation
- Jacobian Regularizer-based Neural Granger Causality
- Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization
- Joint Composite Latent Space Bayesian Optimization
- Junk DNA Hypothesis: Pruning Small Pre-Trained Weights $\textit{Irreversibly}$ and $\textit{Monotonically}$ Impairs "Difficult" Downstream Tasks in LLMs
- Just Cluster It: An Approach for Exploration in High-Dimensions using Clustering and Pre-Trained Representations
- Keep the Momentum: Conservation Laws beyond Euclidean Gradient Flows
- Kepler codebook
- Kernel-Based Evaluation of Conditional Biological Sequence Models
- Kernel Debiased Plug-in Estimation: Simultaneous, Automated Debiasing without Influence Functions for Many Target Parameters
- Kernel Semi-Implicit Variational Inference
- KernelSHAP-IQ: Weighted Least Square Optimization for Shapley Interactions
- KernelWarehouse: Rethinking the Design of Dynamic Convolution
- Keypoint-based Progressive Chain-of-Thought Distillation for LLMs
- KISA: A Unified Keyframe Identifier and Skill Annotator for Long-Horizon Robotics Demonstrations
- KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
- KnowFormer: Revisiting Transformers for Knowledge Graph Reasoning
- Knowledge-aware Reinforced Language Models for Protein Directed Evolution
- Knowledge Distillation with Auxiliary Variable
- Knowledge Graphs Can be Learned with Just Intersection Features
- Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models
- KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation
- LAGMA: LAtent Goal-guided Multi-Agent Reinforcement Learning
- LaMAGIC: Language-Model-based Topology Generation for Analog Integrated Circuits
- LangCell: Language-Cell Pre-training for Cell Identity Understanding
- Langevin Policy for Safe Reinforcement Learning
- Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game
- Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models
- Language-Driven Cross-Modal Classifier for Zero-Shot Multi-Label Image Recognition
- Language Generation with Strictly Proper Scoring Rules
- Language-guided Skill Learning with Temporal Variational Inference
- Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
- Language Models as Science Tutors
- Language Models as Semantic Indexers
- Language Models Represent Beliefs of Self and Others
- Language Models with Conformal Factuality Guarantees
- Large Language Models are Geographically Biased
- Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning
- Large Scale Dataset Distillation with Domain Shift
- Larimar: Large Language Models with Episodic Memory Control
- LASER: Linear Compression in Wireless Distributed Optimization
- Latent Logic Tree Extraction for Event Sequence Explanation from LLMs
- Latent Noise Segmentation: How Neural Noise Leads to the Emergence of Segmentation and Grouping
- Latent Optimal Paths by Gumbel Propagation for Variational Bayesian Dynamic Programming
- Latent Space Symmetry Discovery
- Latent variable model for high-dimensional point process with structured missingness
- Layer-Aware Analysis of Catastrophic Overfitting: Revealing the Pseudo-Robust Shortcut Dependency
- LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging
- Layerwise Change of Knowledge in Neural Networks
- Layerwise Proximal Replay: A Proximal Point Method for Online Continual Learning
- LCA-on-the-Line: Benchmarking Out of Distribution Generalization with Class Taxonomies
- LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions
- Learning 1-Bit Tiny Object Detector with Discriminative Feature Refinement
- Learning Adaptive and View-Invariant Vision Transformer for Real-Time UAV Tracking
- Learning a Diffusion Model Policy from Rewards via Q-Score Matching
- Learning and Forgetting Unsafe Examples in Large Language Models
- Learning Associative Memories with Gradient Descent
- Learning Causal Domain-Invariant Temporal Dynamics for Few-Shot Action Recognition
- Learning Causal Dynamics Models in Object-Oriented Environments
- Learning Causal Relations from Subsampled Time Series with Two Time-Slices
- Learning Cognitive Maps from Transformer Representations for Efficient Planning in Partially Observed Environments
- Learning Constraints from Offline Demonstrations via Superior Distribution Correction Estimation
- Learning Coverage Paths in Unknown Environments with Deep Reinforcement Learning
- Learning Decision Policies with Instrumental Variables through Double Machine Learning
- Learning Decision Trees and Forests with Algorithmic Recourse
- Learning Divergence Fields for Shift-Robust Graph Representations
- Learning-Efficient Yet Generalizable Collaborative Filtering for Item Recommendation
- Learning Exceptional Subgroups by End-to-End Maximizing KL-Divergence
- Learning from Integral Losses in Physics Informed Neural Networks
- Learning from Memory: Non-Parametric Memory Augmented Self-Supervised Learning of Visual Features
- Learning from Streaming Data when Users Choose
- Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs
- Learning Graph Representation via Graph Entropy Maximization
- Learning High-Frequency Functions Made Easy with Sinusoidal Positional Encoding
- Learning High-Order Relationships of Brain Regions
- Learning in Deep Factor Graphs with Gaussian Belief Propagation
- Learning in Feature Spaces via Coupled Covariances: Asymmetric Kernel SVD and Nyström method
- Learning Iterative Reasoning through Energy Diffusion
- Learning Label Shift Correction for Test-Agnostic Long-Tailed Recognition
- Learning Latent Dynamic Robust Representations for World Models
- Learning Latent Space Hierarchical EBM Diffusion Models
- Learning Latent Structures in Network Games via Data-Dependent Gated-Prior Graph Variational Autoencoders
- Learning Linear Block Error Correction Codes
- Learning Low-dimensional Latent Dynamics from High-dimensional Observations: Non-asymptotics and Lower Bounds
- Learning Mixtures of Gaussian Processes through Random Projection
- Learning Modality Knowledge Alignment for Cross-Modality Transfer
- Learning Multiple Secrets in Mastermind
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients
- Learning Optimal Projection for Forecast Reconciliation of Hierarchical Time Series
- Learning Pseudo-Contractive Denoisers for Inverse Problems
- Learning-Rate-Free Stochastic Optimization over Riemannian Manifolds
- Learning Reward for Robot Skills Using Large Language Models via Self-Alignment
- Learning Scale-Aware Spatio-temporal Implicit Representation for Event-based Motion Deblurring
- Learning Shadow Variable Representation for Treatment Effect Estimation under Collider Bias
- Learning Solution-Aware Transformers for Efficiently Solving Quadratic Assignment Problem
- Learning Surrogates for Offline Black-Box Optimization via Gradient Matching
- Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making
- Learning the Target Network in Function Space
- Learning the Uncertainty Sets of Linear Control Systems via Set Membership: A Non-asymptotic Analysis
- Learning to Compile Programs to Neural Networks
- Learning to Continually Learn with the Bayesian Principle
- Learning to Explore for Stochastic Gradient MCMC
- Learning to Explore in POMDPs with Informational Rewards
- Learning to Infer Generative Template Programs for Visual Concepts
- Learning to Intervene on Concept Bottlenecks
- Learning to Model the World With Language
- Learning to Play Atari in a World of Tokens
- Learning to Predict Mutational Effects of Protein-Protein Interactions by Microenvironment-aware Hierarchical Prompt Learning
- Learning to Reach Goals via Diffusion
- Learning to Remove Cuts in Integer Linear Programming
- Learning to Route Among Specialized Experts for Zero-Shot Generalization
- Learning to Scale Logits for Temperature-Conditional GFlowNets
- Learning to Stabilize Online Reinforcement Learning in Unbounded State Spaces
- Learning Universal Predictors
- Learning Useful Representations of Recurrent Neural Network Weight Matrices
- Learning with 3D rotations, a hitchhiker's guide to SO(3)
- Learning with Adaptive Resource Allocation
- Learning with Complementary Labels Revisited: The Selected-Completely-at-Random Setting Is More Practical
- Learning with Partial-Label and Unlabeled Data: A Uniform Treatment for Supervision Redundancy and Insufficiency
- Less is More: on the Over-Globalizing Problem in Graph Transformers
- Lessons from Generalization Error Analysis of Federated Learning: You May Communicate Less Often!
- LESS: Selecting Influential Data for Targeted Instruction Tuning
- Let Go of Your Labels with Unsupervised Transfer
- Leverage Class-Specific Accuracy to Guide Data Generation for Improving Image Classification
- Leveraging Attractor Dynamics in Spatial Navigation for Better Language Parsing
- Leveraging (Biased) Information: Multi-armed Bandits with Offline Data
- Leveraging Self-Consistency for Data-Efficient Amortized Bayesian Inference
- Leveraging VLM-Based Pipelines to Annotate 3D Objects
- LEVI: Generalizable Fine-tuning via Layer-wise Ensemble of Different Views
- Libra: Building Decoupled Vision System on Large Language Models
- LIDAO: Towards Limited Interventions for Debiasing (Large) Language Models
- Lie Neurons: Adjoint-Equivariant Neural Networks for Semisimple Lie Algebras
- Light and Optimal Schrödinger Bridge Matching
- Lightweight Image Super-Resolution via Flexible Meta Pruning
- Limited Preference Aided Imitation Learning from Imperfect Demonstrations
- Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback
- Linear Explanations for Individual Neurons
- Linguistic Calibration of Long-Form Generations
- Liouville Flow Importance Sampler
- Listenable Maps for Audio Classifiers
- Listening to the noise: Blind Denoising with Gibbs Diffusion
- Listwise Reward Estimation for Offline Preference-based Reinforcement Learning
- LLaGA: Large Language and Graph Assistant
- LLark: A Multimodal Instruction-Following Language Model for Music
- LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery
- LLM-Empowered State Representation for Reinforcement Learning
- LLM Maybe LongLM: SelfExtend LLM Context Window Without Tuning
- Local Causal Structure Learning in the Presence of Latent Variables
- Local Feature Selection without Label or Feature Leakage for Interpretable Machine Learning Predictions
- Locality-Sensitive Hashing-Based Efficient Point Transformer with Applications in High-Energy Physics
- Localizing Task Information for Improved Model Merging and Compression
- Locally Differentially Private Decentralized Stochastic Bilevel Optimization with Guaranteed Convergence Accuracy
- Locally Estimated Global Perturbations are Better than Local Perturbations for Federated Sharpness-aware Minimization
- Locally Interdependent Multi-Agent MDP: Theoretical Framework for Decentralized Agents with Dynamic Dependencies
- Local vs. Global Interpretability: A Computational Complexity Perspective
- LoCoCo: Dropping In Convolutions for Long Context Compression
- Logistic Variational Bayes Revisited
- Log Neural Controlled Differential Equations: The Lie Brackets Make A Difference
- Long-Context Foundation Models
- Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning
- Longitudinal Targeted Minimum Loss-based Estimation with Temporal-Difference Heterogeneous Transformer
- Long Range Propagation on Continuous-Time Dynamic Graphs
- LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
- Long-Tail Learning with Foundation Model: Heavy Fine-Tuning Hurts
- Look Ahead or Look Around? A Theoretical Comparison Between Autoregressive and Masked Pretraining
- Lookbehind-SAM: k steps back, 1 step forward
- LoRA+: Efficient Low Rank Adaptation of Large Models
- LoRAP: Transformer Sub-Layers Deserve Differentiated Structured Compression for Large Language Models
- LoRA Training in the NTK Regime has No Spurious Local Minima
- Loss Shaping Constraints for Long-Term Time Series Forecasting
- Low-Cost High-Power Membership Inference Attacks
- Low-Rank Bandits via Tight Two-to-Infinity Singular Subspace Recovery
- Low-Rank Similarity Mining for Multimodal Dataset Distillation
- LPGD: A General Framework for Backpropagation through Embedded Optimization Layers
- LQER: Low-Rank Quantization Error Reconstruction for LLMs
- LSEnet: Lorentz Structural Entropy Neural Network for Deep Graph Clustering
- Lucilla Sioli
- Lyapunov-stable Neural Control for State and Output Feedback: A Novel Formulation
- Machine Learning for Earth System Modeling: Accelerating Pathways to Impact
- Machine Learning Opportunities for the Next Generation of Particle Physics
- Machine Vision Therapy: Multimodal Large Language Models Can Enhance Visual Robustness via Denoising In-Context Learning
- MADA: Meta-Adaptive Optimizers Through Hyper-Gradient Descent
- Maestro: Uncovering Low-Rank Structures via Trainable Decomposition
- MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models
- MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
- Magicoder: Empowering Code Generation with OSS-Instruct
- MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion
- MAGNOLIA: Matching Algorithms via GNNs for Online Value-to-go Approximation
- Major-Minor Mean Field Multi-Agent Reinforcement Learning
- Make-A-Shape: a Ten-Million-scale 3D Shape Model
- Making Old Things New: A Unified Algorithm for Differentially Private Clustering
- MALIBO: Meta-learning for Likelihood-free Bayesian Optimization
- Manifold Integrated Gradients: Riemannian Geometry for Feature Attribution
- Mapping the Multiverse of Latent Representations
- Masked Face Recognition with Generative-to-Discriminative Representations
- MaSS: Multi-attribute Selective Suppression for Utility-preserving Data Transformation from an Information-theoretic Perspective
- Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning
- Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
- Mastering Zero-Shot Interactions in Cooperative and Competitive Simultaneous Games
- Mathematical Framework for Online Social Media Auditing
- MathScale: Scaling Instruction Tuning for Mathematical Reasoning
- Matrix Information Theory for Self-Supervised Learning
- Matroid Semi-Bandits in Sublinear Time
- MaxMin-RLHF: Alignment with Diverse Human Preferences
- MC-GTA: Metric-Constrained Model-Based Clustering using Goodness-of-fit Tests with Autocorrelations
- MD tree: a model-diagnostic tree grown on loss landscape
- Mean Estimation in the Add-Remove Model of Differential Privacy
- Mean-field Analysis on Two-layer Neural Networks from a Kernel Perspective
- Mean-field Chaos Diffusion Models
- Mean Field Langevin Actor-Critic: Faster Convergence and Global Optimality beyond Lazy Learning
- Mean-field Underdamped Langevin Dynamics and its Spacetime Discretization
- Measures of diversity and space-filling designs for categorical data
- Measuring Stochastic Data Complexity with Boltzmann Influence Functions
- Mechanistic Design and Scaling of Hybrid Architectures
- Mechanistic Neural Networks for Scientific Machine Learning
- Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
- Membership Inference Attacks on Diffusion Models via Quantile Regression
- Memoria: Resolving Fateful Forgetting Problem through Human-Inspired Memory Architecture
- Memorization Through the Lens of Curvature of Loss Function Around Samples
- Memory Consolidation Enables Long-Context Video Understanding
- Memory Efficient Neural Processes via Constant Memory Attention Block
- MEMORYLLM: Towards Self-Updatable Large Language Models
- Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning
- Merging Multi-Task Models via Weight-Ensembling Mixture of Experts
- Meta Evidential Transformer for Few-Shot Open-Set Recognition
- Meta-Learners for Partially-Identified Treatment Effects Across Multiple Environments
- Meta-Reinforcement Learning Robust to Distributional Shift Via Performing Lifelong In-Context Learning
- MF-CLR: Multi-Frequency Contrastive Learning Representation for Time Series
- MFTN: A Multi-scale Feature Transfer Network Based on IMatchFormer for Hyperspectral Image Super-Resolution
- MGit: A Model Versioning and Management System
- MH-pFLID: Model Heterogeneous personalized Federated Learning via Injection and Distillation for Medical Data Analysis
- MILP-FBGen: LP/MILP Instance Generation with Feasibility/Boundedness
- Mimicking Better by Matching the Approximate Action Distribution
- MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data
- Mind the Boundary: Coreset Selection via Reconstructing the Decision Boundary
- Minimally Modifying a Markov Game to Achieve Any Nash Equilibrium and Value
- Minimax Optimality of Score-based Diffusion Models: Beyond the Density Lower Bound Assumptions
- Minimizing $f$-Divergences by Interpolating Velocity Fields
- Minimum Norm Interpolation Meets The Local Theory of Banach Spaces
- Minimum-Norm Interpolation Under Covariate Shift
- Mitigating Catastrophic Forgetting in Online Continual Learning by Modeling Previous Task Interrelations via Pareto Optimization
- Mitigating Label Noise on Graphs via Topological Sample Selection
- Mitigating Oversmoothing Through Reverse Process of GNNs for Heterophilic Graphs
- Mitigating Privacy Risk in Membership Inference by Convex-Concave Loss
- Mixtures of Experts Unlock Parameter Scaling for Deep RL
- MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation
- ML for Life and Material Science: From Theory to Industry Applications
- MLI Formula: A Nearly Scale-Invariant Solution with Noise Perturbation
- MLIP: Efficient Multi-Perspective Language-Image Pretraining with Exhaustive Data Utilization
- MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark
- MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance
- MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
- Mobile Attention: Mobile-Friendly Linear-Attention for Vision Transformers
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
- Model Alignment as Prospect Theoretic Optimization
- Model Assessment and Selection under Temporal Distribution Shift
- Model-Based Minimum Bayes Risk Decoding for Text Generation
- Model-based Reinforcement Learning for Confounded POMDPs
- Model-based Reinforcement Learning for Parameterized Action Spaces
- Model-Based RL for Mean-Field Games is not Statistically Harder than Single-Agent RL
- Model-Free Robust $\phi$-Divergence Reinforcement Learning Using Both Offline and Online Data
- Modeling Caption Diversity in Contrastive Vision-Language Pretraining
- Modeling Language Tokens as Functionals of Semantic Fields
- Modelling Microbial Communities with Graph Neural Networks
- Models of Human Feedback for AI Alignment
- Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models
- Modular Learning of Deep Causal Generative Models for High-dimensional Causal Inference
- MOKD: Cross-domain Finetuning for Few-shot Classification via Maximizing Optimized Kernel Dependence
- Mol-AE: Auto-Encoder Based Molecular Representation Learning With 3D Cloze Test Objective
- MolCRAFT: Structure-Based Drug Design in Continuous Parameter Space
- Mollification Effects of Policy Gradient Methods
- MOMENT: A Family of Open Time-series Foundation Models
- Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning
- Momentum for the Win: Collaborative Federated Reinforcement Learning across Heterogeneous Environments
- Momentum Particle Maximum Likelihood
- MoMo: Momentum Models for Adaptive Learning Rates
- Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews
- Monotone, Bi-Lipschitz, and Polyak-Łojasiewicz Networks
- Monotone Individual Fairness
- Moreau Envelope for Nonconvex Bi-Level Optimization: A Single-Loop and Hessian-Free Solution Strategy
- More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning
- More Flexible PAC-Bayesian Meta-Learning by Learning Learning Algorithms
- MorphGrower: A Synchronized Layer-by-layer Growing Approach for Plausible Neuronal Morphology Generation
- MS$^3$D: A RG Flow-Based Regularization for GAN Training with Limited Data
- MS-TIP: Imputation Aware Pedestrian Trajectory Prediction
- Multi-Agent Reinforcement Learning Meets Leaf Sequencing in Radiotherapy
- Multi-Agent Reinforcement Learning with Hierarchical Coordination for Emergency Responder Stationing
- Multicalibration for Confidence Scoring in LLMs
- Multi-class Probabilistic Bounds for Majority Vote Classifiers with Partially Labeled Data
- Multi-Factor Adaptive Vision Selection for Egocentric Video Question Answering
- Multi-Fidelity Residual Neural Processes for Scalable Surrogate Modeling
- Multi-group Learning for Hierarchical Groups
- Multigroup Robustness
- Multi-layer Rehearsal Feature Augmentation for Class-Incremental Learning
- MultiMax: Sparse and Multi-Modal Attention Learning
- Multi-modal Foundation Model meets Embodied AI (MFM-EAI)
- Multimodal Prototyping for cancer survival prediction
- Multi-Patch Prediction: Adapting Language Models for Time Series Representation Learning
- Multiplicative Weights Update, Area Convexity and Random Coordinate Descent for Densest Subgraph Problems
- Multiply-Robust Causal Change Attribution
- Multiply Robust Estimation for Local Distribution Shifts with Multiple Domains
- Multi-Region Markovian Gaussian Process: An Efficient Method to Discover Directional Communications Across Multiple Brain Regions
- Multi-Sender Persuasion: A Computational Perspective
- Multi-Source Conformal Inference Under Distribution Shift
- Multi-Track Message Passing: Tackling Oversmoothing and Oversquashing in Graph Learning via Preventing Heterophily Mixing
- Multi-View Clustering by Inter-cluster Connectivity Guided Reward
- Multi-View Stochastic Block Models
- MusicFlow: Cascaded Flow Matching for Text Guided Music Generation
- MusicRL: Aligning Music Generation to Human Preferences
- MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving
- MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts
- Naive Bayes Classifiers over Missing Data: Decision and Poisoning
- Nash Incentive-compatible Online Mechanism Learning via Weakly Differentially Private Online Learning
- Nash Learning from Human Feedback
- NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
- Navigating Complexity: Toward Lossless Graph Condensation via Expanding Window Matching
- Navigating Scaling Laws: Compute Optimality in Adaptive Model Training
- NDOT: Neuronal Dynamics-based Online Training for Spiking Neural Networks
- Nearest Neighbour Score Estimators for Diffusion Generative Models
- Near-Linear Time Approximation Algorithms for k-means with Outliers
- Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback
- Near-Optimal Reinforcement Learning with Self-Play under Adaptivity Constraints
- Neighboring Perturbations of Knowledge Editing on Large Language Models
- Nesting Particle Filters for Experimental Design in Dynamical Systems
- Networked Inequality: Preferential Attachment Bias in Graph Neural Network Link Prediction
- Network Tight Community Detection
- Neural Collapse for Cross-entropy Class-Imbalanced Learning with Unconstrained ReLU Features Model
- Neural Collapse in Multi-label Learning with Pick-all-label Loss
- Neural Collapse meets Differential Privacy: Curious behaviors of NoisyGD with Near-Perfect Representation Learning
- Neural Diffusion Models
- Neural Image Compression with Text-guided Encoding for both Pixel-level and Perceptual Fidelity
- NeuralIndicator: Implicit Surface Reconstruction from Neural Indicator Priors
- Neural Jump-Diffusion Temporal Point Processes
- Neural-Kernel Conditional Mean Embeddings
- Neural NeRF Compression
- Neural Networks Learn Statistics of Increasing Complexity
- Neural Operator Learning
- Neural operators meet conjugate gradients: The FCG-NO method for efficient PDE solving
- Neural Operators with Localized Integral and Differential Kernels
- Neural SPH: Improved Neural Modeling of Lagrangian Fluid Dynamics
- Neural Tangent Kernels for Axis-Aligned Tree Ensembles
- Neural Tangent Kernels Motivate Cross-Covariance Graphs in Neural Networks
- Neurodegenerative Brain Network Classification via Adaptive Diffusion with Temporal Regularization
- Neuroexplicit Diffusion Models for Inpainting of Optical Flow Fields
- Neuro-Symbolic Temporal Point Processes
- Neuro-Visualizer: A Novel Auto-Encoder-Based Loss Landscape Visualization Method With an Application in Knowledge-Guided Machine Learning
- New Bounds on the Cohesion of Complete-link and Other Linkage Methods for Agglomerative Clustering
- NeWRF: A Deep Learning Framework for Wireless Radiation Field Reconstruction and Channel Prediction
- New Sample Complexity Bounds for Sample Average Approximation in Heavy-Tailed Stochastic Programming
- NExT-Chat: An LMM for Chat, Detection and Segmentation
- Next Generation of AI Safety
- Next Generation of Sequence Modeling Architectures
- NExT-GPT: Any-to-Any Multimodal LLM
- NExT: Teaching Large Language Models to Reason about Code Execution
- No Dimensional Sampling Coresets for Classification
- No Double Descent in Principal Component Regression: A High-Dimensional Analysis
- No Free Prune: Information-Theoretic Barriers to Pruning at Initialization
- Noise-Adaptive Confidence Sets for Linear Bandits and Application to Bayesian Optimization
- Noise-Aware Algorithm for Heterogeneous Differentially Private Federated Learning
- Non-Asymptotic Analysis for Single-Loop (Natural) Actor-Critic with Compatible Function Approximation
- Non-clairvoyant Scheduling with Partial Predictions
- Non-confusing Generation of Customized Concepts in Diffusion Models
- Non-convex Stochastic Composite Optimization with Polyak Momentum
- Nonlinear Filtering with Brenier Optimal Transport Maps
- Non-parametric Online Change Point Detection on Riemannian Manifolds
- Nonparametric Teaching of Implicit Neural Representations
- Nonsmooth Implicit Differentiation: Deterministic and Stochastic Convergence Rates
- Non-stationary Online Convex Optimization with Arbitrary Delays
- Non-Vacuous Generalization Bounds for Large Language Models
- No-Regret Reinforcement Learning in Smooth MDPs
- Not all distributional shifts are equal: Fine-grained robust conformal inference
- Not Just Pretty Pictures: Toward Interventional Data Augmentation Using Text-to-Image Generators
- Novel Spectral Algorithms for the Partial Credit Model
- No Wrong Turns: The Simple Geometry Of Neural Networks Optimization Paths
- On Learning Deep O($n$)-Equivariant Hyperspheres
- OAK: Enriching Document Representations using Auxiliary Knowledge for Extreme Classification
- Observable Propagation: Uncovering Feature Vectors in Transformers
- ODIM: Outlier Detection via Likelihood of Under-Fitted Generative Models
- ODIN: Disentangled Reward Mitigates Hacking in RLHF
- Offline Actor-Critic Reinforcement Learning Scales to Large Models
- Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL
- Offline Imitation from Observation via Primal Wasserstein State Occupancy Matching
- Offline Inverse RL: New Solution Concepts and Provably Efficient Algorithms
- Offline Multi-Objective Optimization
- Offline Training of Language Model Agents with Functions as Learnable Weights
- Offline Transition Modeling via Contrastive Energy Learning
- Off-policy Evaluation Beyond Overlap: Sharp Partial Identification Under Smoothness
- OLLIE: Imitation Learning from Offline Pretraining to Online Finetuning
- OMPO: A Unified Framework for RL under Policy and Dynamics Shifts
- On a Combinatorial Problem Arising in Machine Teaching
- On a Neural Implementation of Brenier's Polar Factorization
- On Computational Limits of Modern Hopfield Models: A Fine-Grained Complexity Analysis
- On Convergence of Incremental Gradient for Non-convex Smooth Functions
- On dimensionality of feature vectors in MPNNs
- On Discrete Prompt Optimization for Diffusion Models
- One for All: A Universal Generator for Concept Unlearnability via Multi-Modal Alignment
- One Meta-tuned Transformer is What You Need for Few-shot Learning
- One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts
- One-Shot Strategic Classification Under Unknown Costs
- One Size Fits All for Semantic Shifts: Adaptive Prompt Tuning for Continual Learning
- On Gradient-like Explanation under a Black-box Setting: When Black-box Explanations Become as Good as White-box
- On Hypothesis Transfer Learning of Functional Linear Models
- On Interpolating Experts and Multi-Armed Bandits
- On Least Square Estimation in Softmax Gating Mixture of Experts
- Online Adaptive Anomaly Thresholding with Confidence Sequences
- Online Algorithms with Uncertainty-Quantified Predictions
- Online bipartite matching with imperfect advice
- Online Cascade Learning for Efficient Inference over Streams
- Online conformal prediction with decaying step sizes
- Online Isolation Forest
- Online Learning and Information Exponents: The Importance of Batch size & Time/Complexity Tradeoffs
- Online Learning in Betting Markets: Profit versus Prediction
- Online Learning in CMDPs: Handling Stochastic and Adversarial Constraints
- Online Learning under Budget and ROI Constraints via Weak Adaptivity
- Online Learning with Bounded Recall
- Online Linear Regression in Dynamic Environments via Discounting
- Online Matching with Stochastic Rewards: Provable Better Bound via Adversarial Reinforcement Learning
- Online Matrix Completion: A Collaborative Approach with Hott Items
- Online Non-stochastic Control with Partial Feedback
- Online Resource Allocation with Non-Stationary Customers
- Online Speculative Decoding
- Online Variational Sequential Monte Carlo
- On Mechanistic Knowledge Localization in Text-to-Image Generative Models
- On Multi-Armed Bandit with Impatient Arms
- On Online Experimentation without Device Identifiers
- On PI Controllers for Updating Lagrange Multipliers in Constrained Optimization
- On Positivity Condition for Causal Inference
- On Prompt-Driven Safeguarding for Large Language Models
- On Statistical Learning Theory for Distributional Inputs
- On Stronger Computational Separations Between Multimodal and Unimodal Machine Learning
- On the Asymptotic Distribution of the Minimum Empirical Risk
- On the Calibration of Human Pose Estimation
- On the Complexity of Finite-Sum Smooth Optimization under the Polyak–Łojasiewicz Condition
- On The Complexity of First-Order Methods in Stochastic Bilevel Optimization
- On the Consistency of Kernel Methods with Dependent Observations
- On the Convergence of Projected Bures-Wasserstein Gradient Descent under Euclidean Strong Convexity
- On the Diminishing Returns of Width for Continual Learning
- On the Duality Between Sharpness-Aware Minimization and Adversarial Training
- On the Effectiveness of Supervision in Asymmetric Non-Contrastive Learning
- On the Embedding Collapse when Scaling up Recommendation Models
- On the Emergence of Cross-Task Linearity in Pretraining-Finetuning Paradigm
- On the Error-Propagation of Inexact Hotelling's Deflation for Principal Component Analysis
- On the Expressive Power of Spectral Invariant Graph Neural Networks
- On The Fairness Impacts of Hardware Selection in Machine Learning
- On the Feasibility of Single-Pass Full-Capacity Learning in Linear Threshold Neurons with Binary Input Vectors
- On the Generalization of Equivariant Graph Neural Networks
- On the Generalization of Stochastic Gradient Descent with Momentum
- On the Hardness of Probabilistic Neurosymbolic Learning
- On the Identifiability of Switching Dynamical Systems
- On the Implicit Bias of Adam
- On the Independence Assumption in Neurosymbolic Learning
- On the Last-Iterate Convergence of Shuffling Gradient Methods
- On the Maximal Local Disparity of Fairness-Aware Classifiers
- On the Minimal Degree Bias in Generalization on the Unseen for non-Boolean Functions
- On the Nonlinearity of Layer Normalization
- On the Origins of Linear Representations in Large Language Models
- On the Recoverability of Causal Relations from Temporally Aggregated I.I.D. Data
- On the Role of Edge Dependency in Graph Generative Models
- On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control
- On the sample complexity of conditional independence testing with Von Mises estimator with application to causal discovery
- On the Second-Order Convergence of Biased Policy Gradient Algorithms
- On The Statistical Complexity of Offline Decision-Making
- On the Tractability of SHAP Explanations under Markovian Distributions
- On the Trajectory Regularity of ODE-based Diffusion Sampling
- On the Unexpected Effectiveness of Reinforcement Learning for Sequential Recommendation
- On the Universality of Volume-Preserving and Coupling-Based Normalizing Flows
- On the Weight Dynamics of Deep Normalized Networks
- On Universally Optimal Algorithms for A/B Testing
- On Which Nodes Does GCN Fail? Enhancing GCN From the Node Perspective
- OODRobustBench: a Benchmark and Large-Scale Analysis of Adversarial Robustness under Distribution Shift
- Open Ad Hoc Teamwork with Cooperative Game Theory
- Open-Domain Text Evaluation via Contrastive Distribution Methods
- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
- Open-Vocabulary Calibration for Fine-tuned CLIP
- Operator SVD with Neural Networks via Nested Low-Rank Approximation
- Optimal Acceleration for Minimax and Fixed-Point Problems is Not Unique
- Optimal Batched Linear Bandits
- Optimal bounds for $\ell_p$ sensitivity sampling via $\ell_2$ augmentation
- Optimal Coresets for Low-Dimensional Geometric Median
- Optimal Differentially Private Model Training with Public Data
- Optimal Exact Recovery in Semi-Supervised Learning: A Study of Spectral Methods and Graph Convolutional Networks
- Optimal Eye Surgeon: Finding image priors through sparse generators at initialization
- Optimal Hessian/Jacobian-Free Nonconvex-PL Bilevel Optimization
- Optimal Kernel Choice for Score Function-based Causal Discovery
- Optimal Kernel Quantile Learning with Random Features
- Optimally Improving Cooperative Learning in a Social Setting
- Optimal Recurrent Network Topologies for Dynamical Systems Reconstruction
- Optimal Ridge Regularization for Out-of-Distribution Prediction
- Optimal Transport for Structure Learning Under Missing Data
- Optimistic Multi-Agent Policy Gradient
- Optimization without Retraction on the Random Generalized Stiefel Manifold
- Optimizing Watermarks for Large Language Models
- OptiMUS: Scalable Optimization Modeling with (MI)LP Solvers and Large Language Models
- Orthogonal Bootstrap: Efficient Simulation of Input Uncertainty
- OSN: Infinite Representations of Dynamic 3D Scenes from Monocular Videos
- OSSCAR: One-Shot Structured Pruning in Vision and Language Models with Combinatorial Optimization
- OT-CLIP: Understanding and Generalizing CLIP via Optimal Transport
- OTMatch: Improving Semi-Supervised Learning with Optimal Transport
- Outlier-aware Slicing for Post-Training Quantization in Vision Transformer
- Outlier-Efficient Hopfield Layers for Large Transformer-Based Models
- Outlier-robust Kalman Filtering through Generalised Bayes
- Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity
- Out-of-Distribution Detection via Deep Multi-Comprehension Ensemble
- Out-of-Domain Generalization in Dynamical Systems Reconstruction
- Out of the Ordinary: Spectrally Adapting Regression for Covariate Shift
- Overcoming Data and Model heterogeneities in Decentralized Federated Learning via Synthetic Anchors
- Overcoming Saturation in Density Ratio Estimation by Iterated Regularization
- Overcoming the Optimizer's Curse: Obtaining Realistic Prescriptions from Neural Networks
- Overestimation, Overfitting, and Plasticity in Actor-Critic: the Bitter Lesson of Reinforcement Learning
- OxyGenerator: Reconstructing Global Ocean Deoxygenation Over a Century with Deep Learning
- PAC-Bayesian Error Bound, via Rényi Divergence, for a Class of Linear Time-Invariant State-Space Models
- PAC-Bayesian Generalization Bounds for Knowledge Graph Representation Learning
- PAGER: Accurate Failure Characterization in Deep Regression Models
- PairNet: Training with Observed Pairs to Estimate Individual Treatment Effect
- Pairwise Alignment Improves Graph Domain Adaptation
- PANDA: Expanded Width-Aware Message Passing Beyond Rewiring
- PAPM: A Physics-aware Proxy Model for Process Systems
- Parallel Affine Transformation Tuning of Markov Chain Monte Carlo
- Parallelized Spatiotemporal Slot Binding for Videos
- Parameter-Dependent Competitive Analysis for Online Capacitated Coverage Maximization through Boostings and Attenuations
- Parameter-Efficient Fine-Tuning with Controls
- Parameter-Efficient Fine-Tuning with Discrete Fourier Transform
- Parameter Efficient Quasi-Orthogonal Fine-Tuning via Givens Rotation
- Parameter Estimation in DAGs from Incomplete Data via Optimal Transport
- Parameterized Physics-informed Neural Networks for Parameterized PDEs
- PARCv2: Physics-aware Recurrent Convolutional Neural Networks for Spatiotemporal Dynamics Modeling
- PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition
- Parsimonious Learning-Augmented Approximations for Dense Instances of $\mathcal{NP}$-hard Problems
- Partially Stochastic Infinitely Deep Bayesian Neural Networks
- Partial Multi-View Multi-Label Classification via Semantic Invariance Learning and Prototype Modeling
- Partial Optimality in the Linear Ordering Problem
- Particle Denoising Diffusion Sampler
- PASOA- PArticle baSed Bayesian Optimal Adaptive design
- Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
- Path-Guided Particle-based Sampling
- Pausing Policy Learning in Non-stationary Reinforcement Learning
- PcLast: Discovering Plannable Continuous Latent States
- PDHG-Unrolled Learning-to-Optimize Method for Large-Scale Linear Programming
- PEARL: Zero-shot Cross-task Preference Alignment and Robust Reward Learning for Robotic Manipulation
- Pedestrian Attribute Recognition as Label-balanced Multi-label Learning
- Peeking with PEAK: Sequential, Nonparametric Composite Hypothesis Tests for Means of Multiple Data Streams
- PerceptAnon: Exploring the Human Perception of Image Anonymization Beyond Pseudonymization for GDPR
- Perfect Alignment May be Poisonous to Graph Contrastive Learning
- Performance Bounds for Active Binary Testing with Information Maximization
- Performative Prediction with Bandit Feedback: Learning through Reparameterization
- Perturb-and-Project: Differentially Private Similarities and Marginals
- Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning
- PGODE: Towards High-quality System Dynamics Modeling
- PhAST: Physics-Aware, Scalable, and Task-Specific GNNs for Accelerated Catalyst Design
- Physics and Lie symmetry informed Gaussian processes
- Physics-Informed Neural Network Policy Iteration: Algorithms, Convergence, and Verification
- Physics of Language Models
- Physics of Language Models: Part 3.1, Knowledge Storage and Extraction
- PICLe: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning
- PIDformer: Transformer Meets Control Theory
- PID: Prompt-Independent Data Protection Against Latent Diffusion Models
- Pi-DUAL: Using privileged information to distinguish clean from noisy labels
- Piecewise Constant and Linear Regression Trees: An Optimal Dynamic Programming Approach
- PinNet: Pinpoint Instructive Information for Retrieval Augmented Code-to-Text Generation
- PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling
- PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
- PlanDQ: Hierarchical Plan Orchestration via D-Conductor and Q-Performer
- Planning, Fast and Slow: Online Reinforcement Learning with Action-Free Offline Data via Multiscale Planners
- Plug-and-Play image restoration with Stochastic deNOising REgularization
- Plug-in Performative Optimization
- Pluvial Flood Emulation with Hydraulics-informed Message Passing
- PointMC: Multi-instance Point Cloud Registration based on Maximal Cliques
- Policy-conditioned Environment Models are More Generalizable
- Policy Evaluation for Variance in Average Reward Reinforcement Learning
- Policy Learning for Balancing Short-Term and Long-Term Rewards
- Polygonal Unadjusted Langevin Algorithms: Creating stable and efficient adaptive algorithms for neural networks
- Polynomial-based Self-Attention for Table Representation Learning
- PolySketchFormer: Fast Transformers via Sketching Polynomial Kernels
- Position: $C^*$-Algebraic Machine Learning $-$ Moving in a New Direction
- Position: A Call for Embodied AI
- Position: A Call to Action for a Human-Centered AutoML Paradigm
- Position: AI/ML Influencers Have a Place in the Academic Process
- Position: AI-Powered Autonomous Weapons Risk Geopolitical Instability and Threaten AI Research
- Positional Knowledge is All You Need: Position-induced Transformer (PiT) for Operator Learning
- Position: Amazing Things Come From Having Many Good Models
- Position: An Inner Interpretability Framework for AI Inspired by Lessons from Cognitive Neuroscience
- Position: Application-Driven Innovation in Machine Learning
- Position: A Roadmap to Pluralistic Alignment
- Position: A Safe Harbor for AI Evaluation and Red Teaming
- Position: Automatic Environment Shaping is the Next Frontier in RL
- Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI
- Position: Benchmarking is Limited in Reinforcement Learning Research
- Position: Beyond Personhood: Agency, Accountability, and the Limits of Anthropomorphic Ethical Analysis
- Position: Building Guardrails for Large Language Models Requires Systematic Design
- Position: Categorical Deep Learning is an Algebraic Theory of All Architectures
- Position: Compositional Generative Modeling: A Single Model is Not All You Need
- Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining
- Position: Cracking the Code of Cascading Disparity Towards Marginalized Communities
- Position: Data Authenticity, Consent, & Provenance for AI are all broken: what will it take to fix them?
- Position: Data-driven Discovery with Large Generative Models
- Position: Do Not Explain Vision Models Without Context
- Position: Do pretrained Transformers Learn In-Context by Gradient Descent?
- Position: Embracing Negative Results in Machine Learning
- Position: Enforced Amnesia as a Way to Mitigate the Potential Risk of Silent Suffering in the Conscious AI
- Position: Evolving AI Collectives Enhance Human Diversity and Enable Self-Regulation
- Position: Explain to Question not to Justify
- Position: Exploring the Robustness of Pipeline-Parallelism-Based Decentralized Training
- Position: Foundation Agents as the Paradigm Shift for Decision Making
- Position: Fundamental Limitations of LLM Censorship Necessitate New Approaches
- Position: Future Directions in the Theory of Graph Machine Learning
- Position: Graph Foundation Models Are Already Here
- Position: Insights from Survey Methodology can Improve Training Data
- Position: Intent-aligned AI Systems Must Optimize for Agency Preservation
- Position: Is machine learning good or bad for the natural sciences?
- Position: Key Claims in LLM Research Have a Long Tail of Footnotes
- Position: Levels of AGI for Operationalizing Progress on the Path to AGI
- Position: Leverage Foundational Models for Black-Box Optimization
- Position: LLMs Can’t Plan, But Can Help Planning in LLM-Modulo Frameworks
- Position: Machine Learning-powered Assessments of the EU Digital Services Act Aid Quantify Policy Impacts on Online Harms
- Position: Measure Dataset Diversity, Don't Just Claim It
- Position: Mission Critical – Satellite Data is a Distinct Modality in Machine Learning
- Position: Near to Mid-term Risks and Opportunities of Open-Source Generative AI
- Position: On the Possibilities of AI-Generated Text Detection
- Position: On the Societal Impact of Open Foundation Models
- Position: Open-Endedness is Essential for Artificial Superhuman Intelligence
- Position: Opportunities Exist for Machine Learning in Magnetic Fusion Energy
- Position: Optimization in SciML Should Employ the Function Space Geometry
- Position: Quo Vadis, Unsupervised Time Series Anomaly Detection?
- Position: Reinforcement Learning in Dynamic Treatment Regimes Needs Critical Reexamination
- Position: Relational Deep Learning - Graph Representation Learning on Relational Databases
- Position: Rethinking Post-Hoc Search-Based Neural Approaches for Solving Large-Scale Traveling Salesman Problems
- Position: Scaling Simulation is Neither Necessary Nor Sufficient for In-the-Wild Robot Manipulation
- Position: Scarce Resource Allocations That Rely On Machine Learning Should Be Randomized
- Position: Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback
- Position: Social Environment Design Should be Further Developed for AI-based Policy-Making
- Position: Standardization of Behavioral Use Clauses is Necessary for the Adoption of Responsible Licensing of AI
- Position: Stop Making Unscientific AGI Performance Claims
- Position: Technical Research and Talent is Needed for Effective AI Governance
- Position: Tensor Networks are a Valuable Asset for Green AI
- Position: The Causal Revolution Needs Scientific Pragmatism
- Position: The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning
- Position: The Platonic Representation Hypothesis
- Position: The Reasonable Person Standard for AI
- Position: Topological Deep Learning is the New Frontier for Relational Learning
- Position: Towards Implicit Prompt For Text-To-Image Models
- Position: Towards Unified Alignment Between Agents, Humans, and Environment
- Position: TrustLLM: Trustworthiness in Large Language Models
- Position: Understanding LLMs Requires More Than Statistical Generalization
- Position: Video as the New Language for Real-World Decision Making
- Position: What Can Large Language Models Tell Us about Time Series Analysis
- Position: What makes an image realistic?
- Position: Why Tabular Foundation Models Should Be a Research Priority
- Position: Why We Must Rethink Empirical Research in Machine Learning
- Position: Will we run out of data? Limits of LLM scaling based on human-generated data
- Positive and Unlabeled Learning with Controlled Probability Boundary Fence
- Positive Concave Deep Equilibrium Models
- Posterior Sampling-Based Bayesian Optimization with Tighter Bayesian Regret Bounds
- Post-hoc Part-Prototype Networks
- Potential Based Diffusion Motion Planning
- PPFLOW: Target-Aware Peptide Design with Torsional Flow Matching
- Practical Hamiltonian Monte Carlo on Riemannian Manifolds via Relativity Theory
- Practical Performance Guarantees for Pipelined DNN Inference
- Pragmatic Feature Preferences: Learning Reward-Relevant Preferences from Human Input
- Precise Accuracy / Robustness Tradeoffs in Regression: Case of General Norms
- Predicting and Interpreting Energy Barriers of Metallic Glasses with Graph Neural Networks
- Predicting Dose-Response Curves with Deep Neural Networks
- Predicting Lagrangian Multipliers for Mixed Integer Linear Programs
- Prediction Accuracy of Learning in Games : Follow-the-Regularized-Leader meets Heisenberg
- Prediction-powered Generalization of Causal Inferences
- Predictive Coding beyond Correlations
- Predictive Dynamic Fusion
- Predictive Linear Online Tracking for Unknown Targets
- Predictive Performance Comparison of Decision Policies Under Confounding
- Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
- Preference Optimization for Molecule Synthesis with Conditional Residual Energy-based Models
- Premier-TACO is a Few-Shot Policy Learner: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss
- Premise Order Matters in Reasoning with Large Language Models
- PrE-Text: Training Language Models on Private Federated Data in the Age of LLMs
- Pre-Training Protein Bi-level Representation Through Span Mask Strategy On 3D Protein Chains
- Preventing Model Collapse in Gaussian Process Latent Variable Models
- Pricing with Contextual Elasticity and Heteroscedastic Valuation
- Principled Gradient-Based MCMC for Conditional Sampling of Text
- Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF
- Principled Preferential Bayesian Optimization
- PriorBoost: An Adaptive Algorithm for Learning from Aggregate Responses
- Prior Mismatch and Adaptation in PnP-ADMM with a Nonconvex Convergence Analysis
- Prior Specification for Bayesian Matrix Factorization via Prior Predictive Matching
- PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control
- Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
- Privacy Attacks in Decentralized Learning
- Privacy Backdoors: Stealing Data with Corrupted Pretrained Models
- Privacy Preserving Adaptive Experiment Design
- Privacy-Preserving Data Release Leveraging Optimal Transport and Particle Gradient Descent
- Privacy-Preserving Embedding via Look-up Table Evaluation with Fully Homomorphic Encryption
- Privacy-Preserving Instructions for Aligning Large Language Models
- Privacy Profiles for Private Selection
- Private and Federated Stochastic Convex Optimization: Efficient Strategies for Centralized Systems
- Private Gradient Descent for Linear Regression: Tighter Error Bounds and Instance-Specific Uncertainty Estimation
- Private Heterogeneous Federated Learning Without a Trusted Server Revisited: Error-Optimal and Communication-Efficient Algorithms for Convex Losses
- Privately Learning Smooth Distributions on the Hypercube by Projections
- Private Truly-Everlasting Robust-Prediction
- Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages
- Proactive Detection of Voice Cloning with Localized Watermarking
- Proactive DP: A Multiple Target Optimization Framework for DP-SGD
- Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models
- Probabilistic Constrained Reinforcement Learning with Formal Interpretability
- Probabilistic Forecasting with Stochastic Interpolants and Föllmer Processes
- Probabilistic Generating Circuits - Demystified
- Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo
- Probabilistic Modeling of Interpersonal Coordination Processes
- Probabilistic Routing for Graph-Based Approximate Nearest Neighbor Search
- Probabilistic Subgoal Representations for Hierarchical Reinforcement Learning
- Probabilistic Time Series Modeling with Decomposable Denoising Diffusion Model
- Probability Distribution of Hypervolume Improvement in Bi-objective Bayesian Optimization
- Prodigy: An Expeditiously Adaptive Parameter-Free Learner
- Profile Reconstruction from Private Sketches
- Progressive Inference: Explaining Decoder-Only Sequence Classification Models Using Intermediate Predictions
- Projecting Molecules into Synthesizable Chemical Spaces
- Projection-Free Online Convex Optimization with Time-Varying Constraints
- Projection-Free Variance Reduction Methods for Stochastic Constrained Multi-Level Compositional Optimization
- Prometheus: Out-of-distribution Fluid Dynamics Modeling with Disentangled Graph ODE
- Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines
- Promoting External and Internal Equities Under Ex-Ante/Ex-Post Metrics in Online Resource Allocation
- Prompt-based Visual Alignment for Zero-shot Policy Transfer
- Promptbreeder: Self-Referential Self-Improvement via Prompt Evolution
- Prompt-guided Precise Audio Editing with Diffusion Models
- Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts
- Prompting a Pretrained Transformer Can Be a Universal Approximator
- Prompting is a Double-Edged Sword: Improving Worst-Group Robustness of Foundation Models
- Prompt Sketching for Large Language Models
- Prompt-tuning Latent Diffusion Models for Inverse Problems
- Prospective Side Information for Latent MDPs
- Prospector Heads: Generalized Feature Attribution for Large Models & Data
- Protein Conformation Generation via Force-Guided SE(3) Diffusion Models
- Proteus: Exploring Protein Structure Generation for Enhanced Designability and Efficiency
- ProtoGate: Prototype-based Neural Networks with Global-to-local Feature Selection for Tabular Biomedical Data
- Prototypical Transformer As Unified Motion Learners
- Provable Benefits of Local Steps in Heterogeneous Federated Learning for Neural Networks: A Feature Learning Perspective
- Provable Contrastive Continual Learning
- Provable Interactive Learning with Hindsight Instruction Feedback
- Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks
- Provable Privacy with Non-Private Pre-Processing
- Provable Representation with Efficient Planning for Partially Observable Reinforcement Learning
- Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation
- Provably Better Explanations with Optimized Aggregation of Feature Attributions
- Provably Efficient Exploration in Quantum Reinforcement Learning with Logarithmic Worst-Case Regret
- Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search through State Occupancy Regularization
- Provably Efficient Partially Observable Risk-sensitive Reinforcement Learning with Hindsight Observation
- Provably Efficient Reinforcement Learning for Adversarial Restless Multi-Armed Bandits with Unknown Transitions and Bandit Feedback
- Provably Neural Active Learning Succeeds via Prioritizing Perplexing Samples
- Provably Robust DPO: Aligning Language Models with Noisy Feedback
- Provably Scalable Black-Box Variational Inference with Structured Variational Families
- Pruned Pivot: Correlation Clustering Algorithm for Dynamic, Parallel, and Local Computation Models
- PruNeRF: Segment-Centric Dataset Pruning via 3D Spatial Consistency
- Pruner-Zero: Evolving Symbolic Pruning Metric From Scratch for Large Language Models
- Pseudo-Calibration: Improving Predictive Uncertainty Estimation in Unsupervised Domain Adaptation
- Purifying Quantization-conditioned Backdoors via Layer-wise Activation Correction with Distribution Approximation
- Purify Unlearnable Examples via Rate-Constrained Variational Autoencoders
- Pursuing Overall Welfare in Federated Learning through Sequential Decision Making
- Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels
- QBMK: Quantum-based Matching Kernels for Un-attributed Graphs
- QORA: Zero-Shot Transfer via Interpretable Object-Relational Model Learning
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
- Q-Star Meets Scalable Posterior Sampling: Bridging Theory and Practice via HyperAgent
- Quality-Diversity Actor-Critic: Learning High-Performing and Diverse Behaviors via Value and Successor Features Critics
- Quality Diversity through Human Feedback: Towards Open-Ended Diversity-Driven Optimization
- Quality-Diversity with Limited Resources
- Quality-Weighted Vendi Scores And Their Application To Diverse Experimental Design
- Quantum Algorithm for Online Exp-concave Optimization
- Quantum Algorithms and Lower Bounds for Finite-Sum Optimization
- Quantum Implicit Neural Representations
- Quantum Positional Encodings for Graph Neural Networks
- Quantum Theory and Application of Contextual Optimal Transport
- Quasi-Monte Carlo Features for Kernel Approximation
- QUEST: Query-Aware Sparsity for Efficient Long-Context LLM Inference
- QuIP$\#$: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
- QuRating: Selecting High-Quality Data for Training Language Models
- Q-value Regularized Transformer for Offline Reinforcement Learning
- R2E: Turning any Github Repository into a Programming Agent Environment
- Random Exploration in Bayesian Optimization: Order-Optimal Regret and Computational Efficiency
- Random features models: a way to study the success of naive imputation
- Randomized Confidence Bounds for Stochastic Partial Monitoring
- Random Latent Exploration for Deep Reinforcement Learning
- Random Masking Finds Winning Tickets for Parameter Efficient Fine-tuning
- Random matrix theory improved Fréchet mean of symmetric positive definite matrices
- Random Scaling and Momentum for Non-smooth Non-convex Optimization
- Ranking-based Client Imitation Selection for Efficient Federated Learning
- RankSEG: A Consistent Ranking-based Framework for Segmentation
- Rapid Learning without Catastrophic Forgetting in the Morris Water Maze
- Rate-Optimal Policy Optimization for Linear Markov Decision Processes
- RAUCA: A Novel Physical Adversarial Attack on Vehicle Detectors via Robust and Accurate Camouflage Generation
- Realistic Unsupervised CLIP Fine-tuning with Universal Entropy Optimization
- Reason for Future, Act for Now: A Principled Architecture for Autonomous LLM Agents
- Receptive Fields As Experts in Convolutional Neural Architectures
- ReconBoost: Boosting Can Achieve Modality Reconcilement
- Recovering Labels from Local Updates in Federated Learning
- Recovering the Pre-Fine-Tuning Weights of Generative Models
- Recurrent Distance Filtering for Graph Representation Learning
- Recurrent Early Exits for Federated Learning with Heterogeneous Clients
- ReDiffuser: Reliable Decision-Making Using a Diffuser with Confidence Estimation
- Re-Dock: Towards Flexible and Realistic Molecular Docking with Diffusion Bridge
- Reducing Balancing Error for Causal Inference via Optimal Transport
- Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation
- Reducing Item Discrepancy via Differentially Private Robust Embedding Alignment for Privacy-Preserving Cross Domain Recommendation
- Reducing sequential change detection to sequential estimation
- Referee Can Play: An Alternative Approach to Conditional Generation via Model Inversion
- Reference Neural Operators: Learning the Smooth Dependence of Solutions of PDEs on Geometric Deformations
- Refined Coreset Selection: Towards Minimal Coreset Size under Model Performance Constraints
- Refining Minimax Regret for Unsupervised Environment Design
- Reflected Flow Matching
- Reflective Policy Optimization
- ReGAL: Refactoring Programs to Discover Generalizable Abstractions
- Regression Learning with Limited Observations of Multivariate Outcomes and Features
- Regression with Multi-Expert Deferral
- Regularized Q-learning through Robust Averaging
- Regularizing with Pseudo-Negatives for Continual Self-Supervised Learning
- Reinforcement Learning and Regret Bounds for Admission Control
- Reinforcement Learning from Reachability Specifications: PAC Guarantees with Expected Conditional Distance
- Reinforcement Learning within Tree Search for Fast Macro Placement
- Reinformer: Max-Return Sequence Modeling for Offline RL
- Rejuvenating image-GPT as Strong Visual Representation Learners
- Relational DNN Verification With Cross Executional Bound Refinement
- Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective
- Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise
- Relaxing the Accurate Imputation Assumption in Doubly Robust Learning for Debiased Collaborative Filtering
- ReLU Network with Width $d+\mathcal{O}(1)$ Can Achieve Optimal Approximation Rate
- ReLUs Are Sufficient for Learning Implicit Neural Representations
- ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages
- ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
- REMEDI: Corrective Transformations for Improved Neural Entropy Estimation
- Remembering to Be Fair: Non-Markovian Fairness in Sequential Decision Making
- Removing Spurious Concepts from Neural Network Representations via Joint Subspace Estimation
- Rényi Pufferfish Privacy: General Additive Noise Mechanisms and Privacy Amplification by Iteration via Shift Reduction Lemmas
- Reparameterized Importance Sampling for Robust Variational Bayesian Neural Networks
- Repeat After Me: Transformers are Better than State Space Models at Copying
- Replicable Learning of Large-Margin Halfspaces
- Repoformer: Selective Retrieval for Repository-Level Code Completion
- Representation Surgery for Multi-Task Model Merging
- Representation Surgery: Theory and Practice of Affine Steering
- Representing Molecules as Random Walks Over Interpretable Grammars
- Reprompting: Automated Chain-of-Thought Prompt Inference Through Gibbs Sampling
- Reservoir Computing for Short High-Dimensional Time Series: an Application to SARS-CoV-2 Hospitalization Forecast
- Reshape and Adapt for Output Quantization (RAOQ): Quantization-aware Training for In-memory Computing Systems
- Residual-Conditioned Optimal Transport: Towards Structure-Preserving Unpaired and Paired Image Restoration
- Residual Quantization with Implicit Neural Codebooks
- Resisting Stochastic Risks in Diffusion Planners with the Trajectory Aggregation Tree
- REST: Efficient and Accelerated EEG Seizure Analysis through Residual State Updates
- Restoring balance: principled under/oversampling of data for optimal classification
- Rethinking Adversarial Robustness in the Context of the Right to be Forgotten
- Rethinking Data Shapley for Data Selection Tasks: Misleads and Merits
- Rethinking Decision Transformer via Hierarchical Reinforcement Learning
- Rethinking DP-SGD in Discrete Domain: Exploring Logistic Distribution in the Realm of signSGD
- Rethinking Generative Large Language Model Evaluation for Semantic Comprehension
- Rethinking Guidance Information to Utilize Unlabeled Samples: A Label Encoding Perspective
- Rethinking Independent Cross-Entropy Loss For Graph-Structured Data
- Rethinking Momentum Knowledge Distillation in Online Continual Learning
- Rethinking Optimization and Architecture for Tiny Language Models
- Rethinking Specificity in SBDD: Leveraging Delta Score and Energy-Guided Diffusion
- Rethinking the Flat Minima Searching in Federated Learning
- Rethinking Transformers in Solving POMDPs
- Retrieval Across Any Domains via Large-scale Pre-trained Model
- Retrieval-Augmented Score Distillation for Text-to-3D Generation
- Revealing the Dark Secrets of Extremely Large Kernel ConvNets on Robustness
- Revealing Vision-Language Integration in the Brain with Multimodal Networks
- Revisiting Character-level Adversarial Attacks for Language Models
- Revisiting Context Aggregation for Image Matting
- Revisiting Inexact Fixed-Point Iterations for Min-Max Problems: Stochasticity and Structured Nonconvexity
- Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning
- Revisiting the Power of Prompt for Visual Tuning
- Revisiting the Role of Language Priors in Vision-Language Models
- Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark
- Revisit the Essence of Distilling Knowledge through Calibration
- Revitalizing Multivariate Time Series Forecasting: Learnable Decomposition with Inter-Series Dependencies and Intra-Series Variations Modeling
- Reward-Free Kernel-Based Reinforcement Learning
- Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences
- Reward Shaping for Reinforcement Learning with An Assistant Reward Agent
- Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment
- Reweighted Solutions for Weighted Low Rank Approximation
- RICE: Breaking Through the Training Bottlenecks of Reinforcement Learning with Explanation
- Rich-Observation Reinforcement Learning with Continuous Latent Dynamics
- Riemannian Accelerated Zeroth-order Algorithm: Improved Robustness and Lower Query Complexity
- Riemannian coordinate descent algorithms on matrix manifolds
- Riemannian Preconditioned LoRA for Fine-Tuning Foundation Models
- RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content
- RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences
- Risk Aware Benchmarking of Large Language Models
- Risk Estimation in a Markov Cost Process: Lower and Upper Bounds
- Risk-Sensitive Policy Optimization via Predictive CVaR Policy Gradient
- Risk-Sensitive Reward-Free Reinforcement Learning with CVaR
- RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
- RL-CFR: Improving Action Abstraction for Imperfect Information Extensive-Form Games with Reinforcement Learning
- RLVF: Learning from Verbal Feedback without Overgeneralization
- RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback
- RMIB: Representation Matching Information Bottleneck for Matching Text Representations
- RNAFlow: RNA Structure & Sequence Design via Inverse Folding-Based Flow Matching
- RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis
- RoboDreamer: Learning Compositional World Models for Robot Imagination
- RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation
- RoboMP$^2$: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models
- Robust and Conjugate Gaussian Process Regression
- Robust Classification via a Single Diffusion Model
- Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models
- Robust Data-driven Prescriptiveness Optimization
- Robust Graph Matching when Nodes are Corrupt
- Robust Inverse Constrained Reinforcement Learning under Model Misspecification
- Robust Inverse Graphics via Probabilistic Inference
- Robust Learning-Augmented Dictionaries
- Robustly Learning Single-Index Models via Alignment Sharpness
- Robust Multi-Task Learning with Excess Risks
- Robustness of Deep Learning for Accelerated MRI: Benefits of Diverse Training Data
- Robustness of Nonlinear Representation Learning
- Robust Optimization in Protein Fitness Landscapes Using Reinforcement Learning in Latent Space
- Robust Sparse Estimation for Gaussians with Optimal Error under Huber Contamination
- Robust Stable Spiking Neural Networks
- Robust Universal Adversarial Perturbations
- Robust Yet Efficient Conformal Prediction Sets
- RODEO: Robust Outlier Detection via Exposing Adaptive Out-of-Distribution Samples
- Rolling Diffusion Models
- Roping in Uncertainty: Robustness and Regularization in Markov Games
- RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation
- Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
- Run-Time Task Composition with Safety Semantics
- RVI-SAC: Average Reward Off-Policy Deep Reinforcement Learning
- S$\Omega$I: Score-based O-INFORMATION Estimation
- S3GCL: Spectral, Swift, Spatial Graph Contrastive Learning
- S3O: A Dual-Phase Approach for Reconstructing Dynamic Shape and Skeleton of Articulated Objects from Single Monocular Video
- Safe and Robust Subgame Exploitation in Imperfect Information Games
- Safe Exploration in Dose Finding Clinical Trials with Heterogeneous Participants
- Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation
- Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models
- Saliency strikes back: How filtering out high frequencies improves white-box explanations
- SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation
- SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation
- SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention
- Sample as you Infer: Predictive Coding with Langevin Dynamics
- Sample Average Approximation for Conditional Stochastic Optimization with Dependent Data
- Sample Complexity Bounds for Estimating Probability Divergences under Invariances
- Sample-Efficient Multiagent Reinforcement Learning with Reset Replay
- Sample-Efficient Robust Multi-Agent Reinforcement Learning in the Face of Environmental Uncertainty
- Sample-specific Masks for Visual Reprogramming-based Prompting
- Sampling-based Multi-dimensional Recalibration
- Sampling in Unit Time with Kernel Fisher-Rao Flow
- Sampling is as easy as keeping the consistency: convergence guarantee for Consistency Models
- SAPG: Split and Aggregate Policy Gradients
- Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features
- SaVeR: Optimal Data Collection Strategy for Safe Policy Evaluation in Tabular MDP
- Scalable AI Safety via Doubly-Efficient Debate
- Scalable and Flexible Causal Discovery with an Efficient Test for Adjacency
- Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers
- Scalable Multiple Kernel Clustering: Learning Clustering Structure from Expectation
- Scalable Online Exploration via Coverability
- Scalable Pre-training of Large Autoregressive Image Models
- Scalable Real-Time Recurrent Learning Using Columnar-Constructive Networks
- Scalable Safe Policy Improvement for Factored Multi-Agent MDPs
- Scalable Wasserstein Gradient Flow for Generative Modeling through Unbalanced Optimal Transport
- Scale-Free Image Keypoints Using Differentiable Persistent Homology
- Scaling Beyond the GPU Memory Limit for Large Mixture-of-Experts Model Training
- Scaling Down Deep Learning with MNIST-1D
- Scaling Exponents Across Parameterizations and Optimizers
- Scaling Laws for Fine-Grained Mixture of Experts
- Scaling Laws for the Value of Individual Data Points in Machine Learning
- Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
- Scaling Speech Technology to 1,000+ Languages
- Scaling Tractable Probabilistic Circuits: A Systems Perspective
- SceneCraft: An LLM Agent for Synthesizing 3D Scenes as Blender Code
- Scene Graph Generation Strategy with Co-occurrence Knowledge and Learnable Term Frequency
- SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models
- Score-Based Causal Discovery of Latent Variable Causal Models
- Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation
- SCoRe: Submodular Combinatorial Representation Learning
- Scribble-Supervised Semantic Segmentation with Prototype-based Feature Augmentation
- Second-Order Uncertainty Quantification: A Distance-Based Approach
- See More Details: Efficient Image Super-Resolution by Experts Mining
- Seesaw: Compensating for Nonlinear Reduction with Linear Computations for Private Inference
- Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic
- Selecting Large Language Model to Fine-tune via Rectified Scaling Law
- Selective Mixup Helps with Distribution Shifts, But Not (Only) because of Mixup
- Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation
- Self-attention Networks Localize When QK-eigenspectrum Concentrates
- Self-Attention through Kernel-Eigen Pair Sparse Variational Gaussian Processes
- Self-cognitive Denoising in the Presence of Multiple Noisy Label Sources
- Self-Composing Policies for Scalable Continual Reinforcement Learning
- Self-Consistency Training for Density-Functional-Theory Hamiltonian Prediction
- Self-Correcting Self-Consuming Loops for Generative Model Training
- Self-Driven Entropy Aggregation for Byzantine-Robust Heterogeneous Federated Learning
- SelfIE: Self-Interpretation of Large Language Model Embeddings
- Self-Infilling Code Generation
- Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
- Self-Rewarding Language Models
- Self-Supervised Coarsening of Unstructured Grid with Automatic Differentiation
- Self-Supervised Interpretable End-to-End Learning via Latent Functional Modularity
- SelfVC: Voice Conversion With Iterative Refinement using Self Transformations
- SelMatch: Effectively Scaling Up Dataset Distillation via Selection-Based Initialization and Partial Updates by Trajectory Matching
- Semantically-correlated memories in a dense associative model
- Semantic-Aware Human Object Interaction Image Generation
- SeMOPO: Learning High-quality Model and Policy from Low-quality Offline Visual Datasets
- Sequence Compression Speeds Up Credit Assignment in Reinforcement Learning
- Sequential Asynchronous Action Coordination in Multi-Agent Systems: A Stackelberg Decision Transformer Approach
- Sequential Disentanglement by Extracting Static Information From A Single Sequence Element
- Sequential Kernel Goodness-of-fit Testing
- Sequential Neural Score Estimation: Likelihood-Free Inference with Conditional Score Based Diffusion Models
- SFC: Achieve Accurate Fast Convolution under Low-precision Arithmetic
- SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning
- Sharpness-Aware Data Generation for Zero-shot Quantization
- Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss
- Shifted Interpolation for Differential Privacy
- SHINE: Shielding Backdoors in Deep Reinforcement Learning
- Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences
- Should we be going MAD? A Look at Multi-Agent Debate Strategies for LLMs
- SiBBlInGS: Similarity-driven Building-Block Inference using Graphs across States
- Sign Gradient Descent-based Neuronal Dynamics: ANN-to-SNN Conversion Beyond ReLU Network
- Sign is Not a Remedy: Multiset-to-Multiset Message Passing for Learning on Heterophilic Graphs
- Sign Rank Limitations for Inner Product Graph Decoders
- SignSGD with Federated Defense: Harnessing Adversarial Attacks through Gradient Sign Decoding
- SILVER: Single-loop variance reduction and application to federated learning
- Simple Ingredients for Offline Reinforcement Learning
- Simple linear attention language models balance the recall-throughput tradeoff
- Simplicity Bias of Two-Layer Networks beyond Linearly Separable Data
- Simplicity Bias via Global Convergence of Sharpness Minimization
- SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning
- Simulation-Based Inference with Quantile Regression
- Simulation of Graph Algorithms with Looped Transformers
- Simultaneous identification of models and parameters of scientific simulators
- Single-Model Attribution of Generative Models Through Final-Layer Inversion
- Single-Trajectory Distributionally Robust Reinforcement Learning
- SIN: Selective and Interpretable Normalization for Long-Term Time Series Forecasting
- SiT: Symmetry-invariant Transformers for Generalisation in Reinforcement Learning
- Size-invariance Matters: Rethinking Metrics and Losses for Imbalanced Multi-object Salient Object Detection
- Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills
- SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization
- SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
- SleepFM: Multi-modal Representation Learning for Sleep Across Brain Activity, ECG and Respiratory Signals
- Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices
- Sliced-Wasserstein Estimation with Spherical Harmonics as Control Variates
- Sliced Wasserstein with Random-Path Projecting Directions
- Slicing Mutual Information Generalization Bounds for Neural Networks
- Sliding Down the Stairs: How Correlated Latent Variables Accelerate Learning with Neural Networks
- SLOG: An Inductive Spectral Graph Neural Network Beyond Polynomial Filter
- Slot Abstractors: Toward Scalable Abstract Visual Reasoning
- Slow and Steady Wins the Race: Maintaining Plasticity with Hare and Tortoise Networks
- Small-loss Adaptive Regret for Online Convex Optimization
- SMaRt: Improving GANs with Score Matching Regularity
- Smoothing Proximal Gradient Methods for Nonsmooth Sparsity Constrained Optimization: Optimality Conditions and Global Convergence
- Smooth Min-Max Monotonic Networks
- Smoothness Adaptive Hypothesis Transfer Learning
- Smooth Tchebycheff Scalarization for Multi-Objective Optimization
- Sobolev Space Regularised Pre Density Models
- Socialized Learning: Making Each Other Better Through Multi-Agent Collaboration
- Soft Prompt Recovers Compressed LLMs, Transferably
- Solving Hierarchical Information-Sharing Dec-POMDPs: An Extensive-Form Game Approach
- Solving Poisson Equations using Neural Walk-on-Spheres
- SPABA: A Single-Loop and Probabilistic Stochastic Bilevel Algorithm Achieving Optimal Sample Complexity
- SPADE: Sparsity-Guided Debugging for Deep Neural Networks
- SparQ Attention: Bandwidth-Efficient LLM Inference
- Sparse and Structured Hopfield Networks
- Sparse Cocktail: Every Sparse Pattern Every Sparse Ratio All At Once
- Sparse Dimensionality Reduction Revisited
- Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency
- Sparse Inducing Points in Deep Gaussian Processes: Enhancing Modeling with Denoising Diffusion Variational Inference
- Sparse is Enough in Fine-tuning Pre-trained Large Language Models
- Sparse Model Inversion: Efficient Inversion of Vision Transformers for Data-Free Applications
- Sparser, Better, Deeper, Stronger: Improving Static Sparse Training with Exact Orthogonal Initialization
- Sparsest Models Elude Pruning: An Exposé of Pruning’s Current Capabilities
- Sparse-to-dense Multimodal Image Registration via Multi-Task Learning
- SparseTSF: Modeling Long-term Time Series Forecasting with *1k* Parameters
- Spectral Phase Transition and Optimal PCA in Block-Structured Spiked Models
- Spectral Preconditioning for Gradient Methods on Graded Non-convex Functions
- Speech Self-Supervised Learning Using Diffusion Model Synthetic Data
- SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
- Spider: A Unified Framework for Context-dependent Concept Segmentation
- Spike Distance Function as a Learning Objective for Spike Prediction
- SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms
- SpikeZIP-TF: Conversion is All You Need for Transformer-based SNN
- Split-and-Denoise: Protect large language model inference with local differential privacy
- Split-Ensemble: Efficient OOD-aware Ensemble via Task and Model Splitting
- Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
- SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models
- SqueezeLLM: Dense-and-Sparse Quantization
- SSL4Q: Semi-Supervised Learning of Quantum Data with Application to Quantum State Classification
- Stability and Generalization for Stochastic Recursive Momentum-based Algorithms for (Strongly-)Convex One to $K$-Level Stochastic Optimizations
- Stability and Generalization of Stochastic Compositional Gradient Descent Algorithms
- Stability and Multigroup Fairness in Ranking with Uncertain Predictions
- Stability Evaluation through Distributional Perturbation Analysis
- Stability-Informed Initialization of Neural Ordinary Differential Equations
- Stabilizing Policy Gradients for Stochastic Differential Equations via Consistency with Perturbation Process
- Stable Differentiable Causal Discovery
- StableMask: Refining Causal Masking in Decoder-only Transformer
- StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization
- Stacking Deep Set Networks and Pooling by Quantiles
- StackSight: Unveiling WebAssembly through Large Language Models and Neurosymbolic Chain-of-Thought Decompilation
- Standardized Interpretable Fairness Measures for Continuous Risk Scores
- State-Constrained Zero-Sum Differential Games with One-Sided Information
- State-Free Inference of State-Space Models: The *Transfer Function* Approach
- Stationarity without mean reversion in improper Gaussian processes
- Stationary Latent Weight Inference for Unreliable Observations from Online Test-Time Adaptation
- Statistical Inference Under Constrained Selection Bias
- Statistically Optimal Generative Modeling with Maximum Deviation from the Empirical Distribution
- Statistical Properties of Robust Satisficing
- Statistical Test for Attention Maps in Vision Transformers
- Stay on Topic with Classifier-Free Guidance
- Stealing part of a production language model
- Stealthy Imitation: Reward-guided Environment-free Policy Stealing
- STEER: Assessing the Economic Rationality of Large Language Models
- STELLA: Continual Audio-Video Pre-training with SpatioTemporal Localized Alignment
- Stereographic Spherical Sliced Wasserstein Distances
- Stereo Risk: A Continuous Modeling Approach to Stereo Matching
- Stochastic Bandits with ReLU Neural Networks
- Stochastic Conditional Diffusion Models for Robust Semantic Image Synthesis
- Stochastic Gradient Flow Dynamics of Test Risk and its Exact Solution for Weak Features
- Stochastic Interpolants with Data-Dependent Couplings
- Stochastic Localization via Iterative Posterior Sampling
- Stochastic Optimization with Arbitrary Recurrent Data Sampling
- Stochastic positional embeddings improve masked image modeling
- Stochastic Q-learning for Large Discrete Action Spaces
- Stochastic Quantum Sampling for Non-Logconcave Distributions and Estimating Partition Functions
- Stochastic Weakly Convex Optimization beyond Lipschitz Continuity
- Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
- Straight-Through Meets Sparse Recovery: the Support Exploration Algorithm
- Strategic ML: How to Learn With Data That ‘Behaves’
- StrokeNUWA—Tokenizing Strokes for Vector Graphic Synthesis
- Structure-Aware E(3)-Invariant Molecular Conformer Aggregation Networks
- Structure-based drug design by denoising voxel grids
- Structured Chemistry Reasoning with Large Language Models
- Structured Inverse-Free Natural Gradient Descent: Memory-Efficient & Numerically-Stable KFAC
- Structured Probabilistic Inference and Generative Modeling
- Structure Your Data: Towards Semantic Graph Counterfactuals
- StrWAEs to Invariant Representations
- Studying K-FAC Heuristics by Viewing Adam through a Second-Order Lens
- StyDeSty: Min-Max Stylization and Destylization for Single Domain Generalization
- Subequivariant Reinforcement Learning in 3D Multi-Entity Physical Environments
- Subgoal-based Demonstration Learning for Formal Theorem Proving
- Subgraphormer: Unifying Subgraph GNNs and Graph Transformers via Graph Products
- Subhomogeneous Deep Equilibrium Models
- Submodular framework for structured-sparse optimal transport
- Subsampling is not Magic: Why Large Batch Sizes Work for Differentially Private Stochastic Optimisation
- Sub-token ViT Embedding via Stochastic Resonance Transformers
- Successor Features for Efficient Multi-Subject Controlled Text Generation
- SuDA: Support-based Domain Adaptation for Sim2Real Hinge Joint Tracking with Flexible Sensors
- Superpoint Gaussian Splatting for Real-Time High-Fidelity Dynamic Scene Reconstruction
- Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation
- Supervised Matrix Factorization: Local Landscape Analysis and Applications
- Surface-VQMAE: Vector-quantized Masked Auto-encoders on Molecular Surfaces
- SurfPro: Functional Protein Design Based on Continuous Surface
- Surprisingly Strong Performance Prediction with Neural Graph Features
- Survival Kernets: Scalable and Interpretable Deep Kernel Survival Analysis with an Accuracy Guarantee
- Swallowing the Bitter Pill: Simplified Scalable Conformer Generation
- Switchable Decision: Dynamic Neural Generation Networks
- Switched Flow Matching: Eliminating Singularities via Switching ODEs
- Switching the Loss Reduces the Cost in Batch Reinforcement Learning
- SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment
- Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion
- Symmetric Matrix Completion with ReLU Sampling
- Symmetric Replay Training: Enhancing Sample Efficiency in Deep Reinforcement Learning for Combinatorial Optimization
- Symmetry Induces Structure and Constraint of Learning
- Synergistic Integration of Coordinate Network and Tensorial Feature for Improving Neural Radiance Fields from Sparse Inputs
- TabLog: Test-Time Adaptation for Tabular Data Using Logic Rules
- Tabular Insights, Visual Impacts: Transferring Expertise from Tables to Images
- Tackling Non-Stationarity in Reinforcement Learning via Causal-Origin Representation
- Tackling Prevalent Conditions in Unsupervised Combinatorial Optimization: Cardinality, Minimum, Covering, and More
- Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains
- Tandem Transformers for Inference Efficient LLMs
- Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation
- Task-aware Orthogonal Sparse Network for Exploring Shared Knowledge in Continual Learning
- Task Groupings Regularization: Data-Free Meta-Learning with Heterogeneous Pre-trained Models
- Taylor Videos for Action Recognition
- T-Cal: An Optimal Test for the Calibration of Predictive Models
- Tell, Don't Show: Language Guidance Eases Transfer Across Domains in Images and Videos
- Temporal Logic Specification-Conditioned Decision Transformer for Offline Safe Reinforcement Learning
- Temporal Spiking Neural Networks with Synaptic Delay for Graph Reasoning
- TENG: Time-Evolving Natural Gradient for Solving PDEs With Deep Neural Nets Toward Machine Precision
- TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors
- Testing the Feasibility of Linear Programs with Bandit Feedback
- Test-Time Degradation Adaptation for Open-Set Image Restoration
- Test-Time Model Adaptation with Only Forward Passes
- Test-Time Regret Minimization in Meta Reinforcement Learning
- Text, camera, action! Frontiers in controllable video generation
- The Balanced-Pairwise-Affinities Feature Transform
- The Benefits of Reusing Batches for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents
- The Computational Complexity of Finding Second-Order Stationary Points
- The Effect of Weight Precision on the Neuron Count in Deep ReLU Networks
- The effects of digital technology on youth development in low-and-middle-income countries
- The Emergence of Reproducibility and Consistency in Diffusion Models
- The Entropy Enigma: Success and Failure of Entropy Minimization
- The Expressive Power of Path-Based Graph Neural Networks
- The Fundamental Limits of Least-Privilege Learning
- The good, the bad and the ugly sides of data augmentation: An implicit spectral regularization perspective
- The Good, The Bad, and Why: Unveiling Emotions in Generative AI
- The Illusion of State in State-Space Models
- The Linear Representation Hypothesis and the Geometry of Large Language Models
- The Max-Min Formulation of Multi-Objective Reinforcement Learning: From Theory to a Model-Free Algorithm
- The Merit of River Network Topology for Neural Flood Forecasting
- The Non-linear $F$-Design and Applications to Interactive Learning
- Theoretical Analysis of Learned Database Operations under Distribution Shift through Distribution Learnability
- Theoretical Guarantees for Variational Inference with Fixed-Variance Mixture of Gaussians
- Theoretical insights for diffusion guidance: A case study for Gaussian mixture models
- Theory of Consistency Diffusion Models: Distribution Estimation Meets Fast Sampling
- The Perception-Robustness Tradeoff in Deterministic Image Restoration
- The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks
- The Pitfalls of Next-Token Prediction
- The Privacy Power of Correlated Noise in Decentralized Learning
- The Relative Value of Prediction in Algorithmic Decision Making
- Thermometer: Towards Universal Calibration for Large Language Models
- The Role of Learning Algorithms in Collective Action
- The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning Pipeline
- The Surprising Effectiveness of Skip-Tuning in Diffusion Sampling
- The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning
- Think Before You Act: Decision Transformers with Working Memory
- TIC-TAC: A Framework For Improved Covariance Estimation In Deep Heteroscedastic Regression
- Tight Partial Identification of Causal Effects with Marginal Distribution of Unmeasured Confounders
- Tilt and Average : Geometric Adjustment of the Last Layer for Recalibration
- Tilting the Odds at the Lottery: the Interplay of Overparameterisation and Curricula in Neural Networks
- Tilt your Head: Activating the Hidden Spatial-Invariance of Classifiers
- TimeMIL: Advancing Multivariate Time Series Classification via a Time-aware Multiple Instance Learning
- Timer: Generative Pre-trained Transformers Are Large Time Series Models
- Time Series Diffusion in the Frequency Domain
- Time-Series Forecasting for Out-of-Distribution Generalization Using Invariant Learning
- TimeSiam: A Pre-Training Framework for Siamese Time-Series Modeling
- Time Weaver: A Conditional Time Series Generation Model
- TimeX++: Learning Time-Series Explanations with Information Bottleneck
- tinyBenchmarks: evaluating LLMs with fewer examples
- TinyTrain: Resource-Aware Task-Adaptive Sparse Training of DNNs at the Data-Scarce Edge
- tnGPS: Discovering Unknown Tensor Network Structure Search Algorithms via Large Language Models (LLMs)
- To Cool or not to Cool? Temperature Network Meets Large Foundation Models via DRO
- To Each (Textual Sequence) Its Own: Improving Memorized-Data Unlearning in Large Language Models
- Token-level Direct Preference Optimization
- Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models
- Topological Neural Networks go Persistent, Equivariant, and Continuous
- Total Variation Distance Meets Probabilistic Inference
- Total Variation Floodgate for Variable Importance Inference in Classification
- To the Max: Reinventing Reward in Reinforcement Learning
- Toward Adaptive Reasoning in Large Language Models with Thought Rollback
- Toward Availability Attacks in 3D Point Clouds
- Towards a Better Theoretical Understanding of Independent Subnetwork Training
- Towards an Understanding of Stepwise Inference in Transformers: A Synthetic Graph Navigation Model
- Towards a Self-contained Data-driven Global Weather Forecasting Framework
- Towards AutoAI: Optimizing a Machine Learning System with Black-box and Differentiable Components
- Towards Causal Foundation Model: on Duality between Optimal Balancing and Attention
- Towards Certified Unlearning for Deep Neural Networks
- Towards Compositionality in Concept Learning
- Towards efficient deep spiking neural networks construction with spiking activity based pruning
- Towards Efficient Exact Optimization of Language Model Alignment
- Towards Efficient Generative Large Language Model Serving: A Tutorial from Algorithms to Systems
- Towards Efficient Spiking Transformer: a Token Sparsification Framework for Training and Inference Acceleration
- Towards Efficient Training and Evaluation of Robust Models against $l_0$ Bounded Adversarial Perturbations
- Towards General Algorithm Discovery for Combinatorial Optimization: Learning Symbolic Branching Policy from Bipartite Graph
- Towards Generalization beyond Pointwise Learning: A Unified Information-theoretic Perspective
- Towards General Neural Surrogate Solvers with Specialized Neural Accelerators
- Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles
- Towards Interpretable Deep Local Learning with Successive Gradient Reconciliation
- Towards Modular LLMs by Building and Reusing a Library of LoRAs
- Towards Neural Architecture Search through Hierarchical Generative Modeling
- Towards Optimal Adversarial Robust Q-learning with Bellman Infinity-error
- Towards Realistic Model Selection for Semi-supervised Learning
- Towards Resource-friendly, Extensible and Stable Incomplete Multi-view Clustering
- Towards Robust Model-Based Reinforcement Learning Against Adversarial Corruption
- Towards Scalable and Versatile Weight Space Learning
- Towards Theoretical Understanding of Learning Large-scale Dependent Data via Random Features
- Towards Theoretical Understandings of Self-Consuming Generative Models
- Towards the Theory of Unsupervised Federated Learning: Non-asymptotic Analysis of Federated EM Algorithms
- Towards Understanding Inductive Bias in Transformers: A View From Infinity
- Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features
- Towards Unified Multi-granularity Text Detection with Interactive Attention
- Trainable Transformer in Transformer
- Trained Random Forests Completely Reveal your Dataset
- Training-Free Long-Context Scaling of Large Language Models
- Training Greedy Policy for Proposal Batch Selection in Expensive Multi-Objective Combinatorial Optimization
- Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
- Transferable Facial Privacy Protection against Blind Face Restoration via Domain-Consistent Adversarial Obfuscation
- Transferring Knowledge From Large Foundation Models to Small Downstream Models
- Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
- Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models
- Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context
- Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape
- Transformers, parallel computation, and logarithmic depth
- Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot
- Transforming and Combining Rewards for Aligning Large Language Models
- Transitional Uncertainty with Layered Intermediate Predictions
- Translating Subgraphs to Nodes Makes Simple GNNs Strong and Efficient for Subgraph Representation Learning
- Translation Equivariant Transformer Neural Processes
- Transolver: A Fast Transformer Solver for PDEs on General Geometries
- Transport of Algebraic Structure to Latent Embeddings
- TravelPlanner: A Benchmark for Real-World Planning with Language Agents
- Triadic-OCD: Asynchronous Online Change Detection with Provable Robustness, Optimality, and Convergence
- Triple Changes Estimator for Targeted Policies
- Triplet Interaction Improves Graph Transformers: Accurate Molecular Graph Learning with Triplet Graph Transformers
- Tripod: Three Complementary Inductive Biases for Disentangled Representation Learning
- TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks
- Truly No-Regret Learning in Constrained MDPs
- Trustless Audits without Revealing Data or Models
- Trust Regions for Explanations via Black-Box Probabilistic Certification
- Trust the Model Where It Trusts Itself - Model-Based Actor-Critic with Uncertainty-Aware Rollout Adaption
- Trustworthy Actionable Perturbations
- Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning
- Trustworthy Multi-modal Foundation Models and AI Agents (TiFA)
- TSLANet: Rethinking Transformers for Time Series Representation Learning
- Tuning-free Estimation and Inference of Cumulative Distribution Function under Local Differential Privacy
- Tuning-Free Stochastic Optimization
- Turnstile $\ell_p$ leverage score sampling with applications
- TVE: Learning Meta-attribution for Transferable Vision Explainer
- Two Fists, One Heart: Multi-Objective Optimization Based Strategy Fusion for Long-tailed Learning
- Two Heads are Actually Better than One: Towards Better Adversarial Robustness via Transduction and Rejection
- Two Heads Are Better Than One: Boosting Graph Sparse Training via Semantic and Topological Awareness
- Two-sided Competing Matching Recommendation Markets With Quota and Complementary Preferences Constraints
- Two-Stage Shadow Inclusion Estimation: An IV Approach for Causal Inference under Latent Confounding and Collider Bias
- Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation
- Two Tales of Single-Phase Contrastive Hebbian Learning
- Two-timescale Derivative Free Optimization for Performative Prediction with Markovian Data
- UGrid: An Efficient-And-Rigorous Neural Multigrid Solver for Linear PDEs
- ULAREF: A Unified Label Refinement Framework for Learning with Inaccurate Supervision
- ULTRAFEEDBACK: Boosting Language Models with Scaled AI Feedback
- Unapologetically Open Science -- the complexity and challenges of making openness win!
- Unbiased Multi-Label Learning from Crowdsourced Annotations
- Uncertainty-Aware Reward-Free Exploration with General Function Approximation
- Uncertainty Estimation by Density Aware Evidential Deep Learning
- Uncertainty for Active Learning on Graphs
- Understanding Adam Optimizer via Online Learning of Updates: Adam is FTRL in Disguise
- Understanding and Diagnosing Deep Reinforcement Learning
- Understanding Diffusion Models by Feynman's Path Integral
- Understanding Finetuning for Factual Knowledge Extraction
- Understanding Forgetting in Continual Learning with Linear Regression
- Understanding Heterophily for Graph Neural Networks
- Understanding Inter-Concept Relationships in Concept-Based Models
- Understanding MLP-Mixer as a wide and sparse MLP
- Understanding Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation
- Understanding Retrieval-Augmented Task Adaptation for Vision-Language Models
- Understanding Server-Assisted Federated Learning in the Presence of Incomplete Client Participation
- Understanding Stochastic Natural Gradient Variational Inference
- Understanding the Effects of Iterative Prompting on Truthfulness
- Understanding the Impact of Introducing Constraints at Inference Time on Generalization Error
- Understanding the Learning Dynamics of Alignment with Human Feedback
- Understanding the Role of Large Language Models in Planning
- Understanding the Training Speedup from Sampling with Approximate Losses
- Understanding Unimodal Bias in Multimodal Deep Linear Networks
- UniAudio: Towards Universal Audio Generation with Large Language Models
- UniCorn: A Unified Contrastive Learning Approach for Multi-view Molecular Representation Learning
- Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding
- Unified Training of Universal Time Series Forecasting Transformers
- Uniformly Stable Algorithms for Adversarial Training and Beyond
- Uniform Memory Retrieval with Larger Capacity for Modern Hopfield Models
- Unifying Bayesian Flow Networks and Diffusion Models through Stochastic Differential Equations
- Unifying Image Processing as Visual Prompting Question Answering
- Universal Consistency of Wide and Deep ReLU Neural Networks and Minimax Optimal Convergence Rates for Kolmogorov-Donoho Optimal Function Classes
- Universal Gradient Methods for Stochastic Convex Optimization
- Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues
- Unleashing the Power of Meta-tuning for Few-shot Generalization Through Sparse Interpolated Experts
- Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training
- Unlock the Cognitive Generalization of Deep Reinforcement Learning via Granular Ball Representation
- Unmasking Vulnerabilities: Cardinality Sketches under Adaptive Inputs
- Unraveling the Impact of Heterophilic Structures on Graph Positive-Unlabeled Learning
- Unsupervised Concept Discovery Mitigates Spurious Correlations
- Unsupervised Domain Adaptation for Anatomical Structure Detection in Ultrasound Images
- Unsupervised Episode Generation for Graph Meta-learning
- Unsupervised Evaluation of Code LLMs with Round-Trip Correctness
- Unsupervised Parameter-free Simplicial Representation Learning with Scattering Transforms
- Unsupervised Representation Learning of Brain Activity via Bridging Voxel Activity and Functional Connectivity
- Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings
- Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration
- Unveiling Privacy, Memorization, and Input Curvature Links
- Unveiling the Cycloid Trajectory of EM Iterations in Mixed Linear Regression
- Unveiling the Dynamics of Information Interplay in Supervised Learning
- Unveiling the Potential of AI for Nanomaterial Morphology Prediction
- UP2ME: Univariate Pre-training to Multivariate Fine-tuning as a General-purpose Framework for Multivariate Time Series Analysis
- UPAM: Unified Prompt Attack in Text-to-Image Generation Models Against Both Textual Filters and Visual Checkers
- UPOCR: Towards Unified Pixel-Level OCR Interface
- Use Your INSTINCT: INSTruction optimization for LLMs usIng Neural bandits Coupled with Transformers
- Using AI Uncertainty Quantification to Improve Human Decision-Making
- Using Left and Right Brains Together: Towards Vision and Language Planning
- Using Uncertainty Quantification to Characterize and Improve Out-of-Domain Learning for PDEs
- USTAD: Unified Single-model Training Achieving Diverse Scores for Information Retrieval
- Vague Prototype-Oriented Diffusion Model for Multi-Class Anomaly Detection
- Value-Evolutionary-Based Reinforcement Learning
- Vanilla Bayesian Optimization Performs Great in High Dimensions
- Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models
- Variational Inference with Coverage Guarantees in Simulation-Based Inference
- Variational Learning is Effective for Large Deep Networks
- Variational Linearized Laplace Approximation for Bayesian Deep Learning
- Variational Partial Group Convolutions for Input-Aware Partial Equivariance of Rotations and Color-Shifts
- Variational Schrödinger Diffusion Models
- Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention
- Vectorized Conditional Neural Fields: A Framework for Solving Time-dependent Parametric Partial Differential Equations
- Vector Quantization Pretraining for EEG Time Series with Random Projection and Phase Alignment
- Verification of Machine Unlearning is Fragile
- Verifying message-passing neural networks via topology-based bounds tightening
- Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
- Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition
- VideoPoet: A Large Language Model for Zero-Shot Video Generation
- VideoPrism: A Foundational Visual Encoder for Video Understanding
- video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models
- Viewing Transformers Through the Lens of Long Convolutions Layers
- VinT-6D: A Large-Scale Object-in-hand Dataset from Vision, Touch and Proprioception
- ViP: A Differentially Private Foundation Model for Computer Vision
- VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context
- Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
- Vision Transformers as Probabilistic Expansion from Learngene
- Visual Representation Learning with Stochastic Frame Prediction
- Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models
- Visual Transformer with Differentiable Channel Selection: An Information Bottleneck Inspired Approach
- VNN: Verification-Friendly Neural Networks with Hard Robustness Guarantees
- Vocabulary for Universal Approximation: A Linguistic Perspective of Mapping Compositions
- VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model
- VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling
- WARM: On the Benefits of Weight Averaged Reward Models
- Wasserstein Wormhole: Scalable Optimal Transport Distance with Transformer
- Watermarks in the Sand: Impossibility of Strong Watermarking for Language Models
- Watermark Stealing in Large Language Models
- WAVES: Benchmarking the Robustness of Image Watermarks
- Weakly Convex Regularisers for Inverse Problems: Convergence of Critical Points and Primal-Dual Optimisation
- Weakly-Supervised Residual Evidential Learning for Multi-Instance Uncertainty Estimation
- Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
- WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
- Weighted distance nearest neighbor condensing
- Weisfeiler-Leman at the margin: When more expressivity matters
- Weisfeiler Leman for Euclidean Equivariant Machine Learning
- What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks
- What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding
- What is Dataset Distillation Learning?
- What is the Long-Run Distribution of Stochastic Gradient Descent? A Large Deviations Analysis
- What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
- "What robots have taught me about machine learning"
- What’s the score? Automated Denoising Score Matching for Nonlinear Diffusions
- What Will My Model Forget? Forecasting Forgotten Examples in Language Model Refinement
- What Would Gauss Say About Representations? Probing Pretrained Image Models using Synthetic Gaussian Benchmarks
- When and How Does In-Distribution Label Help Out-of-Distribution Detection?
- When Do Skills Help Reinforcement Learning? A Theoretical Analysis of Temporal Abstractions
- When is Transfer Learning Possible?
- When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
- When Representations Align: Universality in Representation Learning Dynamics
- When Will Gradient Regularization Be Harmful?
- Which Frequencies do CNNs Need? Emergent Bottleneck Structure in Feature Learning
- Whispering Experts: Neural Interventions for Toxicity Mitigation in Language Models
- Why Do Animals Need Shaping? A Theory of Task Composition and Curriculum Learning
- Why do Variational Autoencoders Really Promote Disentanglement?
- Why Do You Grok? A Theoretical Analysis on Grokking Modular Addition
- Why Larger Language Models Do In-context Learning Differently?
- Winner-takes-all learners are geometry-aware conditional density estimators
- WISER: Weak Supervision and Supervised Representation Learning to Improve Drug Response Prediction in Cancer
- WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?
- Workshop on Mechanistic Interpretability
- Workshop on Theoretical Foundations of Foundation Models (TF2M)
- Wukong: Towards a Scaling Law for Large-Scale Recommendation
- X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation
- xT: Nested Tokenization for Larger Context in Large Images
- Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement
- Zero-Shot Reinforcement Learning via Function Encoders
- Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion
- Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach
- Zeroth-Order Methods for Constrained Nonconvex Nonsmooth Stochastic Optimization