Downloads 2025
Number of events: 3390
- $\epsilon$-VAE: Denoising as Visual Decoding
- $\infty$-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation
- $K^2$VAE: A Koopman-Kalman Enhanced Variational AutoEncoder for Probabilistic Time Series Forecasting
- $Q$-Learners Can Provably Collude in the Iterated Prisoner's Dilemma
- $S^2$FGL: Spatial Spectral Federated Graph Learning
- $\texttt{I$^2$MoE}$: Interpretable Multimodal Interaction-aware Mixture-of-Experts
- 1st Workshop on Foundation Models for Structured Data (FMSD)
- 2nd AI for Math Workshop @ ICML 2025
- 2nd Generative AI for Biology Workshop
- 2nd Workshop on Models of Human Feedback for AI Alignment (MoFA)
- 2nd Workshop on Test-Time Adaptation: Putting Updates to the Test (PUT)
- 3D-LMVIC: Learning-based Multi-View Image Compression with 3D Gaussian Geometric Priors
- 3D Question Answering via only 2D Vision-Language Models
- 3rd Workshop on High-dimensional Learning Dynamics (HiLD)
- AAAR-1.0: Assessing AI’s Potential to Assist Research
- A Bayesian Model Selection Criterion for Selecting Pretraining Checkpoints
- Ab Initio Nonparametric Variable Selection for Scalable Symbolic Regression with Large $p$
- ABKD: Pursuing a Proper Allocation of the Probability Mass in Knowledge Distillation via $\alpha$-$\beta$-Divergence
- ABNet: Adaptive explicit-Barrier Net for Safe and Scalable Robot Learning
- A Bregman Proximal Viewpoint on Neural Operators
- A Causal World Model Underlying Next Token Prediction: Exploring GPT in a Controlled Environment
- Accelerated Diffusion Models via Speculative Sampling
- Accelerating Large Language Model Reasoning via Speculative Search
- Accelerating Linear Recurrent Neural Networks for the Edge with Unstructured Sparsity
- Accelerating LLM Inference with Lossless Speculative Decoding for Heterogeneous Vocabularies
- Accelerating PDECO by the Derivative of Neural Operators
- Accelerating Quantum Reinforcement Learning with a Quantum Natural Policy Gradient Based Approach
- Accelerating Spectral Clustering under Fairness Constraints
- Accurate and Efficient World Modeling with Masked Latent Transformers
- Accurate Identification of Communication Across Multiple Interacting Neural Populations
- A Certified Unlearning Approach without Access to Source Data
- A Chaotic Dynamics Framework Inspired by Dorsal Stream for Event Signal Processing
- Achieving Linear Speedup and Optimal Complexity for Decentralized Optimization over Row-stochastic Networks
- A Classification View on Meta Learning Bandits
- A Closer Look at Backdoor Attacks on CLIP
- A Closer Look at Multimodal Representation Collapse
- A Closer Look at the Generalized BH Algorithm
- A Closer Look at Transformers for Time Series Forecasting: Understanding Why They Work and Where They Struggle
- A Comprehensive Analysis on LLM-based Node Classification Algorithms
- A Comprehensive Framework for Analyzing the Convergence of Adam: Bridging the Gap with SGD
- A Computationally Efficient Algorithm for Infinite-Horizon Average-Reward Linear MDPs
- A Contextual Online Learning Theory of Brokerage
- A Cross Modal Knowledge Distillation & Data Augmentation Recipe for Improving Transcriptomics Representations through Morphological Features
- Actionable Interpretability
- Action-Constrained Imitation Learning
- Action-Dependent Optimality-Preserving Reward Shaping
- Action Dubber: Timing Audible Actions via Inflectional Flow
- Action-Minimization Meets Generative Modeling: Efficient Transition Path Sampling with the Onsager-Machlup Functional
- Activation by Interval-wise Dropout: A Simple Way to Prevent Neural Networks from Plasticity Loss
- Activation Space Interventions Can Be Transferred Between Large Language Models
- Active Evaluation Acquisition for Efficient LLM Benchmarking
- Active feature acquisition via explainability-driven ranking
- Active Fine-Tuning of Multi-Task Policies
- Active learning for efficient discovery of optimal combinatorial perturbations
- Active Learning of Deep Neural Networks via Gradient-Free Cutting Planes
- Active Learning with Selective Time-Step Acquisition for PDEs
- Active Reward Modeling: Adaptive Preference Labeling for Large Language Models
- Active Treatment Effect Estimation via Limited Samples
- Actor-Critics Provably Achieve Optimal Sample Efficiency With General Function Approximation
- AdaDecode: Accelerating LLM Decoding with Adaptive Layer Parallelism
- Adapter Naturally Serves as Decoupler for Cross-Domain Few-Shot Semantic Segmentation
- Adapting Precomputed Features for Efficient Graph Condensation
- Adapting to Evolving Adversaries with Regularized Continual Robust Training
- Adapting to Linear Separable Subsets with Large-Margin in Differentially Private Learning
- Adapting While Learning: Grounding LLMs for Scientific Problems with Tool Usage Adaptation
- Adaptive Data Collection for Robust Learning Across Multiple Distributions
- Adaptive Elicitation of Latent Information Using Natural Language
- Adaptive Estimation and Learning under Temporal Distribution Shift
- Adaptive Exploration for Multi-Reward Multi-Policy Evaluation
- Adaptive Flow Matching for Resolving Small-Scale Physics
- Adaptive kernel predictors from feature-learning infinite limits of neural networks
- Adaptive Learn-then-Test: Statistically Valid and Efficient Hyperparameter Selection
- Adaptive Localization of Knowledge Negation for Continual LLM Unlearning
- Adaptive Median Smoothing: Adversarial Defense for Unlearned Text-to-Image Diffusion Models at Inference Time
- Adaptive Message Passing: A General Framework to Mitigate Oversmoothing, Oversquashing, and Underreaching
- Adaptive Multi-prompt Contrastive Network for Few-shot Out-of-distribution Detection
- Adaptive Partitioning Schemes for Optimistic Optimization
- Adaptive Sample Sharing for Multi Agent Linear Bandits
- Adaptive Self-improvement LLM Agentic System for ML Library Development
- Adaptive Sensitivity Analysis for Robust Augmentation against Natural Corruptions in Image Segmentation
- AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence
- AdaPTS: Adapting Univariate Foundation Models to Probabilistic Multivariate Time Series Forecasting
- AdaSplash: Adaptive Sparse Flash Attention
- AdaWorld: Learning Adaptable World Models with Latent Actions
- ADDQ: Adaptive distributional double Q-learning
- Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization
- Addressing Imbalanced Domain-Incremental Learning through Dual-Balance Collaborative Experts
- Addressing Misspecification in Simulation-based Inference through Data-driven Calibration
- ADHMR: Aligning Diffusion-based Human Mesh Recovery via Direct Preference Optimization
- Ad-Hoc Human-AI Coordination Challenge
- Ad Hoc Teamwork via Offline Goal-Based Decision Transformers
- ADIOS: Antibody Development via Opponent Shaping
- Adjoint Sampling: Highly-Scalable Diffusion Samplers via Adjoint Matching
- Adjusting Model Size in Continual Gaussian Processes: How Big is Big Enough?
- Adjustment for Confounding using Pre-Trained Representations
- AdvAgent: Controllable Blackbox Red-teaming on Web Agents
- Advancing Constrained Monotonic Neural Networks: Achieving Universal Approximation Beyond Bounded Activations
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling
- Advancing Personalized Learning with Neural Collapse for Long-Tail Challenge
- Advective Diffusion Transformers for Topological Generalization
- Adversarial Combinatorial Semi-bandits with Graph Feedback
- Adversarial Cooperative Rationalization: The Risk of Spurious Correlations in Even Clean Datasets
- Adversarial Inception Backdoor Attacks against Reinforcement Learning
- Adversarial Inputs for Linear Algebra Backends
- Adversarial Optimization of Multidimensional Adaptive Coefficients in Flow and Diffusion Models
- Adversarial Perturbations Are Linear Combinations of the Right Singular Vectors of the Attack-Targets-Ranking Constrained Jacobian
- Adversarial Reasoning at Jailbreaking Time
- Adversarial Robust Generalization of Graph Neural Networks
- Adversarial Robustness in Two-Stage Learning-to-Defer: Algorithms and Guarantees
- Adversarial Robustness via Deformable Convolution with Stochasticity
- Adversaries Can Misuse Combinations of Safe Models
- AdvI2I: Adversarial Image Attack on Image-to-Image Diffusion Models
- AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs
- A Dynamical Systems-Inspired Pruning Strategy for Addressing Oversmoothing in Graph Attention Networks
- AEQA-NAT: Adaptive End-to-end Quantization Alignment Training Framework for Non-autoregressive Machine Translation
- Aequa: Fair Model Rewards in Collaborative Learning via Slimmable Networks
- AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models
- AffinityFlow: Guided Flows for Antibody Affinity Maturation
- A First-order Generative Bilevel Optimization Framework for Diffusion Models
- A Forget-and-Grow Strategy for Deep Reinforcement Learning Scaling in Continuous Control
- A Free Lunch for Length Extrapolation in Video Diffusion Transformers
- AGAV-Rater: Adapting Large Multimodal Model for AI-Generated Audio-Visual Quality Assessment
- A General Framework for Inference-time Scaling and Steering of Diffusion Models
- A General Graph Spectral Wavelet Convolution via Chebyshev Order Decomposition
- A Generalization Result for Convergence in Learning-to-Optimize
- A General-Purpose Physics-Enhanced State Space Model for Long-Term Dynamics Forecasting in Complex Environments
- A General Representation-Based Approach to Multi-Source Domain Adaptation
- A Generic Family of Graphical Models: Diversity, Efficiency, and Heterogeneity
- Agent-as-a-Judge: Evaluate Agents with Agents
- Agent-Centric Actor-Critic for Asynchronous Multi-Agent Reinforcement Learning
- Agent Reviewers: Domain-specific Multimodal Agents with Shared Memory for Paper Review
- Agent Workflow Memory
- A Geometric Approach to Personalized Recommendation with Set-Theoretic Constraints Using Box Embeddings
- Aggregation Buffer: Revisiting DropEdge with a New Parameter Block
- Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders
- Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
- A Hitchhiker's Guide to Scaling Law Estimation
- AI for Global Climate Cooperation: Modeling Global Climate Negotiations, Agreements, and Long-Term Cooperation in RICE-N
- AI Heard That! ICML 2025 Workshop on Machine Learning for Audio
- AKORN: Adaptive Knots generated Online for RegressioN Splines
- AKRMap: Adaptive Kernel Regression for Trustworthy Visualization of Cross-Modal Embeddings
- A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks
- Alberta Wells Dataset: Pinpointing Oil and Gas Wells from Satellite Imagery
- A Lens into Interpretable Transformer Mistakes via Semantic Dependency
- Algorithm Development in Neural Networks: Insights from the Streaming Parity Task
- Algorithmic Recourse for Long-Term Improvement
- Algorithms and Hardness for Active Learning on Graphs
- Algorithms with Calibrated Predictions
- Aligned Multi Objective Optimization
- Aligning Atomic Vision Language Concepts for Controllable Image Generation
- Aligning LLMs by Predicting Preferences from User Writing Samples
- Aligning Multimodal Representations through an Information Bottleneck
- Aligning Protein Conformation Ensemble Generation with Physical Feedback
- Aligning Spoken Dialogue Models from User Interactions
- Aligning with Logic: Measuring, Evaluating and Improving Logical Preference Consistency in Large Language Models
- Alignment Methods for Large Language Models
- A Likelihood Based Approach to Distribution Regression Using Conditional Deep Generative Models
- A linear query lower bound for submodular maximization
- All-atom Diffusion Transformers
- All-Atom Inverse Folding Through Discrete Flow Matching
- All-Purpose Mean Estimation over R: Optimal Sub-Gaussianity with Outlier Robustness and Low Moments Performance
- Almost Optimal Fully Dynamic $k$-Center Clustering with Recourse
- ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling
- AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization
- AlphaPO - Reward shape matters for LLM alignment
- AlphaQCM: Alpha Discovery in Finance with Distributional Reinforcement Learning
- Alpha-SQL: Zero-Shot Text-to-SQL using Monte Carlo Tree Search
- AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement
- A machine learning approach to duality in statistical physics
- A Manifold Perspective on the Statistical Generalization of Graph Neural Networks
- A Market for Accuracy: Classification Under Competition
- A Mathematical Framework for AI-Human Integration in Work
- am-ELO: A Stable Framework for Arena-based LLM Evaluation
- A Memory Efficient Randomized Subspace Optimization Method for Training Large Language Models
- A Meta-learner for Heterogeneous Effects in Difference-in-Differences
- A Mixed-Curvature based Pre-training Paradigm for Multi-Task Vehicle Routing Solver
- A Mixture-Based Framework for Guiding Diffusion Models
- A Model of Place Field Reorganization During Reward Maximization
- AMPO: Active Multi Preference Optimization
- A Multi-Region Brain Model to Elucidate the Role of Hippocampus in Spatially Embedded Decision-Making
- An Adaptive Orthogonal Convolution Scheme for Efficient and Flexible CNN Architectures
- An All-Atom Generative Model for Designing Protein Complexes
- AnalogGenie-Lite: Enhancing Scalability and Precision in Circuit Topology Discovery through Lightweight Graph Modeling
- Analytical Construction on Geometric Architectures: Transitioning from Static to Temporal Link Prediction
- Analytical Lyapunov Function Discovery: An RL-based Generative Approach
- Analyze Feature Flow to Enhance Interpretation and Steering in Language Models
- An Analysis for Reasoning Bias of Language Models with Small Initialization
- An Analysis of Quantile Temporal-Difference Learning
- An analytic theory of creativity in convolutional diffusion models
- An Architecture Search Framework for Inference-Time Techniques
- An Asymptotically Optimal Approximation Algorithm for Multiobjective Submodular Maximization at Scale
- An Augmentation-Aware Theory for Self-Supervised Contrastive Learning
- An Automated Graph Foundation Model with Adaptive Graph Neural Architecture Customization
- A Near-Optimal Single-Loop Stochastic Algorithm for Convex Finite-Sum Coupled Compositional Optimization
- An Effective and Secure Federated Multi-View Clustering Method with Information-Theoretic Perspective
- An Efficient Matrix Multiplication Algorithm for Accelerating Inference in Binary and Ternary Neural Networks
- An Efficient Private GPT Never Autoregressively Decodes
- An Efficient Pruner for Large Language Model with Theoretical Guarantee
- An efficient search-and-score algorithm for ancestral graphs using multivariate information scores
- An Empirical Study on Configuring In-Context Learning Demonstrations for Unleashing MLLMs' Sentimental Perception Capability
- An End-to-End Model For Logits Based Large Language Models Watermarking
- An Entropy-Based Model for Hierarchical Learning
- An Error Analysis of Flow Matching for Deep Generative Modeling
- A New Approach to Backtracking Counterfactual Explanations: A Causal Framework for Efficient Model Interpretability
- A New Concentration Inequality for Sampling Without Replacement and Its Application for Transductive Learning
- An Expressive and Self-Adaptive Dynamical System for Efficient Equation Learning
- Angle Domain Guidance: Latent Diffusion Requires Rotation Rather Than Extrapolation
- Angle-Robust Networks based on Gamma Distribution PCA for SAR Target Recognition
- An Improved Clique-Picking Algorithm for Counting Markov Equivalent DAGs via Super Cliques Transfers
- An in depth look at the Procrustes-Wasserstein distance: properties and barycenters
- An Instrumental Value for Data Production and its Application to Data Pricing
- An Interpretable N-gram Perplexity Threat Model for Large Language Model Jailbreaks
- Annealing Flow Generative Models Towards Sampling High-Dimensional and Multi-Modal Distributions
- A Non-Asymptotic Convergent Analysis for Score-Based Graph Generative Model via a System of Stochastic Differential Equations
- A Non-isotropic Time Series Diffusion Model with Moving Average Transitions
- An Online Adaptive Sampling Algorithm for Stochastic Difference-of-convex Optimization with Time-varying Distributions
- An Online Learning Approach to Prompt-based Selection of Generative Models
- An Optimistic Algorithm for online CMDPS with Anytime Adversarial Constraints
- A Novel Characterization of the Population Area Under the Risk Coverage Curve (AURC) and Rates of Finite Sample Estimators
- Antidote: Post-fine-tuning Safety Alignment for Large Language Models against Harmful Fine-tuning
- any4: Learned 4-bit Numeric Representation for LLMs
- AnyEdit: Edit Any Knowledge Encoded in Language Models
- Anytime-Constrained Equilibria in Polynomial Time
- A Online Statistical Framework for Out-of-Distribution Detection
- A Parameter-Free and Near-Optimal Zeroth-Order Algorithm for Stochastic Convex Optimization
- A Peer-review Look on Multi-modal Clustering: An Information Bottleneck Realization Method
- A Physics-Augmented Deep Learning Framework for Classifying Single Molecule Force Spectroscopy Data
- A Physics-Informed Machine Learning Framework for Safe and Optimal Control of Autonomous Systems
- A Polynomial-Delay Maximal Ancestral Graph Listing Algorithm
- Approximate Differential Privacy of the $\ell_2$ Mechanism
- Approximate Forest Completion and Learning-Augmented Algorithms for Metric Minimum Spanning Trees
- Approximately Correct Label Distribution Learning
- Approximating Latent Manifolds in Neural Networks via Vanishing Ideals
- Approximation to Smooth Functions by Low-rank Swish Networks
- A-PSRO: A Unified Strategy Learning Method with Advantage Metric for Normal-form Games
- Arbitrarily-Conditioned Multi-Functional Diffusion for Multi-Physics Emulation
- Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models
- A Reasoning-Based Approach to Cryptic Crossword Clue Solving
- A Recipe for Causal Graph Regression: Confounding Effects Revisited
- A Reductions Approach to Risk-Sensitive Reinforcement Learning with Optimized Certainty Equivalents
- Are Foundation Models Foundational? Synthetic Tasks Reveal World Models
- Are High-Quality AI-Generated Images More Difficult for Models to Detect?
- Are Large Brainwave Foundation Models Capable Yet? Insights from Fine-Tuning
- Are Large Language Models Ready for Multi-Turn Tabular Data Analysis?
- Are LLMs Prescient? A Continuous Evaluation using Daily News as the Oracle
- A rescaling-invariant Lipschitz bound based on path-metrics for modern ReLU network parameterizations
- Are Sparse Autoencoders Useful? A Case Study in Sparse Probing
- Armijo Line-search Makes (Stochastic) Gradient Descent Go Fast
- Arrow: Accelerator for Time Series Causal Discovery with Time Weaving
- ARS: Adaptive Reward Scaling for Multi-Task Reinforcement Learning
- A Sample Efficient Conditional Independence Test in the Presence of Discretization
- A Selective Learning Method for Temporal Graph Continual Learning
- A Sharper Global Convergence Analysis for Average Reward Reinforcement Learning via an Actor-Critic Approach
- A shot of Cognac to forget bad memories: Corrective Unlearning in GNNs
- A Simple Model of Inference Scaling Laws
- A Square Peg in a Square Hole: Meta-Expert for Long-Tailed Semi-Supervised Learning
- Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models
- Assessing World Models: Methods and Metrics for Evaluating Understanding
- A Stronger Mixture of Low-rank Experts for Fine-Tuning Foundation Models
- A Sub-Problem Quantum Alternating Operator Ansatz for Correlation Clustering
- Asymmetric Decision-Making in Online Knowledge Distillation: Unifying Consensus and Divergence
- AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration
- ATA: Adaptive Task Allocation for Efficient Resource Management in Distributed Machine Learning
- A Tale of Two Structures: Do LLMs Capture the Fractal Complexity of Language?
- A Theoretical Framework For Overfitting In Energy-based Modeling
- A Theoretical Justification for Asymmetric Actor-Critic Algorithms
- A Theoretical Study of (Hyper) Self-Attention through the Lens of Interactions: Representation, Training, Generalization
- A Theory for Conditional Generative Modeling on Multiple Data Sources
- A Three-Branch Checks-and-Balances Framework for Context-Aware Ethical Alignment of Large Language Models
- AtlasD: Automatic Local Symmetry Discovery
- A Trichotomy for List Transductive Online Learning
- Attention-aware Post-training Quantization without Backpropagation
- Attention-Level Speculation
- Attention Mechanisms Perspective: Exploring LLM Processing of Graph-Structured Data
- Attention-Only Transformers via Unrolled Subspace Denoising
- At the Edge of Laziness: Scaling Limits of Catastrophic Forgetting
- Attributes Shape the Embedding Space of Face Recognition Models
- A Two-Stage Learning-to-Defer Approach for Multi-Task Learning
- Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities
- AudioSpace: Generating Spatial Audio from 360-Degree Video
- Auditing $f$-differential privacy in one run
- Auditing Prompt Caching in Language Model APIs
- A Unified Approach to Routing and Cascading for LLMs
- A Unified Comparative Study with Generalized Conformity Scores for Multi-Output Conformal Regression
- A Unified Framework for Entropy Search and Expected Improvement in Bayesian Optimization
- A Unified Framework for Generalization Error Analysis of Learning with Arbitrary Discrete Weak Features
- A Unified Theoretical Analysis of Private and Robust Offline Alignment: from RLHF to DPO
- A Unified View on Learning Unnormalized Distributions via Noise-Contrastive Estimation
- AuPair: Golden Example Pairs for Code Repair
- AutoAdvExBench: Benchmarking Autonomous Exploitation of Adversarial Example Defenses
- AutoAL: Automated Active Learning with Differentiable Query Strategy Search
- AutoCATE: End-to-End, Automated Treatment Effect Estimation
- AUTOCIRCUIT-RL: Reinforcement Learning-Driven LLM for Automated Circuit Topology Generation
- AutoElicit: Using Large Language Models for Expert Prior Elicitation in Predictive Modelling
- Autoencoder-Based Hybrid Replay for Class-Incremental Learning
- AutoEval Done Right: Using Synthetic Data for Model Evaluation
- Autoformulation of Mathematical Optimization Models Using LLMs
- Automated Benchmark Generation for Repository-Level Coding Tasks
- Automated Hypothesis Validation with Agentic Sequential Falsifications
- Automated Red Teaming with GOAT: the Generative Offensive Agent Tester
- Automatically Identify and Rectify: Robust Deep Contrastive Multi-view Clustering in Noisy Scenarios
- Automatically Interpreting Millions of Features in Large Language Models
- Automatic Differentiation of Optimization Algorithms with Time-Varying Updates
- Automatic Reward Shaping from Confounded Offline Data
- Automating Benchmark Curation from Crowdsourced Data
- AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML
- Autonomy-of-Experts Models
- Auto-reconfiguration for Latency Minimization in CPU-based DNN Serving
- Autoregressive Optimal Design for Language Models
- AutoStep: Locally adaptive involutive MCMC
- A Variational Framework for Improving Naturalness in Generative Spoken Language Models
- A Variational Information Theoretic Approach to OOD Detection
- A Variational Perspective on Generative Protein Fitness Optimization
- Average Certified Radius is a Poor Metric for Randomized Smoothing
- Average Sensitivity of Hierarchical $k$-Median Clustering
- A Versatile Influence Function for Data Attribution with Non-Decomposable Loss
- Avoiding Catastrophe in Online Learning by Asking for Help
- Avoiding Leakage Poisoning: Concept Interventions Under Distribution Shifts
- Avoiding spurious sharpness minimization broadens applicability of SAM
- AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders
- Backdoor Attacks in Token Selection of Attention Mechanism
- BalancEdit: Dynamically Balancing the Generality-Locality Trade-off in Multi-modal Model Editing
- Balanced Learning for Domain Adaptive Semantic Segmentation
- Balancing Efficiency and Expressiveness: Subgraph GNNs with Walk-Based Centrality
- Balancing Interference and Correlation in Spatial Experimental Designs: A Causal Graph Cut Approach
- Balancing Model Efficiency and Performance: Adaptive Pruner for Long-tailed Data
- Balancing Preservation and Modification: A Region and Semantic Aware Metric for Instruction-Based Image Editing
- Balancing the Scales: A Theoretical and Algorithmic Framework for Learning from Imbalanced Data
- BAME: Block-Aware Mask Evolution for Efficient N:M Sparse Training
- BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms
- BAnG: Bidirectional Anchored Generation for Conditional RNA Design
- Banyan: Improved Representation Learning with Explicit Structure
- BARK: A Fully Bayesian Tree Kernel for Black-box Optimization
- BARNN: A Bayesian Autoregressive and Recurrent Neural Network
- Batch List-Decodable Linear Regression via Higher Moments
- BaWA: Automatic Optimizing Pruning Metric for Large Language Models with Balanced Weight and Activation
- BaxBench: Can LLMs Generate Correct and Secure Backends?
- Bayesian Active Learning for Bivariate Causal Discovery
- Bayesian basis function approximation for scalable Gaussian process priors in deep generative models
- Bayesian Consensus Prediction for Correlated Human Experts and Classifiers
- Bayesian Neural Scaling Laws Extrapolation with Prior-Fitted Networks
- Bayesian Optimization from Human Feedback: Near-Optimal Regret Bounds
- Bayesian Weight Enhancement with Steady-State Adaptation for Test-time Adaptation in Dynamic Environments
- BCE vs. CE in Deep Feature Learning
- BDC-CLIP: Brownian Distance Covariance for Adapting CLIP to Action Recognition
- Be a Goldfish: Forgetting Bad Conditioning in Sparse Linear Regression via Variational Autoencoders
- BECAME: Bayesian Continual Learning with Adaptive Model Merging
- Be Confident: Uncovering Overfitting in MLLM Multi-Task Tuning
- Behavior-agnostic Task Inference for Robust Offline In-context Reinforcement Learning
- Behavior-Regularized Diffusion Policy Optimization for Offline Reinforcement Learning
- Bellman Unbiasedness: Toward Provably Efficient Distributional Reinforcement Learning with General Value Function Approximation
- Benchmarking Abstract and Reasoning Abilities Through A Theoretical Perspective
- Benchmarking Quantum Reinforcement Learning
- Benefits of Early Stopping in Gradient Descent for Overparameterized Logistic Regression
- Benign Overfitting in Token Selection of Attention Mechanism
- Benign Samples Matter! Fine-tuning On Outlier Benign Samples Severely Breaks Safety
- Best of Both Worlds: Advantages of Hybrid Graph Sequence Models
- Best of Both Worlds: Regret Minimization versus Minimax Play
- BEST-Route: Adaptive LLM Routing with Test-Time Optimal Compute
- Best Subset Selection: Optimal Pursuit for Feature Selection and Elimination
- Better-than-2 Approximation for Constrained Correlation Clustering
- Better to Teach than to Give: Domain Generalized Semantic Segmentation via Agent Queries with Diffusion Model Guidance
- Beyond Atoms: Enhancing Molecular Pretrained Representations with 3D Space Modeling
- Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment
- Beyond Communication Overhead: A Multilevel Monte Carlo Approach for Mitigating Compression Bias in Distributed Learning
- Beyond Confidence: Exploiting Homogeneous Pattern for Semi-Supervised Semantic Segmentation
- Beyond Cropped Regions: New Benchmark and Corresponding Baseline for Chinese Scene Text Retrieval in Diverse Layouts
- Beyond CVaR: Leveraging Static Spectral Risk Measures for Enhanced Decision-Making in Distributional Reinforcement Learning
- Beyond Entropy: Region Confidence Proxy for Wild Test-Time Adaptation
- Beyond Limited Data: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving
- Beyond Log-Concavity and Score Regularity: Improved Convergence Bounds for Score-Based Generative Models in W2-distance
- Beyond Low-rank Decomposition: A Shortcut Approach for Efficient On-Device Learning
- Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation
- Beyond Minimax Rates in Group Distributionally Robust Optimization via a Novel Notion of Sparsity
- Beyond One-Hot Labels: Semantic Mixing for Model Calibration
- Beyond Pointwise Intervention: Learning Distribution-wise Control in Representation Space for Language Models
- Beyond Self-Interest: How Group Strategies Reshape Content Creation in Recommendation Systems?
- Beyond Self-Repellent Kernels: History-Driven Target Towards Efficient Nonlinear MCMC on General Graphs
- Beyond Sensor Data: Foundation Models of Behavioral Data from Wearables Improve Health Predictions
- Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning
- Beyond the Lazy versus Rich Dichotomy: Geometry Insights in Feature Learning from Task-Relevant Manifold Untangling
- Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion
- Beyond The Rainbow: High Performance Deep Reinforcement Learning on a Desktop PC
- Beyond Topological Self-Explainable GNNs: A Formal Explainability Perspective
- Beyond Zero Initialization: Investigating the Impact of Non-Zero Initialization on LoRA Fine-Tuning Dynamics
- BiAssemble: Learning Collaborative Affordance for Bimanual Geometric Assembly
- Bifurcate then Alienate: Incomplete Multi-view Clustering via Coupled Distribution Learning with Linear Overhead
- BILBO: BILevel Bayesian Optimization
- BiMaCoSR: Binary One-Step Diffusion Model Leveraging Flexible Matrix Compression for Real Super-Resolution
- BiMark: Unbiased Multilayer Watermarking for Large Language Models
- Binary Hypothesis Testing for Softmax Models and Leverage Score Models
- BinauralFlow: A Causal and Streamable Approach for High-Quality Binaural Speech Synthesis with Flow Matching Models
- Bipartite Ranking From Multiple Labels: On Loss Versus Label Aggregation
- Bi-perspective Splitting Defense: Achieving Clean-Data-Free Backdoor Security
- Bivariate Causal Discovery with Proxy Variables: Integral Solving and Beyond
- Black-Box Adversarial Attacks on LLM-Based Code Completion
- Blink of an eye: a simple theory for feature localization in generative models
- BlockDialect: Block-wise Fine-grained Mixed Format Quantization for Energy-Efficient LLM Inference
- Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
- BOOD: Boundary-based Out-Of-Distribution Data Generation
- Boost-and-Skip: A Simple Guidance-Free Diffusion for Minority Generation
- Boosting Adversarial Robustness with CLAT: Criticality Leveraged Adversarial Training
- Boosting Masked ECG-Text Auto-Encoders as Discriminative Learners
- Boosting Mesh Generation with Coordinates Merging
- Boosting Multi-Domain Fine-Tuning of Large Language Models through Evolving Interactions between Samples
- Boosting Performance on ARC is a Matter of Perspective
- Boosting Protein Graph Representations through Static-Dynamic Fusion
- Boosting Virtual Agent Learning and Reasoning: A Step-wise, Multi-dimensional, and Generalist Reward Model with Benchmark
- BoxLM: Unifying Structures and Semantics of Medical Concepts for Diagnosis Prediction in Healthcare
- Branches: Efficiently Seeking Optimal Sparse Decision Trees via AO*
- Breaking Barriers: Combinatorial Algorithms for Non-monotone Submodular Maximization with Sublinear Adaptivity and $1/e$ Approximation
- Breaking Barriers in Hard Samples: Guiding the Generation of Synthetic Data for Medical Tasks with Data-centric Approach
- Breaking Silos: Adaptive Model Fusion Unlocks Better Time Series Forecasting
- Breaking the $n^{1.5}$ Additive Error Barrier for Private and Efficient Graph Sparsification via Private Expander Decomposition
- Breaking the Curse of Multiagency in Robust Multi-Agent Reinforcement Learning
- Breaking the Quadratic Barrier: Robust Cardinality Sketches for Adaptive Queries
- BRIDGE: Bootstrapping Text to Control Time-Series Generation via Multi-Agent Iterative Optimisation and Diffusion Modelling
- Bridging Fairness and Efficiency in Conformal Inference: A Surrogate-Assisted Group-Clustered Approach
- Bridging Layout and RTL: Knowledge Distillation based Timing Prediction
- Bridging Protein Sequences and Microscopy Images with Unified Diffusion Models
- Bridging the Gap: Competitive Neural Data Similarity with Biologically Plausible Temporal Credit Assignment Rules
- Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging
- BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning
- Broadband Ground Motion Synthesis by Diffusion Model with Minimal Condition
- B-score: Detecting biases in large language models using response history
- BSemiFL: Semi-supervised Federated Learning via a Bayesian Approach
- BSLoRA: Enhancing the Parameter Efficiency of LoRA with Intra-Layer and Inter-Layer Sharing
- BSO: Binary Spiking Online Optimization
- Building Physically Plausible World Models
- Byzantine-Resilient Federated Alternating Gradient Descent and Minimization for Partly-Decoupled Low Rank Matrix Learning
- C2IQL: Constraint-Conditioned Implicit Q-learning for Safe Offline Reinforcement Learning
- C-3PO: Compact Plug-and-Play Proxy Optimization to Achieve Human-like Retrieval-Augmented Generation
- Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing
- CABS: Conflict-Aware and Balanced Sparsification for Enhancing Model Merging
- Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models
- CACTI: Leveraging Copy Masking and Contextual Information to Improve Tabular Data Imputation
- CaDA: Cross-Problem Routing Solver with Constraint-Aware Dual-Attention
- CAD-Editor: A Locate-then-Infill Framework with Automated Training Data Synthesis for Text-Based CAD Editing
- Calibrated Language Models and How to Find Them with Label Smoothing
- Calibrated Physics-Informed Uncertainty Quantification.
- Calibrated Value-Aware Model Learning with Probabilistic Environment Models
- Calibrating Video Watch-time Predictions with Credible Prototype Alignment
- Calibration and Bias in Algorithms, Data, and Models: a tutorial on metrics and plots for measuring calibration, bias, fairness, reliability, and robustness
- CALM: Consensus-Aware Localized Merging for Multi-Task Learning
- Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression
- Can DBNNs Robust to Environmental Noise for Resource-constrained Scenarios?
- Can Diffusion Models Learn Hidden Inter-Feature Rules Behind Images?
- Can Large Language Models Ask the Right Questions in Solving Complex Problems with Incomplete Information?
- Can Large Language Models Understand Intermediate Representations?
- CAN: Leveraging Clients As Navigators for Generative Replay in Federated Continual Learning
- Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark
- Cannot See the Forest for the Trees: Invoking Heuristics and Biases to Elicit Irrational Choices of LLMs
- Canonical Rank Adaptation: An Efficient Fine-Tuning Strategy for Vision Transformers
- Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
- Can Transformers Learn Full Bayesian Inference In Context?
- Can Transformers Reason Logically? A Study in SAT Solving
- Can We Predict Performance of Large Models across Vision-Language Tasks?
- Cape: Context-Aware Prompt Perturbation Mechanism with Differential Privacy
- Capturing Temporal Dynamics in Large-Scale Tree Canopy Height Estimation
- CASE-Bench: Context-Aware Safety Evaluation Benchmark for Large Language Models
- Catching Two Birds with One Stone: Reward Shaping with Dual Random Networks for Balancing Exploration and Exploitation
- Catch Your Emotion: Sharpening Emotion Perception in Multimodal Large Language Models
- CAT: Contrastive Adversarial Training for Evaluating the Robustness of Protective Perturbations in Latent Diffusion Models
- Categorical Distributional Reinforcement Learning with Kullback-Leibler Divergence: Convergence and Asymptotics
- Categorical Schrödinger Bridge Matching
- CateKV: On Sequential Consistency for Long-Context LLM Inference Acceleration
- CAT Merging: A Training-Free Approach for Resolving Conflicts in Model Merging
- Catoni Contextual Bandits are Robust to Heavy-tailed Rewards
- Causal Abstraction Inference under Lossy Representations
- Causal Abstraction Learning based on the Semantic Embedding Principle
- Causal Attribution Analysis for Continuous Outcomes
- Causal Discovery from Conditionally Stationary Time Series
- Causal Effect Identification in lvLiNGAM from Higher-Order Cumulants
- Causal Invariance-aware Augmentation for Brain Graph Contrastive Learning
- Causality-Aware Contrastive Learning for Robust Multivariate Time-Series Anomaly Detection
- Causality Inspired Federated Learning for OOD Generalization
- Causal Logistic Bandits with Counterfactual Fairness Constraint
- Causal-PIK: Causality-based Physical Reasoning with a Physics-Informed Kernel
- Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention
- CEGA: A Cost-Effective Approach for Graph-Based Model Extraction Attacks
- CellFlow: Simulating Cellular Morphology Changes via Flow Matching
- Censor Dependent Variational Inference
- CERTAIN: Context Uncertainty-aware One-Shot Adaptation for Context-based Offline Meta Reinforcement Learning
- Certification for Differentially Private Prediction in Gradient-Based Training
- Certified Unlearning for Neural Networks
- CFP-GEN: Combinatorial Functional Protein Generation via Diffusion Language Models
- CFPT: Empowering Time Series Forecasting through Cross-Frequency Interaction and Periodic-Aware Timestamp Modeling
- Chameleon: A Flexible Data-mixing Framework for Language Model Pretraining and Finetuning
- Channel Normalization for Time Series Channel Identification
- Chaos Meets Attention: Transformers for Large-Scale Dynamical Prediction
- CHATS: Combining Human-Aligned Optimization and Test-Time Sampling for Text-to-Image Generation
- Chip Placement with Diffusion Models
- Circumventing Backdoor Space via Weight Symmetry
- CLARIFY: Contrastive Preference Reinforcement Learning for Untangling Ambiguous Queries
- Clients Collaborate: Flexible Differentially Private Federated Learning with Guaranteed Improvement of Utility-Privacy Trade-off
- Clipped SGD Algorithms for Performative Prediction: Tight Bounds for Stochastic Bias and Remedies
- Clipping Improves Adam-Norm and AdaGrad-Norm when the Noise Is Heavy-Tailed
- Clone-Robust AI Alignment
- Closed-form Solutions: A New Perspective on Solving Differential Equations
- Closed-Loop Long-Horizon Robotic Planning via Equilibrium Sequence Modeling
- CLOVER: Cross-Layer Orthogonal Vectors Pruning and Fine-Tuning
- Clustering Items through Bandit Feedback: Finding the Right Feature out of Many
- Clustering Properties of Self-Supervised Learning
- Clustering via Self-Supervised Diffusion
- CMoS: Rethinking Time Series Prediction Through the Lens of Chunk-wise Spatial Correlations
- CoCoA-Mix: Confusion-and-Confidence-Aware Mixture Model for Context Optimization
- Code-Generated Graph Representations Using Multiple LLM Agents for Material Properties Prediction
- CodeIO: Condensing Reasoning Patterns via Code Input-Output Prediction
- CODEML: Championing Open-source DEvelopment in Machine Learning
- CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance
- CodeSync: Synchronizing Large Language Models with Dynamic Code Evolution at Scale
- CoDy: Counterfactual Explainers for Dynamic Graphs
- COExpander: Adaptive Solution Expansion for Combinatorial Optimization
- CogMath: Assessing LLMs' Authentic Mathematical Ability from a Human Cognitive Perspective
- COGNATE: Learning-Based Acceleration of Sparse Tensor Programs on Emerging Hardware
- CogReact: A Reinforced Framework to Model Human Cognitive Reaction Modulated by Dynamic Intervention
- COKE: Core Kernel for More Efficient Approximation of Kernel Weights in Multiple Kernel Clustering
- CollabLLM: From Passive Responders to Active Collaborators
- Collaborative Mean Estimation Among Heterogeneous Strategic Agents: Individual Rationality, Fairness, and Truthful Contribution
- Collapse or Thrive: Perils and Promises of Synthetic Data in a Self-Generating World
- Collapse-Proof Non-Contrastive Self-Supervised Learning
- CombiMOTS: Combinatorial Multi-Objective Tree Search For Dual-Target Molecule Generation
- Combinatorial Reinforcement Learning with Preference Feedback
- CoMemo: LVLMs need image context with image memory
- Come Together, But Not Right Now: A Simple Strategy to Boost Low-Rank Adaptation
- Communicating Activations Between Language Model Agents
- Commute Graph Neural Networks
- CommVQ: Commutative Vector Quantization for KV Cache Compression
- Compact Matrix Quantum Group Equivariant Neural Networks
- Comparing Comparisons: Informative and Easy Human Feedback with Distinguishability Queries
- Comparing Few to Rank Many: Active Human Preference Learning using Randomized Frank-Wolfe method
- Compelling ReLU Networks to Exhibit Exponentially Many Linear Regions at Initialization and During Training
- Competing Bandits in Matching Markets via Super Stability
- Competitively Consistent Clustering
- Complete-Tree Space Favors Data-Efficient Link Prediction
- Complex Wavelet Mutual Information Loss: A Multi-Scale Loss Function for Semantic Segmentation
- Componential Prompt-Knowledge Alignment for Domain Incremental Learning
- Compositional Causal Reasoning Evaluation in Language Models
- Compositional Condition Question Answering in Tabular Understanding
- Compositional Flows for 3D Molecule and Synthesis Pathway Co-design
- Compositional Generalization Requires More Than Disentangled Representations
- Compositional Generative Multiphysics and Multi-component Simulation
- Compositional Risk Minimization
- Compositional Scene Understanding through Inverse Generative Modeling
- Compressed and distributed least-squares regression: convergence rates with applications to federated learning
- Compressed Image Generation with Denoising Diffusion Codebook Models
- Compressing tree ensembles through Level-wise Optimization and Pruning
- Compression via Pre-trained Transformers: A Study on Byte-Level Multimodal Data
- Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead
- Compute Optimal Inference and Provable Amortisation Gap in Sparse Autoencoders
- Compute or Load KV Cache? Why Not Both?
- Computing Optimal Transport Maps and Wasserstein Barycenters Using Conditional Normalizing Flows
- Computing Voting Rules with Improvement Feedback
- COMRECGC: Global Graph Counterfactual Explainer through Common Recourse
- Concentration Distribution Learning from Label Distributions
- ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features
- Concept-Based Unsupervised Domain Adaptation
- Concept-Centric Token Interpretation for Vector-Quantized Generative Models
- Concept Generation through Vision-Language Preference Learning for Understanding Neural Networks' Internal Representations
- Concept Reachability in Diffusion Models: Beyond Dataset Constraints
- Concurrent Learning with Aggregated States via Randomized Least Squares Value Iteration
- Conditional Diffusion Model with Nonlinear Data Transformation for Time Series Forecasting
- Conditioning Diffusions Using Malliavin Calculus
- Confidence Difference Reflects Various Supervised Signals in Confidence-Difference Classification
- Confidential Guardian: Cryptographically Prohibiting the Abuse of Model Abstention
- Conformal Anomaly Detection in Event Sequences
- Conformal Prediction as Bayesian Quadrature
- Conformal Prediction with Cellwise Outliers: A Detect-then-Impute Approach
- Conformal Tail Risk Control for Large Language Model Alignment
- Conformity Score Averaging for Classification
- Confounder-Free Continual Learning via Recursive Feature Normalization
- ConfPO: Exploiting Policy Model Confidence for Critical Token Selection in Preference Optimization
- Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and Regret
- Consensus Based Stochastic Optimal Control
- Consensus Is All You Get: The Role of Attention in Transformers
- Conservative Offline Goal-Conditioned Implicit V-Learning
- Constant Stepsize Local GD for Logistic Regression: Acceleration by Instability
- Constrain Alignment with Sparse Autoencoders
- Constrained belief updating explains geometric structures in transformer representations
- Constrained Exploitability Descent: An Offline Reinforcement Learning Method for Finding Mixed-Strategy Nash Equilibrium
- Constrained Online Convex Optimization with Polyak Feasibility Steps
- Constrained Pareto Set Identification with Bandit Feedback
- ConText: Driving In-context Learning for Text Removal and Segmentation
- Context-Informed Neural ODEs Unexpectedly Identify Broken Symmetries: Insights from the Poincaré-Hopf Theorem
- Context is Key: A Benchmark for Forecasting with Essential Textual Information
- Context Matters: Query-aware Dynamic Long Sequence Modeling of Gigapixel Images
- Contextual Bandits for Unbounded Context Distributions
- Contextual Linear Bandits with Delay as Payoff
- Contextually Tokenizing Action Sequences for Generative Recommendation
- Contextual Online Decision Making with Infinite-Dimensional Functional Regression
- Contextual Optimization Under Model Misspecification: A Tractable and Generalizable Approach
- Contextures: Representations from Contexts
- Continual Generalized Category Discovery: Learning and Forgetting from a Bayesian Perspective
- Continual Reinforcement Learning by Planning with Online World Models
- Continuous Bayesian Model Selection for Multivariate Causal Discovery
- Continuously Updating Digital Twins using Large Language Models
- Continuous Semi-Implicit Models: A Path Towards Consistency
- Continuous-Time Analysis of Heavy Ball Momentum in Min-Max Games
- Continuous Visual Autoregressive Generation via Score Maximization
- Contour Integration Underlies Human-Like Vision
- Contract Design Under Approximate Best Responses
- Contradiction Retrieval via Contrastive Learning with Sparsity
- Contrastive Learning with Simplicial Convolutional Networks for Short-Text Classification
- Contrastive Localized Language-Image Pre-Training
- Contrastive Private Data Synthesis via Weighted Multi-PLM Fusion
- Contrastive Visual Data Augmentation
- Control and Realism: Best of Both Worlds in Layout-to-Image without Training
- Controllable Data Generation with Hierarchical Neural Representations
- Controlled Generation with Equivariant Variational Flow Matching
- Controlling Large Language Model with Latent Action
- Controlling Neural Collapse Enhances Out-of-Distribution Detection and Transfer Learning
- Controlling Underestimation Bias in Constrained Reinforcement Learning for Safe Exploration
- Convergence Analysis of Policy Gradient Methods with Dynamic Stochasticity
- Convergence of Consistency Model with Multistep Sampling under General Data Assumptions
- Convergence of Mean-Field Langevin Stochastic Descent-Ascent for Distributional Minimax Optimization
- Convergence of Policy Mirror Descent Beyond Compatible Function Approximation
- Convergence Rates in Stochastic Stackelberg Games with Smooth Algorithmic Agents
- Convex Markov Games: A New Frontier for Multi-Agent Reinforcement Learning
- Cooperation of Experts: Fusing Heterogeneous Information with Large Margin
- CoPINN: Physics-Informed Neural Network
- Core Context Aware Transformers for Long Context Language Modeling
- Core Knowledge Deficits in Multi-Modal Language Models
- CoreMatching: Co-adaptive Sparse Inference Framework for Comprehensive Acceleration of Vision Language Model
- Correlated Errors in Large Language Models
- Correlation Clustering Beyond the Pivot Algorithm
- CORTEXA: Enhancing LLM Agents for Software Engineering Tasks via Improved Localization and Solution Diversity
- COSDA: Counterfactual-based Susceptibility Risk Framework for Open-Set Domain Adaptation
- CoSER: Coordinating LLM-Based Persona Simulation of Established Roles
- Cost-efficient Collaboration between On-device and Cloud Language Models
- CostFilter-AD: Enhancing Anomaly Detection through Matching Cost Filtering
- Counterfactual Contrastive Learning with Normalizing Flows for Robust Treatment Effect Estimation
- Counterfactual Effect Decomposition in Multi-Agent Sequential Decision Making
- Counterfactual Graphical Models: Constraints and Inference
- Counting atoms faster: policy-based nuclear magnetic resonance pulse sequencing for atomic abundance measurement
- Counting in small transformers: The delicate interplay between attention and feed-forward layers
- Covered Forest: Fine-grained generalization analysis of graph neural networks
- Cover learning for large-scale topology representation
- Cowpox: Towards the Immunity of VLM-based Multi-Agent Systems
- CPCF: A Cross-Prompt Contrastive Framework for Referring Multimodal Large Language Models
- Cradle: Empowering Foundation Agents towards General Computer Control
- Craftium: Bridging Flexibility and Efficiency for Rich 3D Single- and Multi-Agent Environments
- CRANE: Expressive Grammar-Constrained LLM Generation
- Critical Iterative Denoising: A Discrete Generative Model Applied to Graphs
- Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM’s Reasoning Capability
- Cross-City Latent Space Alignment for Consistency Region Embedding
- Cross-environment Cooperation Enables Zero-shot Multi-agent Coordination
- Cross-Modal Alignment via Variational Copula Modelling
- Cross-regularization: Adaptive Model Complexity through Validation Gradients
- CROW: Eliminating Backdoors from Large Language Models via Internal Consistency Regularization
- CSG-ODE: ControlSynth Graph ODE For Modeling Complex Evolution of Dynamic Graphs
- CSTrack: Enhancing RGB-X Tracking via Compact Spatiotemporal Features
- CSV-Occ: Fusing Multi-frame Alignment for Occupancy Prediction with Temporal Cross State Space Model and Central Voting Mechanism
- CTBench: A Library and Benchmark for Certified Training
- CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning
- CUPS: Improving Human Pose-Shape Estimators with Conformalized Deep Uncertainty
- Curse of High Dimensionality Issue in Transformer for Long Context Modeling
- CursorCore: Assist Programming through Aligning Anything
- Curvature-aware Graph Attention for PDEs on Manifolds
- Curvature Enhanced Data Augmentation for Regression
- CurvGAD: Leveraging Curvature for Enhanced Graph Anomaly Detection
- Customizing the Inductive Biases of Softmax Attention using Structured Matrices
- Cut out and Replay: A Simple yet Versatile Strategy for Multi-Label Online Continual Learning
- CVE-Bench: A Benchmark for AI Agents’ Ability to Exploit Real-World Web Application Vulnerabilities
- DA-KD: Difficulty-Aware Knowledge Distillation for Efficient Large Language Models
- DAMO: Data- and Model-aware Alignment of Multi-modal LLMs
- DANCE: Dual Unbiased Expansion with Group-acquired Alignment for Out-of-distribution Graph Fairness Learning
- Data Differences over Scale (DataDos) Suite: How to Predict Best Pretraining Data with Small Experiments
- Data-driven Design of Randomized Control Trials with Guaranteed Treatment Effects
- Data-Driven Selection of Instrumental Variables for Additive Nonlinear, Constant Effects Models
- Dataflow-Guided Neuro-Symbolic Language Models for Type Inference
- Data Foundations for Large Scale Multimodal Clinical Foundation Models
- Data-Juicer Sandbox: A Feedback-Driven Suite for Multimodal Data-Model Co-development
- Data Mixing Optimization for Supervised Fine-Tuning of Large Language Models
- DataWorld: Unifying data curation frameworks across domains
- David and Goliath: Small One-step Model Beats Large Diffusion with Score Post-training
- DCBM: Data-Efficient Visual Concept Bottleneck Models
- DCTdiff: Intriguing Properties of Image Generative Modeling in the DCT Space
- DDAD: A Two-Pronged Adversarial Defense Based on Distributional Discrepancy
- DEALing with image reconstruction: Deep Attentive Least squares
- De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks
- Decision-aware training of spatiotemporal forecasting models to select a top K subset of sites for intervention
- Decision Making under the Exponential Family: Distributionally Robust Optimisation with Bayesian Ambiguity Sets
- Decision Mixer: Integrating Long-term and Local Dependencies via Dynamic Token Selection for Decision-Making
- Decision Theoretic Foundations for Conformal Prediction: Optimal Uncertainty Quantification for Risk-Averse Agents
- Decoding Rewards in Competitive Games: Inverse Game Theory with Entropy Regularization
- Decomposition of Graphic Design with Large Multimodal Model
- De-coupled NeuroGF for Shortest Path Distance Approximations on Large Terrain Graphs
- Decoupled SGDA for Games with Intermittent Strategy Communication
- Deep Bayesian Filter for Bayes-Faithful Data Assimilation
- DeepCrossAttention: Supercharging Transformer Residual Connections
- Deep Electromagnetic Structure Design Under Limited Evaluation Budgets
- Deep Fuzzy Multi-view Learning for Reliable Classification
- DeepLayout: Learning Neural Representations of Circuit Placement Layout
- Deep Linear Network Training Dynamics from Random Initialization: Data, Width, Depth, and Hyperparameter Transfer
- Deep Neural Cellular Potts Models
- Deep principal support vector machines for nonlinear sufficient dimension reduction
- Deep Reinforcement Learning from Hierarchical Preference Design
- Deep Streaming View Clustering
- Deep Sturm–Liouville: From Sample-Based to 1D Regularization with Learnable Orthogonal Basis Functions
- Deep Unsupervised Hashing via External Guidance
- DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts
- Defending LVLMs Against Vision Attacks through Partial-Perception Supervision
- DeFoG: Discrete Flow Matching for Graph Generation
- Delay-DSGN: A Dynamic Spiking Graph Neural Network with Delay Mechanisms for Evolving Graph
- Deliberation in Latent Space via Differentiable Cache Augmentation
- Delta Decompression for MoE-based LLMs Compression
- De-mark: Watermark Removal in Large Language Models
- Demeaned Sparse: Efficient Anomaly Detection by Residual Estimate
- Demonstration Selection for In-Context Learning via Reinforcement Learning
- Demystifying Catastrophic Forgetting in Two-Stage Incremental Object Detector
- Demystifying Cost-Efficiency in LLM Serving over Heterogeneous GPUs
- Demystifying Long Chain-of-Thought Reasoning
- Demystifying Singular Defects in Large Language Models
- Demystifying the Paradox of Importance Sampling with an Estimated History-Dependent Behavior Policy in Off-Policy Evaluation
- Dendritic Localized Learning: Toward Biologically Plausible Algorithm
- Density Ratio Estimation-based Bayesian Optimization with Semi-Supervised Learning
- Density Ratio Estimation with Conditional Probability Paths
- Depth Degeneracy in Neural Networks: Vanishing Angles in Fully Connected ReLU Networks on Initialization
- Dequantified Diffusion Schrödinger Bridge for Density Ratio Estimation
- Design Considerations in Offline Preference-based RL
- Designing Cyclic Peptides via Harmonic SDE with Atom-Bond Modeling
- Detecting Strategic Deception with Linear Probes
- Determinant Estimation under Memory Constraints and Neural Scaling Laws
- Determining Layer-wise Sparsity for Large Language Models Through a Theoretical Perspective
- Deterministic Sparse Fourier Transform for Continuous Signals with Frequency Gap
- Devil is in the Details: Density Guidance for Detail-Aware Generation with Flow Models
- DexSim: Automating Data Scaling for Sim2Real Generalizable Robot Control
- D-Fusion: Direct Preference Optimization for Aligning Diffusion Models with Visually Consistent Samples
- Diagonal Symmetrization of Neural Network Solvers for the Many-Electron Schrödinger Equation
- Dialogue Without Limits: Constant-Sized KV Caches for Extended Response in LLMs
- DiffAdvMAP: Flexible Diffusion-Based Framework for Generating Natural Unrestricted Adversarial Examples
- Differentiable Solver Search for Fast Diffusion Sampling
- Differentiable Structure Learning with Ancestral Constraints
- Differential Coding for Training-Free ANN-to-SNN Conversion
- Differentially Private Analysis for Binary Response Models: Optimality, Estimation, and Inference
- Differentially Private Boxplots
- Differentially Private Federated $k$-Means Clustering with Server-Side Data
- Differentially Private Space-Efficient Algorithms for Counting Distinct Elements in the Turnstile Model
- Differentially Private Synthetic Image Generation with Few-Shot Data and Generative APIs
- Differential privacy guarantees of Markov chain Monte Carlo algorithms
- Differential Privacy Under Class Imbalance: Methods and Empirical Insights
- Diff-MoE: Diffusion Transformer with Time-Aware and Space-Adaptive Experts
- DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra
- Diffuse Everything: Multimodal Diffusion Models on Arbitrary State Spaces
- Diffusion Adversarial Post-Training for One-Step Video Generation
- Diffusion-based Adversarial Purification from the Perspective of the Frequency Domain
- Diffusion Counterfactual Generation with Semantic Abduction
- Diffusion Instruction Tuning
- Diffusion Models are Secretly Exchangeable: Parallelizing DDPMs via Auto Speculation
- Diffusion models for Gaussian distributions: Exact solutions and Wasserstein errors
- Diffusion-Nested Non-Autoregressive Transformer for the Synthesis and Imputation of Heterogeneous Tabular Data
- Diffusion on language model encodings for protein sequence generation
- Diffusion Sampling Correction via Approximately 10 Parameters
- Diffusion-VLA: Generalizable and Interpretable Robot Foundation Model via Self-Generated Reasoning
- DIG-BUGS: Data in Generative Models (The Bad, the Ugly, and the Greats)
- DiMa: Understanding the Hardness of Online Matching Problems via Diffusion Models
- DIME: Diffusion-Based Maximum Entropy Reinforcement Learning
- Dimensionality Reduction on Complex Vector Spaces for Euclidean Distance with Dynamic Weights
- Dimension-Free Adaptive Subgradient Methods with Frequent Directions
- Dimension-Independent Rates for Structured Neural Density Estimation
- DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning
- DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy
- Direct Density Ratio Optimization: A Statistically Consistent Approach to Aligning Large Language Models
- Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model is Secretly a GAN Discriminator
- Directed Graph Grammars for Sequence-based Learning
- Directly Forecasting Belief for Reinforcement Learning with Delays
- Direct Motion Models for Assessing Generated Videos
- Direct Prediction Set Minimization via Bilevel Conformal Classifier Training
- DIS-CO: Discovering Copyrighted Content in VLMs Training Data
- DISCO: learning to DISCover an evolution Operator for multi-physics-agnostic prediction
- Discovering a Zero (Zero-Vector Class of Machine Learning)
- Discovering Global False Negatives On the Fly for Self-supervised Contrastive Learning
- Discovering Latent Structural Causal Models from Spatiotemporal Data
- Discovering Physics Laws of Dynamical Systems via Invariant Function Learning
- Discovering Spoofing Attempts on Language Model Watermarks
- Discovering Symbolic Cognitive Models from Human and Animal Behavior
- Discrepancies are Virtue: Weak-to-Strong Generalization through an Intrinsic Dimension Lens
- Discrepancy Minimization in Input-Sparsity Time
- Discrete and Continuous Difference of Submodular Minimization
- Discrete Markov Probabilistic Models
- Discrete Neural Algorithmic Reasoning
- Discriminative Finetuning of Generative Large Language Models without Reward Models and Preference Data
- Discriminative Policy Optimization for Token-Level Reward Models
- Disentangled Graph Spectral Domain Adaptation
- Disentangling and Integrating Relational and Sensory Information in Transformer Architectures
- Disentangling Invariant Subgraph via Variance Contrastive Estimation under Distribution Shifts
- Disentangling Sequence Memorization and General Capability in LLMs
- Disparate Conditional Prediction in Multiclass Classifiers
- Dissecting Submission Limit in Desk-Rejections: A Mathematical Analysis of Fairness in AI Conference Policies
- Diss-l-ECT: Dissecting Graph Data with local Euler Characteristic Transforms
- Distillation of Discrete Diffusion through Dimensional Correlations
- Distillation Scaling Laws
- Distilling the Knowledge in Data Pruning
- DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs
- Distinguishing Cause from Effect with Causal Velocity Models
- Distributed Conformal Prediction via Message Passing
- Distributed Differentially Private Data Analytics via Secure Sketching
- Distributed Event-Based Learning via ADMM
- Distributed Nonparametric Estimation: from Sparse to Dense Samples per Terminal
- Distributed Parallel Gradient Stacking (DPGS): Solving Whole Slide Image Stacking Challenge in Multi-Instance Learning
- Distributed Retraction-Free and Communication-Efficient Optimization on the Stiefel Manifold
- Distributional Diffusion Models with Scoring Rules
- Distributionally Robust Active Learning for Gaussian Process Regression
- Distributionally Robust Multi-Agent Reinforcement Learning for Dynamic Chute Mapping
- Distributionally Robust Policy Learning under Concept Drifts
- Distribution-aware Fairness Learning in Medical Image Segmentation From A Control-Theoretic Perspective
- Distribution Matching with Structural Regularization via Expressive Score-Based Priors
- DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation
- Diverging Preferences: When do Annotators Disagree and do Models Know?
- Diversified Flow Matching with Translation Identifiability
- Diversifying Robot Locomotion Behaviors with Extrinsic Behavioral Curiosity
- Diversity By Design: Leveraging Distribution Matching for Offline Model-Based Optimization
- Divide and Conquer: Exploring Language-centric Tree Reasoning for Video Question-Answering
- Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning
- Divide and Conquer: Learning Label Distribution with Subtasks
- Diving into Self-Evolve Training for Multimodal Reasoning
- DLP: Dynamic Layerwise Pruning in Large Language Models
- DMM: Distributed Matrix Mechanism for Differentially-Private Federated Learning Based on Constant-Overhead Linear Secret Resharing
- D-MoLE: Dynamic Mixture of Curriculum LoRA Experts for Continual Multimodal Instruction Tuning
- DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
- Do Bayesian Neural Networks Actually Behave Like Bayesian Models?
- DocKS-RAG: Optimizing Document-Level Relation Extraction through LLM-Enhanced Hybrid Prompt Tuning
- DocVXQA: Context-Aware Visual Explanations for Document Question Answering
- Does Data Scaling Lead to Visual Compositional Generalization?
- Does Generation Require Memorization? Creative Diffusion Models using Ambient Diffusion
- Does Graph Prompt Work? A Data Operation Perspective with Theoretical Analysis
- Does learning the right latent variables necessarily improve in-context learning?
- Does Low Rank Adaptation Lead to Lower Robustness against Training-Time Attacks?
- Does one-shot give the best shot? Mitigating Model Inconsistency in One-shot Federated Learning
- DOLPHIN: A Programmable Framework for Scalable Neurosymbolic Learning
- Domain2Vec: Vectorizing Datasets to Find the Optimal Data Mixture without Training
- Domain-Adapted Diffusion Model for PROTAC Linker Design Through the Lens of Density Ratio in Chemical Space
- Do Multiple Instance Learning Models Transfer?
- Do Not Mimic My Voice : Speaker Identity Unlearning for Zero-Shot Text-to-Speech
- Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
- Don't Restart, Just Reuse: Reoptimizing MILPs with Dynamic Parameters
- Double-Filter: Efficient Fine-tuning of Pre-trained Vision-Language Models via Patch&Layer Filtering
- Double Machine Learning for Causal Inference under Shared-State Interference
- Doubly Protected Estimation for Survival Outcomes Utilizing External Controls for Randomized Clinical Trials
- Doubly Robust Conformalized Survival Analysis with Right-Censored Data
- Doubly Robust Fusion of High-Dimensional Treatments for Policy Learning
- Do Vision-Language Models Really Understand Visual Language?
- Do We Need to Verify Step by Step? Rethinking Process Supervision from a Theoretical Perspective
- Do We Really Need Message Passing in Brain Network Modeling?
- DPCore: Dynamic Prompt Coreset for Continual Test-Time Adaptation
- DP-fy your DATA: How to (and why) synthesize Differentially Private Synthetic Data
- DPO Meets PPO: Reinforced Token Optimization for RLHF
- DRAG: Data Reconstruction Attack using Guided Diffusion
- DragLoRA: Online Optimization of LoRA Adapters for Drag-based Image Editing in Diffusion Model
- DragSolver: A Multi-Scale Transformer for Real-World Automotive Drag Coefficient Estimation
- DreamDPO: Aligning Text-to-3D Generation with Human Preferences via Direct Preference Optimization
- DriveGPT: Scaling Autoregressive Behavior Models for Driving
- Drug-TTA: Test-Time Adaptation for Drug Virtual Screening via Multi-task Meta-Auxiliary Learning
- DSBRouter: Solving Global Routing via Diffusion Schrodinger Bridge
- DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers
- DS-VLM: Diffusion Supervision Vision Language Model
- Dual Feature Reduction for the Sparse-group Lasso and its Adaptive Variant
- Dueling Convex Optimization with General Preferences
- DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications
- DVI: A Derivative-based Vision Network for INR
- DyCodeEval: Dynamic Benchmarking of Reasoning Capabilities in Code Large Language Models Under Data Contamination
- Dynamical Modeling of Behaviorally Relevant Spatiotemporal Patterns in Neural Imaging Data
- Dynamical phases of short-term memory mechanisms in RNNs
- Dynamic Similarity Graph Construction with Kernel Density Estimation
- Dynamic Sparse Training of Diagonally Sparse Networks
- DynaMind: Reasoning over Abstract Video Dynamics for Embodied Decision-Making
- DyPolySeg: Taylor Series-Inspired Dynamic Polynomial Fitting Network for Few-shot Point Cloud Semantic Segmentation
- EAGLES: Towards Effective, Efficient, and Economical Federated Graph Learning via Unified Sparsification
- EARL-BO: Reinforcement Learning for Multi-Step Lookahead, High-Dimensional Bayesian Optimization
- Earley-Driven Dynamic Pruning for Efficient Structured Decoding
- EARTH: Epidemiology-Aware Neural ODE with Continuous Disease Transmission Graph
- EasyInv: Toward Fast and Better DDIM Inversion
- EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM
- EcoMapper: Generative Modeling for Climate-Aware Satellite Imagery
- Edge-Colored Clustering in Hypergraphs: Beyond Minimizing Unsatisfied Edges
- Editable Concept Bottleneck Models
- Editable Noise Map Inversion: Encoding Target-image into Noise For High-Fidelity Image Manipulation
- EditLord: Learning Code Transformation Rules for Code Editing
- EduLLM: Leveraging Large Language Models and Framelet-Based Signed Hypergraph Neural Networks for Student Performance Prediction
- EEG-Language Pretraining for Highly Label-Efficient Pathology Detection
- EFDTR: Learnable Elliptical Fourier Descriptor Transformer for Instance Segmentation
- Effective and Efficient Masked Image Generation Models
- Efficient and Privacy-Preserving Soft Prompt Transfer for LLMs
- Efficient and Scalable Density Functional Theory Hamiltonian Prediction through Adaptive Sparsity
- Efficient and Scalable Reinforcement Learning for Average Reward under Model Uncertainty
- Efficient and Separate Authentication Image Steganography Network
- Efficient ANN-SNN Conversion with Error Compensation Learning
- Efficient Bisection Projection to Ensure Neural-Network Feasibility for Optimization over General Set
- Efficient Control via Neural-Embedded Iterative Linear Quadratic Regulator
- Efficient Core-set Selection for Deep Learning Through Squared Loss Minimization
- Efficient Curvature-Aware Hypergradient Approximation for Bilevel Optimization
- Efficient Diffusion Models for Symmetric Manifolds
- Efficient Distributed Optimization under Heavy-Tailed Noise
- Efficient Federated Incomplete Multi-View Clustering
- Efficient Fine-Grained Guidance for Diffusion-Based Symbolic Music Generation
- Efficient Generative Modeling with Residual Vector Quantization-Based Tokens
- Efficient Graph Continual Learning via Lightweight Graph Neural Tangent Kernels-based Dataset Distillation
- Efficient Heterogeneity-Aware Federated Active Data Selection
- Efficient Jailbreaking of Open-Source LLMs in Inference Time
- Efficient Length-Generalizable Attention via Causal Retrieval for Long-Context Language Modeling
- Efficient LiDAR Reflectance Compression via Scanning Serialization
- Efficient Logit-based Knowledge Distillation of Deep Spiking Neural Networks for Full-Range Timestep Deployment
- Efficient Long Context Fine-tuning with Chunk Flow
- Efficiently Access Diffusion Fisher: Within the Outer Product Span Space
- Efficiently Serving Large Multimodal Models Using EPD Disaggregation
- Efficiently Vectorized MCMC on Modern Accelerators
- Efficient Molecular Conformer Generation with SO(3)-Averaged Flow Matching and Reflow
- Efficient Motion Prompt Learning for Robust Visual Tracking
- Efficient Multi-modal Long Context Learning for Training-free Adaptation
- Efficient Multi-Objective Learning under Preference Guidance: A First-Order Penalty Approach
- Efficient Multivariate Robust Mean Estimation Under Mean-Shift Contamination
- Efficient Network Automatic Relevance Determination
- Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis
- Efficient Noise Calculation in Deep Learning-based MRI Reconstructions
- Efficient optimization with orthogonality constraint: a randomized Riemannian submanifold method
- Efficient Parallel Training Methods for Spiking Neural Networks with Constant Time Complexity
- Efficient Personalized Adaption for Physiological Signal Foundation Model
- Efficient Quantification of Multimodal Interaction at Sample Level
- Efficient Robotic Policy Learning via Latent Space Backward Planning
- Efficient Robust Conformal Prediction via Lipschitz-Bounded Networks
- Efficient Skill Discovery via Regret-Aware Optimization
- Efficient Source-free Unlearning via Energy-Guided Data Synthesis and Discrimination-Aware Multitask Optimization
- Efficient Time Series Processing for Transformers and State-Space Models through Token Merging
- e-GAI: e-value-based Generalized $\alpha$-Investing for Online False Discovery Rate Control
- EgoPrivacy: What Your First-Person Camera Says About You?
- EGPlace: An Efficient Macro Placement Method via Evolutionary Search with Greedy Repositioning Guided Mutation
- Ehrenfeucht-Haussler Rank and Chain of Thought
- Eigen Analysis of Conjugate Kernel and Neural Tangent Kernel
- Eigenspectrum Analysis of Weight Matrices without Aspect Ratio Bias
- ELEMENTAL: Interactive Learning from Demonstrations and Vision-Language Models for Reward Design in Robotics
- Eliciting Language Model Behaviors with Investigator Agents
- ELITE: Enhanced Language-Image Toxicity Evaluation for Safety
- ELMO: Efficiency via Low-precision and Peak Memory Optimization in Large Output Spaces
- ELoRA: Low-Rank Adaptation for Equivariant GNNs
- Elucidating Flow Matching ODE Dynamics via Data Geometry and Denoisers
- Elucidating the design space of language models for image generation
- Elucidating the Design Space of Multimodal Protein Language Models
- Embedding Safety into RL: A New Take on Trust Region Methods
- EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents
- Emergence and Effectiveness of Task Vectors in In-Context Learning: An Encoder Decoder Perspective
- Emergence in non-neural models: grokking modular arithmetic via average gradient outer product
- Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
- Emergent Response Planning in LLM
- Emergent Symbolic Mechanisms Support Abstract Reasoning in Large Language Models
- EmoGrowth: Incremental Multi-label Emotion Decoding with Augmented Emotional Relation Graph
- Emoji Attack: Enhancing Jailbreak Attacks Against Judge LLM Detection
- Emotional Face-to-Speech
- Empirical Design in Reinforcement Learning
- Empirical Privacy Variance
- Empowering Time Series Foundation Models with Sparse Mixture of Experts
- Empowering World Models with Reflection for Embodied Video Prediction
- Empower Structure-based Molecule Optimization with Gradient Guidance
- Emulating Complex Coastal Processes with Transformers: A Decade-Long High-Resolution Dataset and Preliminary Results
- Enabling Optimal Decisions in Rehearsal Learning under CARE Condition
- ENAHPool: The Edge-Node Attention-based Hierarchical Pooling for Graph Neural Networks
- End-to-End Machine-Learning Framework for Solving Non-Markovian Optimal Control
- Energy-Based Flow Matching for Generating 3D Molecular Structure
- Energy-Based Preference Model Offers Better Offline Alignment than the Bradley-Terry Preference Model
- Enforcing Idempotency in Neural Networks
- Enforcing Latent Euclidean Geometry in Single-Cell VAEs for Manifold Interpolation
- Enhancing Adversarial Robustness with Conformal Prediction: A Framework for Guaranteed Model Reliability
- Enhancing Certified Robustness via Block Reflector Orthogonal Layers
- Enhancing Cooperative Multi-Agent Reinforcement Learning with State Modelling and Adversarial Exploration
- Enhancing Cross-Layer Information Flow in Transformers with Multiway Dynamic Dense Connections
- Enhancing Decision-Making of Large Language Models via Actor-Critic
- Enhancing Diversity in Parallel Agents: A Maximum State Entropy Exploration Story
- Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization
- Enhancing Foundation Models with Federated Domain Knowledge Infusion
- Enhancing Graph Contrastive Learning for Protein Graphs from Perspective of Invariance
- Enhancing Graph Invariant Learning from a Negative Inference Perspective
- Enhancing Ligand Validity and Affinity in Structure-Based Drug Design with Multi-Reward Optimization
- Enhancing Logits Distillation with Plug&Play Kendall's $\tau$ Ranking Loss
- Enhancing Parallelism in Decentralized Stochastic Convex Optimization
- Enhancing Performance of Explainable AI Models with Constrained Concept Refinement
- Enhancing Rating-Based Reinforcement Learning to Effectively Leverage Feedback from Large Vision-Language Models
- Enhancing Spectral GNNs: From Topology and Perturbation Perspectives
- Enhancing Statistical Validity and Power in Hybrid Controlled Trials: A Randomization Inference Approach with Conformal Selective Borrowing
- Enhancing Target-unspecific Tasks through a Features Matrix
- Enhancing the Influence of Labels on Unlabeled Nodes in Graph Convolutional Networks
- Enhancing Treatment Effect Estimation via Active Learning: A Counterfactual Covering Perspective
- Enhancing Visual Localization with Cross-Domain Image Generation
- Ensemble Distribution Distillation via Flow Matching
- Ensemble Learned Bloom Filters: Two Oracles are Better than One
- EnsLoss: Stochastic Calibrated Loss Ensembles for Preventing Overfitting in Classification
- ENSUR: Equitable and Statistically Unbiased Recommendation
- EpiCoder: Encompassing Diversity and Complexity in Code Generation
- Equivalence is All: A Unified View for Self-supervised Graph Learning
- EquivaMap: Leveraging LLMs for Automatic Equivalence Checking of Optimization Formulations
- Equivariant Neural Tangent Kernels
- Equivariant Polynomial Functional Networks
- EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
- EraseAnything: Enabling Concept Erasure in Rectified Flow Transformers
- Ergodic Generative Flows
- ERICT: Enhancing Robustness by Identifying Concept Tokens in Zero-Shot Vision Language Models
- Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical Systems
- ES-FoMo III: 3rd Workshop on Efficient Systems for Foundation Models
- ESPFormer: Doubly-Stochastic Attention with Expected Sliced Transport Plans
- Estimating the Correctness of Language Model Predictions from Internal Causal Mechanisms
- ETTA: Elucidating the Design Space of Text-to-Audio Models
- Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators
- Evaluating LLMs Across Multi-Cognitive Levels: From Medical Knowledge Mastery to Scenario-Based Problem Solving
- Evaluating Neuron Explanations: A Unified Framework with Sanity Checks
- EvalX: A Platform for Code LLM Evaluation in the Wild
- Event-Customized Image Generation
- Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition
- EvFocus: Learning to Reconstruct Sharp Images from Out-of-Focus Event Streams
- EvoControl: Multi-Frequency Bi-Level Control for High-Frequency Continuous Control
- EVOLvE: Evaluating and Optimizing LLMs For In-Context Exploration
- Evolving Minds: Logic-Informed Inference from Temporal Action Patterns
- EvoMesh: Adaptive Physical Simulation with Hierarchical Graph Evolutions
- EvoPress: Accurate Dynamic Model Compression via Evolutionary Search
- Exactly Tight Information-theoretic Generalization Bounds via Binary Jensen-Shannon Divergence
- Exact Recovery of Sparse Binary Vectors from Generalized Linear Measurements
- Exact risk curves of signSGD in High-Dimensions: quantifying preconditioning and noise-compression effects
- Exact Upper and Lower Bounds for the Output Distribution of Neural Networks with Random Inputs
- ExLM: Rethinking the Impact of $\texttt{[MASK]}$ Tokens in Masked Language Models
- Exogenous Isomorphism for Counterfactual Identifiability
- Expected Variational Inequalities
- Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts
- Explaining, Fast and Slow: Abstraction and Refinement of Provable Explanations
- Explaining the role of Intrinsic Dimensionality in Adversarial Training
- Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization
- Explicit Discovery of Nonlinear Symmetries from Dynamic Data
- Explicit Exploration for High-Welfare Equilibria in Game-Theoretic Multiagent Reinforcement Learning
- Explicit Preference Optimization: No Need for an Implicit Reward Model
- Exploiting Curvature in Online Convex Optimization with Delayed Feedback
- Exploiting Presentative Feature Distributions for Parameter-Efficient Continual Learning of Large Language Models
- Exploiting Similarity for Computation and Communication-Efficient Decentralized Optimization
- ExPLoRA: Parameter-Efficient Extended Pre-Training to Adapt Vision Transformers under Domain Shifts
- Exploration in AI Today (EXAIT)
- Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards
- Exploring Criteria for Enhanced Loss Reweighting in Large Language Model Unlearning
- Exploring Invariance in Images through One-way Wave Equations
- Exploring Large Action Sets with Hyperspherical Embeddings using von Mises-Fisher Sampling
- Exploring Representations and Interventions in Time Series Foundation Models
- Exploring Vision Semantic Prompt for Efficient Point Cloud Understanding
- Exponential Family Variational Flow Matching for Tabular Data Generation
- ExpProof: Operationalizing Explanations for Confidential Models with ZKPs
- Expressive Geometric Generative Modeling with Noise Conditioned Graph Networks
- Expressive Power of Graph Neural Networks for (Mixed-Integer) Quadratic Programs
- ExtPose: Robust and Coherent Pose Estimation with Expanding ViT
- Extracting Rare Dependence Patterns via Adaptive Sample Reweighting
- Extractive Structures Learned in Pretraining Enable Generalization on Finetuned Facts
- Extreme Value Policy Optimization for Safe Reinforcement Learning
- Ex-VAD: Explainable Fine-grained Video Anomaly Detection Based on Visual-Language Models
- FAB-PPI: Frequentist, Assisted by Bayes, Prediction-Powered Inference
- FACTER: Fairness-Aware Conformal Thresholding and Prompt Engineering for Enabling Fair LLM-Based Recommender Systems
- FactTest: Factuality Testing in Large Language Models with Statistical Guarantees
- Fair Clustering via Alignment
- FairICP: Encouraging Equalized Odds via Inverse Conditional Permutation
- Fairness on Principal Stratum: A New Perspective on Counterfactual Fairness
- Fairness Overfitting in Machine Learning: An Information-Theoretic Perspective
- FairPFN: A Tabular Foundation Model for Causal Fairness
- Falcon: Fast visuomotor policy via partial denoising
- Falsification of Unconfoundedness by Testing Independence of Causal Mechanisms
- Fast, Accurate Manifold Denoising by Tunneling Riemannian Optimization
- Fast and Provable Algorithms for Sparse PCA with Improved Sample Complexity
- Fast and Robust: Task Sampling with Posterior and Diversity Synergies for Adaptive Decision-Makers in Randomized Environments
- FastCAV: Efficient Computation of Concept Activation Vectors for Explaining Deep Neural Networks
- Faster and Stronger: When ANN-SNN Conversion Meets Parallel Spiking Calculation
- Faster Approximation Algorithms for k-Center via Data Reduction
- Faster Global Minimum Cut with Predictions
- Faster Rates for Private Adversarial Bandits
- Faster Stochastic Optimization with Arbitrary Delays via Adaptive Asynchronous Mini-Batching
- Fast Estimation of Partial Dependence Functions using Trees
- Fast Exact Unlearning of Fine-Tuning Data for LLMs
- Fast Incomplete Multi-view Clustering by Flexible Anchor Learning
- Fast inference with Kronecker-sparse matrices
- Fast Min-$\epsilon$ Segmented Regression using Constant-Time Segment Merging
- Fast Tensor Completion via Approximate Richardson Iteration
- Fast Video Generation with Sliding Tile Attention
- FCL: A Function Calling Leaderboard for Large Language Models
- FDGen: A Fairness-Aware Graph Generation Model
- Feasible Action Search for Bandit Linear Programs via Thompson Sampling
- FEAT-KD: Learning Concise Representations for Single and Multi-Target Regression via TabNet Knowledge Distillation
- FeatSharp: Your Vision Model Features, Sharper
- Feature Importance Metrics in the Presence of Missing Data
- Feature learning from non-Gaussian inputs: the case of Independent Component Analysis in high dimensions
- Feature-Mapping Topology Optimization with Neural Heaviside Signed Distance Functions
- Feature out! Let Raw Image as Your Condition for Blind Face Restoration
- Features are fate: a theory of transfer learning in high-dimensional regression
- Feature Shift Localization Network
- FedBEns: One-Shot Federated Learning based on Bayesian Ensemble
- FedClean: A General Robust Label Noise Correction for Federated Learning
- FedECADO: A Dynamical System Model of Federated Learning
- Federated Causal Structure Learning with Non-identical Variable Sets
- Federated Disentangled Tuning with Textual Prior Decoupling and Visual Dynamic Adaptation
- Federated Generalised Variational Inference: A Robust Probabilistic Federated Learning Framework
- Federated Incomplete Multi-view Clustering with Globally Fused Graph Guidance
- Federated In-Context Learning: Iterative Refinement for Improved Answer Quality
- Federated Learning for Feature Generalization with Convex Constraints
- Federated Node-Level Clustering Network with Cross-Subgraph Link Mending
- Federated Oriented Learning: A Practical One-Shot Personalized Federated Learning Framework
- FedOne: Query-Efficient Federated Learning for Black-box Discrete Prompt Learning
- FedPHA: Federated Prompt Learning for Heterogeneous Client Adaptation
- FedSMU: Communication-Efficient and Generalization-Enhanced Federated Learning through Symbolic Model Updates
- FedSSI: Rehearsal-Free Continual Federated Learning with Synergistic Synaptic Intelligence
- Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models
- Few-Shot Learner Generalizes Across AI-Generated Image Detection
- Few-shot Species Range Estimation
- Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts
- FG-CLIP: Fine-Grained Visual and Textual Alignment
- FicGCN: Unveiling the Homomorphic Encryption Efficiency from Irregular Graph Convolutional Networks
- FIC-TSC: Learning Time Series Classification with Fisher Information Constraint
- Field Matching: an Electrostatic Paradigm to Generate and Transfer Data
- Finding Bipartite-like Clusters on the Fly
- Finding Information Quality: Counterfactual Voting Adjustment for Quality Assessment and Fairer Voting in Online Platforms with Helpfulness Evaluation
- Finding Wasserstein Ball Center: Efficient Algorithm and The Applications in Fairness
- Fine-Grained Video Captioning through Scene Graph Consolidation
- Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean Field Games
- Finite-Time Analysis of Discrete-Time Stochastic Interpolants
- Finite-Time Global Optimality Convergence in Deep Neural Actor-Critic Methods for Decentralized Multi-Agent Reinforcement Learning
- FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing
- Fishers for Free? Approximating the Fisher Information Matrix by Recycling the Squared Gradient Accumulator
- Fixed-Confidence Multiple Change Point Identification under Bandit Feedback
- Fixing the Double Penalty in Data-Driven Weather Forecasting Through a Modified Spherical Harmonic Loss Function
- Fixing the Loose Brake: Exponential Tail Bounds for Stopping Time in Best Arm Identification
- FLAM: Frame-Wise Language-Audio Modeling
- FlashTP: Fused, Sparsity-Aware Tensor Product for Machine Learning Interatomic Potentials
- Flat-LoRA: Low-Rank Adaptation over a Flat Loss Landscape
- FlatQuant: Flatness Matters for LLM Quantization
- Fleet of Agents: Coordinated Problem Solving with Large Language Models
- Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation
- FlexControl: Computation-Aware ControlNet with Differentiable Router for Text-to-Image Generation
- Flexibility-conditioned protein structure design with flow matching
- Flexible and Efficient Grammar-Constrained Decoding
- Flexible, Efficient, and Stable Adversarial Attacks on Machine Unlearning
- Flexible Tails for Normalizing Flows
- FlexiClip: Locality-Preserving Free-Form Character Animation
- FlexiReID: Adaptive Mixture of Expert for Multi-modal Person Re-Identification
- FlexTok: Resampling Images into 1D Token Sequences of Flexible Length
- FlipAttack: Jailbreak LLMs via Flipping
- Floating-Point Neural Networks Can Represent Almost All Floating-Point Functions
- FloE: On-the-Fly MoE Inference on Memory-constrained GPU
- Flopping for FLOPs: Leveraging equivariance for computational efficiency
- FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching
- Flow-based Domain Randomization for Learning and Sequencing Robotic Skills
- FlowDrag: 3D-aware Drag-based Image Editing with Mesh-guided Deformation Vector Flow Fields
- Flow-field inference from neural data using deep recurrent networks
- Flowing Datasets with Wasserstein over Wasserstein Gradient Flows
- Flowing Through Continuous-Time Generative Models: A Clear and Systematic Tour
- Flow Matching for Denoised Social Recommendation
- Flow Matching for Few-Trial Neural Adaptation with Stable Latent Dynamics
- Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options
- Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples
- Flow Q-Learning
- Fluctuations of the largest eigenvalues of transformed spiked Wigner matrices
- Focal-SAM: Focal Sharpness-Aware Minimization for Long-Tailed Classification
- FOCoOp: Enhancing Out-of-Distribution Robustness in Federated Prompt Learning for Vision-Language Models
- Focus On This, Not That! Steering LLMs With Adaptive Feature Specification
- Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning
- Foundation Model Canonicalization: A Zero-Shot Path to Invariant Perception
- Foundation Model Insights and a Multi-model Approach for Superior Fine-grained One-shot Subset Selection
- Foundation Molecular Grammar: Multi-Modal Foundation Models Induce Interpretable Molecular Graph Languages
- FOUNDER: Grounding Foundation Models in World Models for Open-Ended Embodied Decision Making
- FourierMamba: Fourier Learning Integration with State Space Models for Image Deraining
- Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization
- Fragments to Facts: Partial-Information Fragment Inference from LLMs
- FrameBridge: Improving Image-to-Video Generation with Bridge Models
- Fraud-Proof Revenue Division on Subscription Platforms
- Free Process Rewards without Process Labels
- Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
- From Black Boxes to Transparent Minds: Evaluating and Enhancing the Theory of Mind in Multimodal Large Language Models
- From Complex to Atomic: Enhancing Augmented Generation via Knowledge-Aware Dual Rewriting and Reasoning
- From Feature Interaction to Feature Generation: A Generative Paradigm of CTR Prediction Models
- From Hours to Minutes: Achieving Lossless Acceleration in 100K-Token Long Sequence Generation
- From Individual Experience to Collective Evidence: A Reporting-Based Framework for Identifying Systemic Harms
- From Jack of All Trades to Master of One: Specializing LLM-based Autoraters to a Test Set
- From Kernels to Features: A Multi-Scale Adaptive Theory of Feature Learning
- From Language Models over Tokens to Language Models over Characters
- From Local Details to Global Context: Advancing Vision-Language Models with Attention-Based Selection
- From Logits to Hierarchies: Hierarchical Clustering made Simple
- From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications
- From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models
- From Pixels to Perception: Interpretable Predictions via Instance-wise Grouped Feature Selection
- From Pre-Training Foundation Models to Zero-Shot Prediction: Learning Paths, Prompt Complexity, and Residual Dependence
- From RAG to Memory: Non-Parametric Continual Learning for Large Language Models
- From Spectrum-free towards Baseline-view-free: Double-track Proximity Driven Multi-view Clustering
- From Theory to Practice: Rethinking Green and Martin Kernels for Unleashing Graph Transformers
- From Token to Rhythm: A Multi-Scale Approach for ECG-Language Pretraining
- From Uncertain to Safe: Conformal Fine-Tuning of Diffusion Models for Safe PDE Control
- From Weight-Based to State-Based Fine-Tuning: Further Memory Reduction on LoRA with Parallel Control
- FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training
- FSL-SAGE: Accelerating Federated Split Learning via Smashed Activation Gradient Estimation
- FSTLLM: Spatio-Temporal LLM for Few Shot Time Series Forecasting
- Fully compartmentalizing visual memory perception with procedural data
- Fully Dynamic Embedding into $\ell_p$ Spaces
- Fully Dynamic Euclidean Bi-Chromatic Matching in Sublinear Update Time
- Fully Heteroscedastic Count Regression with Deep Double Poisson Networks
- FunBO: Discovering Acquisition Functions for Bayesian Optimization with FunSearch
- Functional Alignment Can Mislead: Examining Model Stitching
- Function Encoders: A Principled Approach to Transfer Learning in Hilbert Spaces
- Function-Space Learning Rates
- Function-to-Style Guidance of LLMs for Code Translation
- Fundamental Bias in Inverting Random Sampling Matrices with Application to Sub-sampled Newton
- Fundamental limits of learning in sequence multi-index models and deep attention networks: high-dimensional asymptotics and sharp thresholds
- Fundamental Limits of Visual Autoregressive Transformers: Universal Approximation Abilities
- FuseUNet: A Multi-Scale Feature Fusion Method for U-like Networks
- Fusing Reward and Dueling Feedback in Stochastic Bandits
- G-Adaptivity: optimised graph-based mesh relocation for finite element methods
- Game-theoretic Statistics and Sequential Anytime-Valid Inference
- GANQ: GPU-Adaptive Non-Uniform Quantization for Large Language Models
- Gap-Dependent Bounds for Federated $Q$-Learning
- GAPrompt: Geometry-Aware Point Cloud Prompt for 3D Vision Model
- Gaussian Mixture Flow Matching Models
- GaussMark: A Practical Approach for Structural Watermarking of Language Models
- GaussMarker: Robust Dual-Domain Watermark for Diffusion Models
- GCAL: Adapting Graph Models to Evolving Domain Shifts
- G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks
- GEFA: A General Feature Attribution Framework Using Proxy Gradient Estimation
- General framework for online-to-nonconvex conversion: Schedule-free SGD is also effective for nonconvex optimization
- Generalists vs. Specialists: Evaluating LLMs on Highly-Constrained Biophysical Sequence Optimization Tasks
- Generalizable Multi-Camera 3D Object Detection from a Single Source via Fourier Cross-View Learning
- Generalization Analysis for Contrastive Representation Learning under Non-IID Settings
- Generalization Analysis for Controllable Learning
- Generalization and Robustness of the Tilted Empirical Risk
- Generalization Bounds via Meta-Learned Model Representations: PAC-Bayes and Sample Compression Hypernetworks
- Generalization in Federated Learning: A Conditional Mutual Information Framework
- Generalization of noisy SGD in unbounded non-convex settings
- Generalization Performance of Ensemble Clustering: From Theory to Algorithm
- Generalization Principles for Inference over Text-Attributed Graphs with Large Language Models
- Generalized additive models via direct optimization of regularized decision stump forests
- Generalized Category Discovery via Reciprocal Learning and Class-wise Distribution Regularization
- Generalized Interpolating Discrete Diffusion
- Generalized Random Forests Using Fixed-Point Trees
- Generalized Smooth Bilevel Optimization with Nonconvex Lower-Level
- Generalized Venn and Venn-Abers Calibration with Applications in Conformal Prediction
- Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?
- Generalizing Treatment Effects from Randomized Controlled Trials across Environments
- Generating Hypotheses of Dynamic Causal Graphs in Neuroscience: Leveraging Generative Factor Models of Observed Time Series
- Generation from Noisy Examples
- Generative AI Meets Reinforcement Learning
- Generative Audio Language Modeling with Continuous-valued Tokens and Masked Next-Token Prediction
- Generative Data Mining with Longtail-Guided Diffusion
- Generative Human Trajectory Recovery via Embedding-Space Conditional Diffusion
- Generative Intervention Models for Causal Perturbation Modeling
- Generative Point Cloud Registration
- Generative Social Choice: The Next Generation
- GenMol: A Drug Discovery Generalist with Discrete Diffusion
- GenZSL: Generative Zero-Shot Learning Via Inductive Variational Autoencoder
- Geometric Algebra Planes: Convex Implicit Neural Volumes
- Geometric and Physical Constraints Synergistically Enhance Neural PDE Surrogates
- Geometric Contact Flows: Contactomorphisms for Dynamics and Control
- Geometric Feature Embedding for Effective 3D Few-Shot Class Incremental Learning
- Geometric Hyena Networks for Large-scale Equivariant Learning
- Geometric Median (GM) Matching for Robust $k$-Subset Selection from Noisy Data
- Geometric Representation Condition Improves Equivariant Molecule Generation
- Geometric Resampling in Nearly Linear Time for Follow-the-Perturbed-Leader with Best-of-Both-Worlds Guarantee in Bandit Problems
- Geometry-Informed Neural Networks
- GeoPixel: Pixel Grounding Large Multimodal Model in Remote Sensing
- GHOST: Generalizable One-Shot Federated Graph Learning with Proxy-Based Topology Knowledge Retention
- GIVE: Structured Reasoning of Large Language Models with Knowledge Graph Inspired Veracity Extrapolation
- GLGENN: A Novel Parameter-Light Equivariant Neural Networks Architecture Based on Clifford Geometric Algebras
- GL-LowPopArt: A Nearly Instance-Wise Minimax Estimator for (Adaptive) Generalized Linear Low-Rank Trace Regression
- Global Context-aware Representation Learning for Spatially Resolved Transcriptomics
- Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $\mu$ Parametrization
- Global curvature for second-order optimization of neural networks
- Global-Local Dirichlet Processes for Clustering Grouped Data in the Presence of Group-Specific Idiosyncratic Variables
- Global Optimization with A Power-Transformed Objective and Gaussian Smoothing
- GMAIL: Generative Modality Alignment for generated Image Learning
- Goal-Oriented Skill Abstraction for Offline Multi-Task Reinforcement Learning
- Goal-Space Planning with Subgoal Models
- Going Deeper into Locally Differentially Private Graph Neural Networks
- GoIRL: Graph-Oriented Inverse Reinforcement Learning for Multimodal Trajectory Prediction
- GPEN: Global Position Encoding Network for Enhanced Subgraph Representation Learning
- GPTQv2: Efficient Finetuning-Free Quantization for Asymmetric Calibration
- GRADEO: Towards Human-Like Evaluation for Text-to-Video Generation via Multi-Step Reasoning
- Gradient Aligned Regression via Pairwise Losses
- Gradient-based Explanations for Deep Learning Survival Models
- Gradient Boosting Reinforcement Learning
- Gradient Descent Converges Arbitrarily Fast for Logistic Regression via Large and Adaptive Stepsizes
- Gradient Flow Provably Learns Robust Classifiers for Orthonormal GMMs
- Gradient Inversion of Multi-Modal Models
- GradPS: Resolving Futile Neurons in Parameter Sharing Network for Multi-Agent Reinforcement Learning
- Gradual Transition from Bellman Optimality Operator to Bellman Operator in Online Reinforcement Learning
- GRAIL: Graph Edit Distance and Node Alignment using LLM-Generated Code
- GRAM: A Generative Foundation Reward Model for Reward Generalization
- Grammar-Forced Translation of Natural Language to Temporal Logic using LLMs
- Graph4MM: Weaving Multimodal Learning with Structural Information
- Graph Adaptive Autoregressive Moving Average Models
- Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning
- Graph Attention is Not Always Beneficial: A Theoretical Analysis of Graph Attention Mechanisms via Contextual Stochastic Block Models
- Graph-based Algorithms for Diverse Similarity Search
- GraphCL: Graph-based Clustering for Semi-Supervised Medical Image Segmentation
- Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models
- Graph Diffusion for Offline Multi-Agent Coordination
- Graph Generative Pre-trained Transformer
- GraphGPT: Generative Pre-trained Graph Eulerian Transformer
- Graph Inverse Style Transfer for Counterfactual Explainability
- Graph Minimum Factor Distance and Its Application to Large-Scale Graph Data Clustering
- Graph Neural Network Generalization With Gaussian Mixture Model Based Augmentation
- Graph-Supported Dynamic Algorithm Configuration for Multi-Objective Combinatorial Optimization
- Graph World Model
- Gravity-Bench-v1: A Benchmark on Gravitational Physics Discovery for Agents
- Great Language Models Think Alike and this Undermines AI Oversight
- Gridded Transformer Neural Processes for Spatio-Temporal Data
- Griffin: Towards a Graph-Centric Relational Database Foundation Model
- GrokFormer: Graph Fourier Kolmogorov-Arnold Transformers
- Grokking at the Edge of Linear Separability
- Grokking Beyond the Euclidean Norm of Model Parameters
- Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers
- GRU: Mitigating the Trade-off between Unlearning and Retention for LLMs
- GS-Bias: Global-Spatial Bias Learner for Single-Image Test-Time Adaptation of Vision-Language Models
- G-Sim: Generative Simulations with Large Language Models and Gradient-Free Calibration
- GSM-$\infty$: How Do your LLMs Behave over Infinitely Increasing Reasoning Complexity and Context Length?
- GTR: A General, Multi-View, and Dynamic Framework for Trajectory Representation Learning
- Guarantees of a Preconditioned Subgradient Algorithm for Overparameterized Asymmetric Low-rank Matrix Recovery
- GuardAgent: Safeguard LLM Agents via Knowledge-Enabled Reasoning
- Guardians of Image Quality: Benchmarking Defenses Against Adversarial Attacks on Image Quality Metrics
- GuidedQuant: Large Language Model Quantization via Exploiting End-to-End Loss Guidance
- Guided Search Strategies in Non-Serializable Environments with Applications to Software Engineering Agents
- Guided Structural Inference: Leveraging Priors with Soft Gating Mechanisms
- Guided Zeroth-Order Methods for Stochastic Non-convex Problems with Decision-Dependent Distributions
- Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding
- Habitizing Diffusion Planning for Efficient and Effective Decision Making
- HALoS: Hierarchical Asynchronous Local SGD over Slow Networks for Geo-Distributed Large Language Model Training
- Handling Imbalanced Pseudolabels for Vision-Language Models with Concept Alignment and Confusion-Aware Calibrated Margin
- HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding
- Hardware and Software Platform Inference
- HarmoniCa: Harmonizing Training and Inference for Better Feature Caching in Diffusion Transformer Acceleration
- Harmonizing Geometry and Uncertainty: Diffusion with Hyperspheres
- Harnessing Heterogeneous Statistical Strength for Personalized Federated Learning via Hierarchical Bayesian Inference
- Harnessing Low Dimensionality in Diffusion Models: From Theory to Practice
- HashAttention: Semantic Sparsity for Faster Inference
- Haste Makes Waste: A Simple Approach for Scaling Graph Neural Networks
- Heads up! Large Language Models Can Perform Tasks Without Your Instruction via Selective Attention Head Masking
- HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation
- HEAP: Hyper Extended APDHG Operator for Constrained High-dim PDEs
- Heavy-Tailed Linear Bandits: Huber Regression with One-Pass Update
- Hessian Geometry of Latent Space in Generative Models
- Heterogeneous Data Game: Characterizing the Model Competition Across Multiple Data Sources
- Heterogeneous Label Shift: Theory and Algorithm
- Heterogeneous Sufficient Dimension Reduction and Subspace Clustering
- Heterogeneous Treatment Effect in Time-to-Event Outcomes: Harnessing Censored Data with Recursively Imputed Trees
- HetSSNet: Spatial-Spectral Heterogeneous Graph Learning Network for Panchromatic and Multispectral Images Fusion
- Hidden No More: Attacking and Defending Private Third-Party LLM Inference
- Hide & Seek: Transformer Symmetries Obscure Sharpness & Riemannian Geometry Finds It
- Hierarchical Equivariant Policy via Frame Transfer
- Hierarchical Graph Tokenization for Molecule-Language Alignment
- Hierarchical Masked Autoregressive Models with Low-Resolution Token Pivots
- Hierarchical Overlapping Clustering on Graphs: Cost Function, Algorithm and Scalability
- Hierarchical Planning for Complex Tasks with Knowledge Graph-RAG and Symbolic Verification
- Hierarchical Refinement: Optimal Transport to Infinity and Beyond
- Hierarchical Reinforcement Learning with Targeted Causal Interventions
- Hierarchical Reinforcement Learning with Uncertainty-Guided Diffusional Subgoals
- High-Dimensional Prediction for Sequential Decision Making
- High-Dimensional Tensor Regression With Oracle Properties
- High Dynamic Range Novel View Synthesis with Single Exposure
- High-Fidelity Simultaneous Speech-To-Speech Translation
- Highly Compressed Tokenizer Can Generate Without Training
- High Probability Bound for Cross-Learning Contextual Bandits with Unknown Context Distributions
- Hi-Patch: Hierarchical Patch GNN for Irregular Multivariate Time Series
- HiRemate: Hierarchical Approach for Efficient Re-materialization of Neural Networks
- Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models
- History-Guided Video Diffusion
- Holistic Physics Solver: Learning PDEs in a Unified Spectral-Physical Space
- Homophily Enhanced Graph Domain Adaptation
- (How) Can Transformers Predict Pseudo-Random Numbers?
- How compositional generalization and creativity improve as diffusion models are trained
- How Contaminated Is Your Benchmark? Quantifying Dataset Leakage in Large Language Models with Kernel Divergence
- How Distributed Collaboration Influences the Diffusion Model Training? A Theoretical Perspective
- How does Labeling Error Impact Contrastive Learning? A Perspective from Data Dimensionality Reduction
- How Do Images Align and Complement LiDAR? Towards a Harmonized Multi-modal 3D Panoptic Segmentation
- (How) Do Language Models Track State?
- How Do Large Language Monkeys Get Their Power (Laws)?
- How Do Transformers Learn Variable Binding?
- How Effective Can Dropout Be in Multiple Instance Learning?
- How Expressive are Knowledge Graph Foundation Models?
- How Far Is Video Generation from World Model: A Physical Law Perspective
- How Much Can Transfer? BRIDGE: Bounded Multi-Domain Graph Pre-training and Prompt Learning with Generalization Error
- How Much Can We Forget about Data Contamination?
- How to Evaluate and Mitigate IP Infringement in Visual Generative AI?
- How to Move Your Dragon: Text-To-Motion Synthesis For Large-Vocabulary Objects
- How to set AdamW's weight decay as you scale model and dataset size
- How to Steer LLM Latents for Hallucination Detection?
- How to Synthesize Text Data without Model Collapse?
- How to Train Your Multi-Exit Model? Analyzing the Impact of Training Strategies
- How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
- How transformers learn structured data: insights from hierarchical filtering
- How Useful are Your Jailbreak Outputs?
- HPS: Hard Preference Sampling for Human Preference Alignment
- H-Tuning: Toward Low-Cost and Efficient ECG-based Cardiovascular Disease Detection with Pre-Trained Models
- Human-Aligned Image Models Improve Visual Decoding from the Brain
- Human Body Restoration with One-Step Diffusion Model and A New Benchmark
- Human Cognition-Inspired Hierarchical Fuzzy Learning Machine
- Hybrid Batch Normalisation: Resolving the Dilemma of Batch Normalisation in Federated Learning
- HybridGS: High-Efficiency Gaussian Splatting Data Compression using Dual-Channel Sparse Representation and Point Cloud Encoder
- Hybrid Quantum-Classical Multi-Agent Pathfinding
- Hybrid Spiking Vision Transformer for Object Detection with Event Cameras
- Hyperband-based Bayesian Optimization for Black-box Prompt Selection
- Hyperbolic GNN: Spectral Graph Neural Networks in the Perspective of A System of Hyperbolic Partial Differential Equations
- Hyperbolic Graph Transformer for Collaborative Filtering
- Hypergraph Coordination Networks with Dynamic Grouping for Multi-Agent Reinforcement Learning
- Hyper: Hyperparameter Robust Efficient Exploration in Reinforcement Learning
- HyperIMTS: Hypergraph Neural Network for Irregular Multivariate Time Series Forecasting
- HyperIV: Real-time Implied Volatility Smoothing
- HyperNear: Unnoticeable Node Injection Attacks on Hypergraph Neural Networks
- Hyperspherical Normalization for Scalable Deep Reinforcement Learning
- Hyper-Transforming Latent Diffusion Models
- HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking
- Hypo3D: Exploring Hypothetical Reasoning in 3D
- Hypothesis Testing for Generalized Thurstone Models
- IBCircuit: Towards Holistic Circuit Discovery with Information Bottleneck
- ICLShield: Exploring and Mitigating In-Context Learning Backdoor Attacks
- ICML 2025 Workshop on Collaborative and Federated Agentic Workflows (CFAgentic @ ICML'25)
- ICML 2025 Workshop on Computational Optimization of Buildings (CO-BUILD)
- Identifiable Object Representations under Spatial Ambiguities
- Identification of Latent Confounders via Investigating the Tensor Ranks of the Nonlinear Observations
- Identifying and Understanding Cross-Class Features in Adversarial Training
- Identifying biological perturbation targets through causal differential networks
- Identifying Causal Direction via Variational Bayesian Compression
- Identifying metric structures of deep latent variable models
- Identifying neural dynamics using interventional state space models
- Idiosyncrasies in Large Language Models
- iDPA: Instance Decoupled Prompt Attention for Continual Medical Object Detection
- IL-SOAR: Imitation Learning with Soft Optimistic Actor cRitic
- Imagine While Reasoning in Space: Multimodal Visualization-of-Thought
- Imitation Learning from a Single Temporally Misaligned Video
- IMPACT: Iterative Mask-based Parallel Decoding for Text-to-Audio Generation with Diffusion Modeling
- Implicit Bias of Gradient Descent for Non-Homogeneous Deep Networks
- Implicit degree bias in the link prediction task
- Implicit Language Models are RNNs: Balancing Parallelization and Expressivity
- Implicit Regularization for Tubal Tensor Factorizations via Gradient Descent
- Implicit Riemannian Optimism with Applications to Min-Max Problems
- Implicit Subgraph Neural Network
- Importance Corrected Neural JKO Sampling
- Importance Sampling for Nonlinear Models
- Impossible Videos
- Improved Algorithm for Deep Active Learning under Imbalance via Optimal Separation
- Improved and Oracle-Efficient Online $\ell_1$-Multicalibration
- Improved Approximations for Hard Graph Problems using Predictions
- Improved Coresets for Vertical Federated Learning: Regularized Linear and Logistic Regressions
- Improved Discretization Complexity Analysis of Consistency Models: Variance Exploding Forward Process and Decay Discretization Scheme
- Improved Expressivity of Hypergraph Neural Networks through High-Dimensional Generalized Weisfeiler-Leman Algorithms
- Improved Last-Iterate Convergence of Shuffling Gradient Methods for Nonsmooth Convex Optimization
- Improved learning via k-DTW: a novel dissimilarity measure for curves
- Improved Lower Bounds for First-order Stochastic Non-convex Optimization under Markov Sampling
- Improved Off-policy Reinforcement Learning in Biological Sequence Design
- Improved Online Confidence Bounds for Multinomial Logistic Bandits
- Improved Regret Analysis in Gaussian Process Bandits: Optimality for Noiseless Reward, RKHS norm, and Non-Stationary Variance
- Improved Sample Complexity for Private Nonsmooth Nonconvex Optimization
- Improved Theoretically-Grounded Evolutionary Algorithms for Subset Selection with a Linear Cost Constraint
- Improving Compositional Generation with Diffusion Models Using Lift Scores
- Improving Consistency Models with Generator-Augmented Flows
- Improving Continual Learning Performance and Efficiency with Auxiliary Classifiers
- Improving Conversational Capabilities of Speech Language Models via Generative Dual-channel Spoken Dialogue Learning
- Improving Diversity in Language Models: When Temperature Fails, Change the Loss
- Improving Flow Matching by Aligning Flow Divergence
- Improving Generalization in Federated Learning with Highly Heterogeneous Data via Momentum-Based Stochastic Controlled Weight Averaging
- Improving Generalization with Flat Hilbert Bayesian Inference
- Improving LLM Safety Alignment with Dual-Objective Optimization
- Improving LLMs for Recommendation with Out-Of-Vocabulary Tokens
- Improving LLM Video Understanding with 16 Frames Per Second
- Improving Memory Efficiency for Training KANs via Meta Learning
- Improving Model Alignment Through Collective Intelligence of Open-Source Models
- Improving Multi-Class Calibration through Normalization-Aware Isotonic Techniques
- Improving Multimodal Learning Balance and Sufficiency through Data Remixing
- Improving Out-of-Distribution Detection via Dynamic Covariance Calibration
- Improving Out-of-Distribution Detection with Markov Logic Networks
- Improving Parallel Program Performance with LLM Optimizers via Agent-System Interface
- Improving Rationality in the Reasoning Process of Language Models through Self-playing Game
- Improving Reward Model Generalization from Adversarial Process Enhanced Preferences
- Improving Robustness to Subpopulation Shifts by Heuristic Subspace Exploration with Enhanced Diversification
- Improving Soft Unification with Knowledge Graph Embedding Methods
- Improving the Continuity of Goal-Achievement Ability via Policy Self-Regularization for Goal-Conditioned Reinforcement Learning
- Improving the Effective Receptive Field of Message-Passing Neural Networks
- Improving the Scaling Laws of Synthetic Data with Deliberate Practice
- Improving the statistical efficiency of cross-conformal prediction
- Improving the Variance of Differentially Private Randomized Experiments through Clustering
- Improving Transformer World Models for Data-Efficient RL
- Improving Value Estimation Critically Enhances Vanilla Policy Gradient
- Improving Your Model Ranking on Chatbot Arena by Vote Rigging
- Improving Zero-Shot Adversarial Robustness in VLMs by Closed-form Alignment of Adversarial Path Simplices
- IMTS is Worth Time $\times$ Channel Patches: Visual Masked Autoencoders for Irregular Multivariate Time Series Prediction
- iN2V: Bringing Transductive Node Embeddings to Inductive Graphs
- Incentivize without Bonus: Provably Efficient Model-based Online Multi-agent RL for Markov Games
- In-Context Adaptation to Concept Drift for Learned Database Operations
- In-context denoising with one-layer transformers: connections between attention and associative memory retrieval
- In-Context Fine-Tuning for Time-Series Foundation Models
- In-Context Learning and Occam's Razor
- In-Context Learning as Conditioned Associative Memory Retrieval
- In-Context Linear Regression Demystified: Training Dynamics and Expressive Power of Multi-Head Softmax Attention
- In-Context Meta Learning Induces Multi-Phase Circuit Emergence
- In-Context Reinforcement Learning From Suboptimal Historical Data
- Incorporating Arbitrary Matrix Group Equivariance into KANs
- Incremental Gradient Descent with Small Epoch Counts is Surprisingly Slow on Ill-Conditioned Problems
- Independence Tests for Language Models
- Inducing, Detecting and Characterising Neural Modules: A Pipeline for Functional Interpretability in Reinforcement Learning
- Inductive Gradient Adjustment for Spectral Bias in Implicit Neural Representations
- InfAlign: Inference-aware language model alignment
- Inference-Time Alignment of Diffusion Models with Direct Noise Optimization
- Inference-Time Alignment of LLMs via User-Specified Multi-Criteria Transfer Decoding
- Inference-Time Decomposition of Activations (ITDA): A Scalable Approach to Interpreting Large Language Models
- Info-Coevolution: An Efficient Framework for Data Model Coevolution
- InfoCons: Identifying Interpretable Critical Concepts in Point Clouds via Information Theory
- Information Bottleneck-guided MLPs for Robust Spatial-temporal Forecasting
- InfoSAM: Fine-Tuning the Segment Anything Model from An Information-Theoretic Perspective
- InfoSEM: A Deep Generative Model with Informative Priors for Gene Regulatory Network Inference
- INRFlow: Flow Matching for INRs in Ambient Space
- Instance Correlation Graph-based Naive Bayes
- Instance-Optimal Pure Exploration for Linear Bandits on Continuous Arms
- Instruct2See: Learning to Remove Any Obstructions Across Distributions
- Instruction-Following Pruning for Large Language Models
- Integer Programming for Generalized Causal Bootstrap Designs
- Integrating Intermediate Layer Optimization and Projected Gradient Descent for Solving Inverse Problems with Diffusion Models
- Integration-free kernels for equivariant Gaussian process modelling
- Interaction-Aware Gaussian Weighting for Clustered Federated Learning
- Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities
- Interchangeable Token Embeddings for Extendable Vocabulary and Alpha-Equivalence
- Interpolating Neural Network-Tensor Decomposition (INN-TD): a scalable and interpretable approach for large-scale physics-based problems
- Interpreting CLIP with Hierarchical Sparse Autoencoders
- Interpreting the repeated token phenomenon in LLMs
- Intersectional Fairness in Reinforcement Learning with Large State and Constraint Spaces
- IntLoRA: Integral Low-rank Adaptation of Quantized Diffusion Models
- Introducing 3D Representation for Medical Image Volume-to-Volume Translation via Score Fusion
- Invariance Makes LLM Unlearning Resilient Even to Unanticipated Downstream Fine-Tuning
- Invariant Deep Uplift Modeling for Incentives Assignment in Online Marketing via Probability of Necessity and Sufficiency
- Inverse Bridge Matching Distillation
- Inverse Flow and Consistency Models
- Inverse Optimization via Learning Feasible Regions
- Inverse Problem Sampling in Latent Space Using Sequential Monte Carlo
- Inverse problems with experiment-guided AlphaFold
- Inverse Reinforcement Learning with Switching Rewards and History Dependency for Characterizing Animal Behaviors
- Investigating Non-Transitivity in LLM-as-a-Judge
- Investigating the Overlooked Hessian Structure: From CNNs to LLMs
- Invited Talk - Anca Dragan
- Invited Talk - Andreas Krause
- Invited Talk - Frauke Kreuter
- Invited Talk - Jon Kleinberg
- Invited Talk - Pamela Samuelson
- IRBridge: Solving Image Restoration Bridge with Pre-trained Generative Diffusion Models
- Is Best-of-N the Best of Them? Coverage, Scaling, and Optimality in Inference-Time Alignment
- Is Complex Query Answering Really Complex?
- Is Noise Conditioning Necessary for Denoising Generative Models?
- Isolated Causal Effects of Natural Language
- Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs
- IT$^3$: Idempotent Test-Time Training
- IT-Bench: Evaluating AI Agents across Diverse Real-World IT Automation Tasks
- Iterative Refined Transformer for Curriculum Learning Guided De Novo Peptide Sequencing
- Iterative Vectors: In-Context Gradient Steering without Backpropagation
- ITFormer: Bridging Time Series and Natural Language for Multi-Modal QA with Large-Scale Multitask Dataset
- I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
- It's My Data Too: Private ML for Datasets with Multi-User Training Examples
- Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations
- Jailbreaking LLMs and Agentic Systems: Attacks, Defenses, and Evaluations
- Janus: Dual-server Multi-Round Secure Aggregation with Verifiability for Federated Learning
- Joint Learning of Energy-based Models and their Partition Function
- Joint Localization and Activation Editing for Low-Resource Fine-Tuning
- Joint Metric Space Embedding by Unbalanced Optimal Transport with Gromov–Wasserstein Marginal Penalization
- Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient
- Joker: Joint Optimization Framework for Lightweight Kernel Machines
- Just Enough Shifts: Mitigating Over-Refusal in Aligned Language Models with Targeted Representation Fine-Tuning
- K$^2$IE: Kernel Method-based Kernel Intensity Estimators for Inhomogeneous Poisson Processes
- KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems
- KAN-AD: Time Series Anomaly Detection with Kolmogorov–Arnold Networks
- Kandinsky Conformal Prediction: Beyond Class- and Covariate-Conditional Coverage
- KBQA-o1: Agentic Knowledge Base Question Answering with Monte Carlo Tree Search
- KEA: Keeping Exploration Alive by Proactively Coordinating Exploration Strategies
- Kernel-based Unsupervised Embedding Alignment for Enhanced Visual Representation in Vision-language Models
- KernelBench: Can LLMs Write Efficient GPU Kernels?
- Kernel Quantile Embeddings and Associated Probability Metrics
- KGMark: A Diffusion Watermark for Knowledge Graphs
- KinDEL: DNA-Encoded Library Dataset for Kinase Inhibitors
- KIND: Knowledge Integration and Diversion for Training Decomposable Models
- Kinetic Langevin Diffusion for Crystalline Materials Generation
- Knowledge-Guided Wasserstein Distributionally Robust Optimization
- Knowledge Retention in Continual Model-Based Reinforcement Learning
- Knowledge Swapping via Learning and Unlearning
- Kona: An Efficient Privacy-Preservation Framework for KNN Classification by Communication Optimization
- KoNODE: Koopman-Driven Neural Ordinary Differential Equations with Evolving Parameters for Time Series Analysis
- KoopSTD: Approximating Koopman Spectrum with Timescale Decoupling for Faithful Similarity Analysis between Dynamical Systems
- KV Shifting Attention Enhances Language Modeling
- KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference
- L3A: Label-Augmented Analytic Adaptation for Multi-Label Class Incremental Learning
- Label Distribution Propagation-based Label Completion for Crowdsourcing
- LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models
- LADA: Scalable Label-Specific CLIP Adapter for Continual Learning
- Ladder-Residual: Parallelism-Aware Architecture for Accelerating Large Model Inference with Communication Overlapping
- LAION-C: An Out-of-Distribution Benchmark for Web-Scale Vision Models
- LaMAGIC2: Advanced Circuit Formulations for Language Model-Based Analog Topology Generation
- LangDAug: Langevin Data Augmentation for Multi-Source Domain Generalization in Medical Image Segmentation
- LangTime: A Language-Guided Unified Model for Time Series Forecasting with Proximal Policy Optimization
- Language Model as Implicit Tree Search
- Language Models May Verbatim Complete Text They Were Not Explicitly Trained On
- Language Models over Canonical Byte-Pair Encodings
- Laplace Transform Based Low-Complexity Learning of Continuous Markov Semigroups
- LapSum - One Method to Differentiate Them All: Ranking, Sorting and Top-k Selection
- LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs – No Silver Bullet for LC or RAG Routing
- Large Continual Instruction Assistant
- Large Displacement Motion Transfer with Unsupervised Anytime Interpolation
- Large Language-Geometry Model: When LLM meets Equivariance
- Large Language Model-driven Large Neighborhood Search for Large-Scale MILP Problems
- Large Language Models are Demonstration Pre-Selectors for Themselves
- Large Language Models to Diffusion Finetuning
- Larger or Smaller Reward Margins to Select Preferences for Alignment?
- LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence
- La RoSA: Enhancing LLM Efficiency via Layerwise Rotated Sparse Activation
- LASER: Attention with Exponential Transformation
- LAST SToP for Modeling Asynchronous Time Series
- Latent Action Learning Requires Supervision in the Presence of Distractors
- Latent Diffusion Planning for Imitation Learning
- Latent Imputation before Prediction: A New Computational Paradigm for De Novo Peptide Sequencing
- Latent Mamba Operator for Partial Differential Equations
- Latent Preference Coding: Aligning Large Language Models via Discrete Latent Codes
- Latent Score-Based Reweighting for Robust Classification on Imbalanced Tabular Data
- Latent Variable Causal Discovery under Selection Bias
- Latent Variable Estimation in Bayesian Black-Litterman Models
- LAuReL: Learned Augmented Residual Layer
- Layer by Layer: Uncovering Hidden Representations in Language Models
- Layer Caching for Accelerated Inference in Real-Time Rendering
- Layer-wise Alignment: Examining Safety Alignment Across Image Encoder Layers in Vision Language Models
- Layer-wise Quantization for Quantized Optimistic Dual Averaging
- LBI-FL: Low-Bit Integerized Federated Learning with Temporally Dynamic Bit-Width Allocation
- L-Diffusion: Laplace Diffusion for Efficient Pathology Image Segmentation
- LDMol: Text-to-Molecule Diffusion Model with Structurally Informative Latent Space Surpasses AR Models
- Lean and Mean Adaptive Optimization via Subset-Norm and Subspace-Momentum with Convergence Guarantees
- LEAPS: A discrete neural sampler via locally equivariant networks
- Learnable Spatial-Temporal Positional Encoding for Link Prediction
- Learn Beneficial Noise as Graph Augmentation
- Learn from Downstream and Be Yourself in Multimodal Large Language Models Fine-Tuning
- Learngene Tells You How to Customize: Task-Aware Parameter Initialization at Flexible Scales
- Learning Adaptive Lighting via Channel-Aware Guidance
- Learning Adversarial MDPs with Stochastic Hard Constraints
- Learning Along the Arrow of Time: Hyperbolic Geometry for Backward-Compatible Representation Learning
- Learning Attribute-Aware Hash Codes for Fine-Grained Image Retrieval via Query Optimization
- Learning-augmented algorithms for MTS with bandit access to multiple predictors
- Learning-Augmented Hierarchical Clustering
- Learning Bayesian Nash Equilibrium in Auction Games via Approximate Best Response
- Learning Cascade Ranking as One Network
- Learning Changes in Graphon Attachment Network Models
- Learning Classifiers That Induce Markets
- Learning Compact Semantic Information for Incomplete Multi-View Missing Multi-Label Classification
- Learning Condensed Graph via Differentiable Atom Mapping for Reaction Yield Prediction
- Learning Configurations for Data-Driven Multi-Objective Optimization
- Learning Curves of Stochastic Gradient Descent in Kernel Regression
- Learning curves theory for hierarchically compositional data with power-law distributed features
- Learning Distances from Data with Normalizing Flows and Score Matching
- Learning Dynamics in Continual Pre-Training for Large Language Models
- Learning dynamics in linear recurrent neural networks
- Learning Dynamics under Environmental Constraints via Measurement-Induced Bundle Structures
- Learning Efficient Robotic Garment Manipulation with Standardization
- Learning Event Completeness for Weakly Supervised Video Anomaly Detection
- Learning Extrapolative Sequence Transformations from Markov Chains
- Learning from Loss Landscape: Generalizable Mixed-Precision Quantization via Adaptive Sharpness-Aware Gradient Aligning
- Learning from others' mistakes: Finetuning machine translation models with span-level error annotations
- Learning from Sample Stability for Deep Clustering
- Learning from Suboptimal Data in Continuous Control via Auto-Regressive Soft Q-Network
- Learning from True-False Labels via Multi-modal Prompt Retrieving
- Learning Fused State Representations for Control from Multi-View Observations
- Learning Gaussian DAG Models without Condition Number Bounds
- Learning Global and Local Features in Pretrained Remote Sensing Models
- Learning Imbalanced Data with Beneficial Label Noise
- Learning Imperfect Information Extensive-form Games with Last-iterate Convergence under Bandit Feedback
- Learning In-context $n$-grams with Transformers: Sub-$n$-grams Are Near Stationary Points
- Learning Initial Basis Selection for Linear Programming via Duality-Inspired Tripartite Graph Representation and Comprehensive Supervision
- Learning Input Encodings for Kernel-Optimal Implicit Neural Representations
- Learning Invariant Causal Mechanism from Vision-Language Models
- Learning Joint Interventional Effects from Single-Variable Interventions in Additive Models
- Learning Latent Graph Structures and their Uncertainty
- Learning Likelihood-Free Reference Priors
- Learning Mean Field Control on Sparse Graphs
- Learning Minimum-Size BDDs: Towards Efficient Exact Algorithms
- Learning Mixtures of Experts with EM: A Mirror Descent Perspective
- Learning Monotonic Probabilities with a Generative Cost Model
- Learning Multi-Level Features with Matryoshka Sparse Autoencoders
- Learning multivariate Gaussians with imperfect advice
- Learning Optimal Multimodal Information Bottleneck Representations
- Learning-Order Autoregressive Models with Application to Molecular Graph Generation
- Learning Parametric Distributions from Samples and Preferences
- Learning Policy Committees for Effective Personalization in MDPs with Diverse Tasks
- Learning Progress Driven Multi-Agent Curriculum
- Learning Representations of Instruments for Partial Identification of Treatment Effects
- Learning Robust Neural Processes with Risk-Averse Stochastic Optimization
- Learning Safe Control via On-the-Fly Bandit Exploration
- Learning Safe Strategies for Value Maximizing Buyers in Uniform Price Auctions
- Learning Safety Constraints for Large Language Models
- Learnings from Scaling Visual Tokenizers for Reconstruction and Generation
- Learning Single Index Models with Diffusion Priors
- Learning Smooth and Expressive Interatomic Potentials for Physical Property Prediction
- Learning Soft Sparse Shapes for Efficient Time-Series Classification
- Learning State-Based Node Representations from a Class Hierarchy for Fine-Grained Open-Set Detection
- Learning Strategic Language Agents in the Werewolf Game with Iterative Latent Space Policy Optimization
- Learning Survival Distributions with the Asymmetric Laplace Distribution
- Learning the Electronic Hamiltonian of Large Atomic Structures
- Learning the RoPEs: Better 2D and 3D Position Encodings with STRING
- Learning Time-Aware Causal Representation for Model Generalization in Evolving Domains
- Learning Time-Varying Multi-Region Brain Communications via Scalable Markovian Gaussian Processes
- Learning to Generate 3D Molecules via Language Models with Geometry-Aware Tokenization
- Learning to Generate Projections for Reducing Dimensionality of Heterogeneous Linear Programming Problems
- Learning to Incentivize in Repeated Principal-Agent Problems with Adversarial Agent Arrivals
- Learning to Keep a Promise: Teaching LLMs an Annotation Language for Asynchronous Decoding
- Learning to (Learn at Test Time): RNNs with Expressive Hidden States
- Learning to Match Unpaired Data with Minimum Entropy Coupling
- Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge
- Learning to Quantize for Training Vector-Quantized Networks
- Learning to Reuse Policies in State Evolvable Environments
- Learning to Route LLM with Confidence Tokens
- Learning to Steer Learners in Games
- Learning to Stop: Deep Learning for Mean Field Optimal Stopping
- Learning to Trust Bellman Updates: Selective State-Adaptive Regularization for Offline RL
- Learning Utilities from Demonstrations in Markov Decision Processes
- Learning with a Reference Model: From Generalization Theory to Scaling Law
- Learning with Exact Invariances in Polynomial Time
- Learning with Expected Signatures: Theory and Applications
- Learning With Multi-Group Guarantees For Clusterable Subpopulations
- Learning without Isolation: Pathway Protection for Continual Learning
- Learning with Selectively Labeled Data from Multiple Decision-makers
- Learn Sharp Interface Solution by Homotopy Dynamics
- Learn to Vaccinate: Combining Structure Learning and Effective Vaccination for Epidemic and Outbreak Control
- Learnware Specification via Dual Alignment
- Lego Sketch: A Scalable Memory-augmented Neural Network for Sketching Data Streams
- LEMoN: Label Error Detection using Multimodal Neighbors
- LensLLM: Unveiling Fine-Tuning Dynamics for LLM Selection
- Less is More: Federated Graph Learning with Alleviating Topology Heterogeneity from A Causal Perspective
- Let LLM Tell What to Prune and How Much to Prune
- LETS Forecast: Learning Embedology for Time Series Forecasting
- Leveraging Diffusion Model as Pseudo-Anomalous Graph Generator for Graph-Level Anomaly Detection
- Leveraging Model Guidance to Extract Training Data from Personalized Diffusion Models
- Leveraging Offline Data in Linear Latent Contextual Bandits
- Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation
- Leveraging Per-Instance Privacy for Machine Unlearning
- Leveraging Predictive Equivalence in Decision Trees
- Leveraging Randomness in Model and Data Partitioning for Privacy Amplification
- Leveraging Reasoning with Guidelines to Elicit and Utilize Knowledge for Enhancing Safety Alignment
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration
- Leveraging Sparsity for Sample-Efficient Preference Learning: A Theoretical Perspective
- LEVIS: Large Exact Verifiable Input Spaces for Neural Networks
- Lexico: Extreme KV Cache Compression via Sparse Coding over Universal Dictionaries
- LGDM: Latent Guidance in Diffusion Models for Perceptual Evaluations
- LieRE: Lie Rotational Positional Encodings
- LIFT-GS: Cross-Scene Render-Supervised Distillation for 3D Language Grounding
- LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Sparse Fine-Tuning
- Liger: Linearizing Large Language Models to Gated Recurrent Structures
- LightGTS: A Lightweight General Time Series Forecasting Model
- LightningDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos
- Lightspeed Geometric Dataset Distance via Sliced Optimal Transport
- Lightweight Dataset Pruning without Full Training via Example Difficulty and Prediction Uncertainty
- Lightweight-Mark: Rethinking Deep Learning-Based Watermarking
- Lightweight Online Adaption for Time Series Foundation Model Forecasts
- Lightweight Protocols for Distributed Private Quantile Estimation
- LIMEFLDL: A Local Interpretable Model-Agnostic Explanations Approach for Label Distribution Learning
- Limitations of measure-first protocols in quantum machine learning
- Linear $Q$-Learning Does Not Diverge: Convergence Rates to a Bounded Set
- Linear Bandits with Partially Observable Features
- Linear Contextual Bandits With Interference
- Linear convergence of Sinkhorn's algorithm for generalized static Schrödinger bridge
- Linearization Turns Neural Operators into Function-Valued Gaussian Processes
- Linear Transformers as VAR Models: Aligning Autoregressive Attention Mechanisms with Autoregressive Forecasting
- LineFlow: A framework to learn active control of production lines
- LipsNet++: Unifying Filter and Controller into a Policy Network
- LIVS: A Pluralistic Alignment Dataset for Inclusive Public Spaces
- LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models
- LLaVA-ReID: Selective Multi-image Questioner for Interactive Person Re-Identification
- LLM Alignment as Retriever Optimization: An Information Retrieval Perspective
- LLM-Assisted Semantically Diverse Teammate Generation for Efficient Multi-agent Coordination
- LLM-Augmented Chemical Synthesis and Design Decision Programs
- LLM Data Selection and Utilization via Dynamic Bi-level Optimization
- LLM Enhancers for GNNs: An Analysis from the Perspective of Causal Mechanism Identification
- LLMScan: Causal Scan for LLM Misbehavior Detection
- LLMs Can Plan Faster Only If We Let Them
- LLMs can see and hear without any training
- LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws
- LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models
- LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations
- LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
- LOB-Bench: Benchmarking Generative AI for Finance - an Application to Limit Order Book Data
- Locally Identifying Causal Relations in the Presence of Latent Variables
- Locality Preserving Markovian Transition for Instance Retrieval
- Local Manifold Approximation and Projection for Manifold-Aware Diffusion Planning
- Local Pan-privacy for Federated Analytics
- Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D
- Locate-then-edit for Multi-hop Factual Recall under Knowledge Editing
- Logarithmic Regret for Online KL-Regularized Reinforcement Learning
- Logits are All We Need to Adapt Closed Models
- LOGO - Long cOntext aliGnment via efficient preference Optimization
- Log-Sum-Exponential Estimator for Off-Policy Evaluation and Learning
- Long-Context Inference with Retrieval-Augmented Speculative Decoding
- Long-Form Speech Generation with Spoken Language Models
- LongRoPE2: Near-Lossless LLM Context Window Scaling
- Long-Term TalkingFace Generation via Motion-Prior Conditional Diffusion Model
- LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
- Looking Beyond the Top-1: Transformers Determine Top Tokens in Order
- Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models
- LoRA-Gen: Specializing Large Language Model via Online LoRA Generation
- LoRA Training Provably Converges to a Low-Rank Global Minimum Or It Fails Loudly (But it Probably Won't Fail)
- Loss Functions and Operators Generated by f-Divergences
- LotteryCodec: Searching the Implicit Representation in a Random Network for Low-Complexity Image Compression
- Low-Dimension-to-High-Dimension Generalization and Its Implications for Length Generalization
- Low-distortion and GPU-compatible Tree Embeddings in Hyperbolic Space
- Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers
- LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs under 2 Bits
- Low-Rank Adapting Models for Sparse Autoencoders
- Low-Rank Tensor Transitions (LoRT) for Transferable Tensor Regression
- Low-Rank Thinning
- LRA-QViT: Integrating Low-Rank Approximation and Quantization for Robust and Efficient Vision Transformers
- LSCD: Lomb–Scargle Conditioned Diffusion for Irregular Time Series Imputation
- LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models
- M³HF: Multi-agent Reinforcement Learning from Multi-phase Human Feedback of Mixed Quality
- M3-Jepa: Multimodal Alignment via Multi-directional MoE based on the JEPA framework
- Machine Learning for Wireless Communication and Networks (ML4Wireless)
- Machine Learning meets Algebraic Combinatorics: A Suite of Datasets Capturing Research-level Conjecturing Ability in Pure Mathematics
- Machines and Mathematical Mutations: Using GNNs to Characterize Quiver Mutation Classes
- Machine Unlearning for Generative AI
- MAGELLAN: Metacognitive predictions of learning progress guide autotelic LLM agents in large goal spaces
- Mahalanobis++: Feature Normalization as the Missing Ingredient for Reliable OOD Detection
- Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
- Making AutoEncoders Diffusable Again
- Making Genomic Foundation Models more Foundational Requires Outlier Removal: A Case Study on DNABERT-2
- Making Hard Problems Easier with Custom Data Distributions and Loss Regularization: A Case Study in Modular Arithmetic
- MA-LoT: Multi-Agent Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving
- MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models
- MAPLE: Many-Shot Adaptive Pseudo-Labeling In-Context Learning
- MAPSparse: Accelerating Pre-filling for Long-Context Visual Language Models via Modality-Aware Permutation Sparse Attention
- MARGE: Improving Math Reasoning with Guided Exploration
- MARS: Unleashing the Power of Variance Reduction for Training Large Models
- MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems
- Masked Autoencoders Are Effective Tokenizers for Diffusion Models
- Masked Generative Nested Transformers with Decode Time Scaling
- Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More
- MaskTwins: Dual-form Complementary Masking for Domain-Adaptive Image Segmentation
- Massive Values in Self-Attention Modules are the Key to Contextual Knowledge Understanding
- MASS: Mathematical Data Selection via Skill Graphs for Pretraining Large Language Models
- Mastering Board Games by External and Internal Planning with Language Models
- Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer
- Mastering Multiple-Expert Routing: Realizable $H$-Consistency and Strong Guarantees for Learning to Defer
- Matchmaker: Self-Improving Compositional LLM Programs for Schema Matching
- MathConstruct: Challenging LLM Reasoning with Constructive Proofs
- MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations
- Matrix Completion with Incomplete Side Information via Orthogonal Complement Projection
- Matryoshka Quantization
- MATS: An Audio Language Model under Text-only Supervision
- Maximizing Intermediate Checkpoint Value in LLM Pretraining with Bayesian Optimization
- Maximum Coverage in Turnstile Streams with Applications to Fingerprinting Measures
- Maximum Entropy Reinforcement Learning with Diffusion Policy
- Maximum Total Correlation Reinforcement Learning
- Maximum Update Parametrization and Zero-Shot Hyperparameter Transfer for Fourier Neural Operators
- MCU: An Evaluation Framework for Open-Ended Game Agents
- MDDM: Practical Message-Driven Generative Image Steganography Based on Diffusion Models
- Measuring Diversity: Axioms and Challenges
- Measuring Diversity in Synthetic Datasets
- Measuring In-Context Computation Complexity via Hidden State Prediction
- Measuring Representational Shifts in Continual Learning: A Linear Transformation Perspective
- Measuring the Contribution of Fine-Tuning to Individual Responses of LLMs
- Measuring Variable Importance in Heterogeneous Treatment Effects with Confidence
- Mechanisms of Projective Composition of Diffusion Models
- Mechanistic PDE Networks for Discovery of Governing Equations
- Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization
- MedRAX: Medical Reasoning Agent for Chest X-ray
- MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding
- MELON: Indirect Prompt Injection Defense via Masked Re-execution and Tool Comparison
- MemFreezing: A Novel Adversarial Attack on Temporal Graph Neural Networks under Limited Future Knowledge
- Memory Layers at Scale
- MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning
- MERGE$^3$: Efficient Evolutionary Merging on Consumer-grade GPUs
- Merge-Friendly Post-Training Quantization for Multi-Target Domain Adaptation
- Merging Multiple Models under Permutation Symmetries
- MERIT: Maximum-normalized Element-wise Ratio for Language Model Large-batch Training
- MetaAgent: Automatically Building Multi-Agent System based on Finite State Machine
- Meta-Black-Box-Optimization through Offline Q-function Learning
- Metadata Conditioning Accelerates Language Model Pre-training
- Meta Optimality for Demographic Parity Constrained Regression via Post-Processing
- MetaOptimize: A Framework for Optimizing Step Sizes and Other Meta-parameters
- Meta-Reinforcement Learning with Human-in-the-Loop Adaptation via Preference-Order-Preserving Task Embedding
- Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation
- Methods and Opportunities at Small Scale (MOSS)
- MetricEmbedding: Accelerate Metric Nearness by Tropical Inner Product
- M+: Extending MemoryLLM with Scalable Long-Term Memory
- MF-LAL: Drug Compound Generation Using Multi-Fidelity Latent Space Active Learning
- MGD$^3$: Mode-Guided Dataset Distillation using Diffusion Models
- MIB: A Mechanistic Interpretability Benchmark
- MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
- MindAligner: Explicit Brain Functional Alignment for Cross-Subject Visual Decoding from Limited fMRI Data
- MindCustomer: Multi-Context Image Generation Blended with Brain Signal
- MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-text Decoding
- Mind the Gap: A Practical Attack on GGUF Quantization
- Mind the Gap: a Spectral Analysis of Rank Collapse and Signal Propagation in Attention Layers
- Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse
- Minerva: A Programmable Memory Test Benchmark for Language Models
- Minimalist Concept Erasure in Generative Models
- Minimax Optimal Regret Bound for Reinforcement Learning with Trajectory Feedback
- Minimum width for universal approximation using squashable activation functions
- MIPT: Multilevel Informed Prompt Tuning for Robust Molecular Property Prediction
- MiraGe: Editable 2D Images using Gaussian Splatting
- MIRROR: Make Your Multi-View Generation More Consistent with Training-Free Rectification
- Mirror, Mirror of the Flow: How Does Regularization Shape Implicit Bias?
- MissScore: High-Order Score Estimation in the Presence of Missing Data
- Mitigating Heterogeneous Token Overfitting in LLM Knowledge Editing
- Mitigating Local Cohesion and Global Sparseness in Graph Contrastive Learning with Fuzzy Boundaries
- Mitigating Object Hallucination in Large Vision-Language Models via Image-Grounded Guidance
- Mitigating Over-Exploration in Latent Space Optimization Using LES
- Mitigating Over-Squashing in Graph Neural Networks by Spectrum-Preserving Sparsification
- Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn
- MixBridge: Heterogeneous Image-to-Image Backdoor Attack through Mixture of Schrödinger Bridges
- Mixed-curvature decision trees and random forests
- MixMin: Finding Data Mixtures via Convex Minimization
- Mixture of Experts Made Intrinsically Interpretable
- Mixture of Experts Provably Detect and Learn the Latent Cluster Structure in Gradient-Based Learning
- Mixture of Hidden-Dimensions: Not All Hidden-States’ Dimensions are Needed in Transformer
- Mixture of Lookup Experts
- ML$^2$-GCL: Manifold Learning Inspired Lightweight Graph Contrastive Learning
- MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency
- MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization
- MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
- Modalities Contribute Unequally: Enhancing Medical Multi-modal Learning through Adaptive Modality Token Re-balancing
- Model-Based Exploration in Monitored Markov Decision Processes
- Model Immunization from a Condition Number Perspective
- Modeling All-Atom Glycan Structures via Hierarchical Message Passing and Multi-Scale Pre-training
- Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent
- Models of Heavy-Tailed Mechanistic Universality
- Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence
- Model Uncertainty Quantification in Continual Learning
- Modern Methods in Associative Memory
- Modified K-means Method with Local Optimality Guarantees
- Modular Duality in Deep Learning
- MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding
- Modulated Diffusion: Accelerating Generative Modeling with Modulated Quantization
- MODULI: Unlocking Preference Generalization via Diffusion Models for Offline Multi-Objective Reinforcement Learning
- MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance
- MoE-SVD: Structured Mixture-of-Experts LLMs Compression via Singular Value Decomposition
- MOGIC: Metadata-Infused Oracle Guidance for Improved Extreme Classification
- MoHAVE: Mixture of Hierarchical Audio-Visual Experts for Robust Speech Recognition
- MoH: Multi-Head Attention as Mixture-of-Head Attention
- MoMa: Modulating Mamba for Adapting Image Foundation Models to Video Recognition
- Moment Matching Self-Distillation
- Momentum-Driven Adaptivity: Towards Tuning-Free Asynchronous Federated Learning
- MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking
- Monte Carlo Tree Diffusion for System 2 Planning
- Monte Carlo Tree Search for Comprehensive Exploration in LLM-Based Automatic Heuristic Design
- Monte-Carlo Tree Search with Uncertainty Propagation via Optimal Transport
- MoRAgent: Parameter Efficient Agent Tuning with Mixture-of-Roles
- More Than Meets the Eye: Enhancing Multi-Object Tracking Even with Prolonged Occlusions
- Morse: Fast Sampling for Accelerating Diffusion Models Universally
- MP-Nav: Enhancing Data Poisoning Attacks against Multimodal Learning
- MPO: An Efficient Post-Processing Framework for Mixing Diverse Preference Alignment
- MSR-ViR: Modularized Self-Reflected Video Reasoner for Multimodal LLM with Application to Video Question Answering
- MTL-UE: Learning to Learn Nothing for Multi-Task Learning
- MTSTRec: Multimodal Time-Aligned Shared Token Recommender
- MuLan: Adapting Multilingual Diffusion Models for Hundreds of Languages with Negligible Cost
- Multiaccuracy and Multicalibration via Proxy Groups
- Multi-agent Architecture Search via Agentic Supernet
- Multi-Agent Systems in the Era of Foundation Models: Opportunities, Challenges and Futures
- Multi-armed Bandits with Interference
- Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding
- Multi-Domain Graph Foundation Models: Robust Knowledge Transfer via Topology Alignment
- Multilayer Matrix Factorization via Dimension-Reducing Diffusion Variational Inference
- Multi-Marginal Stochastic Flow Matching for High-Dimensional Snapshot Data at Irregular Time Points
- Multimodal Medical Code Tokenizer
- Multi-Modal Object Re-identification via Sparse Mixture-of-Experts
- Multinoulli Extension: A Lossless Yet Effective Probabilistic Framework for Subset Selection over Partition Constraints
- Multi-Objective Causal Bayesian Optimization
- Multiobjective distribution matching
- Multi-objective Linear Reinforcement Learning with Lexicographic Rewards
- MultiPDENet: PDE-embedded Learning with Multi-time-stepping for Accelerated Flow Simulation
- Multiple-policy Evaluation via Density Estimation
- Multi-Session Budget Optimization for Forward Auction-based Federated Learning
- Multi-Stage Manipulation with Demonstration-Augmented Reward, Policy, and World Model Learning
- Multi-Timescale Dynamics Model Bayesian Optimization for Plasma Stabilization in Tokamaks
- Multi-token prediction boosts creativity in open-ended algorithmic tasks
- Multi-Turn Code Generation Through Single-Step Rewards
- Multivariate Conformal Selection
- Multi-View Graph Clustering via Node-Guided Contrastive Encoding
- MuseControlLite: Multifunctional Music Generation with Lightweight Conditioners
- Mutual Learning for SAM Adaptation: A Dual Collaborative Network Framework for Source-Free Domain Transfer
- MVA: Linear Attention with High-order Query-Keys Integration and Multi-level Vocabulary Decomposition
- MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design
- N2GON: Neural Networks for Graph-of-Net with Position Awareness
- Natural Perturbations for Black-box Training of Neural Networks by Zeroth-Order Optimization
- Navigating Conflicting Views: Harnessing Trust For Learning
- Navigating Semantic Drift in Task-Agnostic Class-Incremental Learning
- Navigating Solution Spaces in Large Language Models through Controlled Embedding Exploration
- Navigating the Social Welfare Frontier: Portfolios for Multi-objective Reinforcement Learning
- NBDI: A Simple and Effective Termination Condition for Skill Extraction from Task-Agnostic Demonstrations
- Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
- Nearly Optimal Sample Complexity for Learning with Label Proportions
- NEAR: Neural Electromagnetic Array Response
- Near Optimal Best Arm Identification for Clustered Bandits
- Near-Optimal Consistency-Robustness Trade-Offs for Learning-Augmented Online Knapsack Problems
- Near-Optimal Decision Trees in a SPLIT Second
- Near Optimal Non-asymptotic Sample Complexity of 1-Identification
- Near-optimal Regret Using Policy Optimization in Online MDPs with Aggregate Bandit Feedback
- Near-Optimal Sample Complexity for MDPs via Anchoring
- Near-optimal Sketchy Natural Gradients for Physics-Informed Neural Networks
- NegMerge: Sign-Consensual Weight Merging for Machine Unlearning
- Neighbour-Driven Gaussian Process Variational Autoencoders for Scalable Structured Latent Modelling
- Nested Expectations with Kernel Quadrature
- Nesterov Method for Asynchronous Pipeline Parallel Optimization
- NestQuant: nested lattice quantization for matrix product and LLMs
- NETS: A Non-equilibrium Transport Sampler
- Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning
- NeuralCohort: Cohort-aware Neural Representation Learning for Healthcare Analytics
- Neural Collapse Beyond the Unconstrained Features Model: Landscape, Dynamics, and Generalization in the Mean-Field Regime
- Neural Combinatorial Optimization via Preference Optimization
- Neural Discovery in Mathematics: Do Machines Dream of Colored Planes?
- Neural Encoding and Decoding at Scale
- Neural Event-Triggered Control with Optimal Scheduling
- Neural Genetic Search in Discrete Spaces
- Neural Graph Matching Improves Retrieval Augmented Generation in Small Molecule Mass Spectrum Prediction
- Neural Graph Pattern Machine
- Neural Guided Diffusion Bridges
- Neural Interpretable PDEs: Harmonizing Fourier Insights with Attention for Scalable and Interpretable Physics Discovery
- Neural Representational Consistency Emerges from Probabilistic Neural-Behavioral Representation Alignment
- Neural Solver Selection for Combinatorial Optimization
- NeuronTune: Towards Self-Guided Spurious Bias Mitigation
- Neurosymbolic World Models for Sequential Decision Making
- NeuroTree: An Interpretable High-Order GCN for Learning Brain Pathways in Psychiatric Disorders
- Neutral residues: revisiting adapters for model extension
- New Bounds for Sparse Variational Gaussian Processes
- NextCoder: Robust Adaptation of Code LMs to Diverse Code Edits
- NExtLong: Toward Effective Long-Context Training without Long Documents
- NICE: Non-differentiable Evaluation Metric-based Data Selection for Instruction Tuning
- NMA-tune: Generating Highly Designable and Dynamics Aware Protein Backbones
- No Free Lunch from Random Feature Ensembles
- Noise Conditional Variational Score Distillation
- Noise-Guided Predicate Representation Extraction and Diffusion-Enhanced Discretization for Scene Graph Generation
- Noisy SIGNSGD Is More Differentially Private Than You (Might) Think
- NoLiMa: Long-Context Evaluation Beyond Literal Matching
- No Metric to Rule Them All: Toward Principled Evaluations of Graph-Learning Datasets
- Non-Asymptotic and Non-Lipschitzian Bounds on Optimal Values in Stochastic Optimization Under Heavy Tails
- Non-Asymptotic Length Generalization
- Nonconvex Theory of $M$-estimators with Decomposable Regularizers
- Nonlinearly Preconditioned Gradient Methods under Generalized Smoothness
- Nonlinear transformers can perform inference-time feature learning: a case study of in-context learning on single-index models
- Nonparametric Identification of Latent Concepts: a Structural View
- Nonparametric Modern Hopfield Models
- Nonparametric Teaching for Graph Property Learners
- Non-stationary Diffusion For Probabilistic Time Series Forecasting
- Non-stationary Online Learning for Curved Losses: Improved Dynamic Regret via Mixability
- Non-Stationary Predictions May Be More Informative: Exploring Pseudo-Labels with a Two-Phase Pattern of Training Dynamics
- No-Regret is not enough! Bandits with General Constraints through Adaptive Regret Minimization
- Normalizing Flows are Capable Generative Models
- No Soundness in the Real World: On the Challenges of the Verification of Deployed Neural Networks
- Not all solutions are created equal: An analytical dissociation of functional and representational similarity in deep linear neural networks
- Not All Tokens Matter All The Time: Dynamic Token Aggregation Towards Efficient Detection Transformers
- Not All Wrong is Bad: Using Adversarial Examples for Unlearning
- No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces
- Novelty Detection in Reinforcement Learning with World Models
- NTK-DFL: Enhancing Decentralized Federated Learning in Heterogeneous Settings via Neural Tangent Kernel
- Objective drives the consistency of representational similarity across datasets
- Observation Interference in Partially Observable Assistance Games
- Occult: Optimizing Collaborative Communications across Experts for Accelerated Parallel MoE Training and Inference
- OCN: Learning Object-centric Representations for Unsupervised Multi-object Segmentation
- Offline Learning for Combinatorial Multi-armed Bandits
- Offline Model-based Optimization for Real-World Molecular Discovery
- Offline Opponent Modeling with Truncated Q-driven Instant Policy Refinement
- Offline-to-Online Reinforcement Learning with Classifier-Free Diffusion Generation
- Off-Policy Evaluation under Nonignorable Missing Data
- Of Mice and Machines: A Comparison of Learning Between Real World Mice and RL Agents
- Olica: Efficient Structured Pruning of Large Language Models without Retraining
- O-MAPLE: Offline Multi-agent Preference Learning
- OmiAD: One-Step Adaptive Masked Diffusion Model for Multi-class Anomaly Detection via Adversarial Distillation
- Omni-Angle Assault: An Invisible and Powerful Physical Adversarial Attack on Face Recognition
- OmniArch: Building the Foundation Model for Scientific Computing
- OMNIBAL: Towards Fast Instruct-Tuning for Vision-Language Models via Omniverse Computation Balance
- On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists
- On Differential Privacy for Adaptively Solving Search Problems via Sketching
- One Arrow, Two Hawks: Sharpness-aware Minimization for Federated Learning via Global Model Trajectory
- One Diffusion Step to Real-World Super-Resolution via Flow Trajectory Distillation
- One-dimensional Path Convolution
- One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs
- On Efficient Estimation of Distributional Treatment Effects under Covariate-Adaptive Randomization
- OneForecast: A Universal Framework for Global and Regional Weather Forecasting
- One Image is Worth a Thousand Words: A Usability Preservable Text-Image Collaborative Erasing Framework for Diffusion Models
- One Leaf Reveals the Season: Occlusion-Based Contrastive Learning with Semantic-Aware Views for Efficient Visual Representation
- One-Pass Feature Evolvable Learning with Theoretical Guarantees
- One-Shot Heterogeneous Federated Learning with Local Model-Guided Diffusion Models
- One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation
- One-step full gradient suffices for low-rank fine-tuning, provably and efficiently
- One-Step Generalization Ratio Guided Optimization for Domain Generalization
- One Wave To Explain Them All: A Unifying Perspective On Feature Attribution
- On Exact Bit-level Reversible Transformers Without Changing Architectures
- On Explaining Equivariant Graph Networks via Improved Relevance Propagation
- On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
- On Fine-Grained Distinct Element Estimation
- On Learning Parallel Pancakes with Mostly Uniform Weights
- On Linear Convergence in Smooth Convex-Concave Bilinearly-Coupled Saddle-Point Optimization: Lower Bounds and Optimal Algorithms
- Online Clustering of Dueling Bandits
- Online Conformal Prediction via Online Optimization
- Online Curvature-Aware Replay: Leveraging $\mathbf{2^{nd}}$ Order Information for Online Continual Learning
- Online Detecting LLM-Generated Texts via Sequential Hypothesis Testing by Betting
- Online differentially private conformal prediction for uncertainty quantification
- Online Episodic Convex Reinforcement Learning
- Online Laplacian-Based Representation Learning in Reinforcement Learning
- Online Learning in Risk Sensitive constrained MDP
- Online Learning in the Random-Order Model
- Online Learning with Unknown Constraints
- Online Linear Classification with Massart Noise
- Online Pre-Training for Offline-to-Online Reinforcement Learning
- Online Robust Reinforcement Learning Through Monte-Carlo Planning
- On Measuring Long-Range Interactions in Graph Neural Networks
- On Mitigating Affinity Bias through Bandits with Evolving Biased Feedback
- On Path to Multimodal Generalist: Levels and Benchmarks
- On Teacher Hacking in Language Model Distillation
- On Temperature Scaling and Conformal Prediction of Deep Classifiers
- On the Adversarial Robustness of Multi-Kernel Clustering
- On the Alignment between Fairness and Accuracy: from the Perspective of Adversarial Robustness
- On the Benefits of Active Data Collection in Operator Learning
- On the Clean Generalization and Robust Overfitting in Adversarial Training from Two Theoretical Views: Representation Complexity and Training Dynamics
- On The Concurrence of Layer-wise Preconditioning Methods and Provable Feature Learning
- On the Convergence of Continuous Single-timescale Actor-critic
- On the Diversity of Adversarial Ensemble Learning
- On the Duality between Gradient Transformations and Adapters
- On the Dynamic Regret of Following the Regularized Leader: Optimism with History Pruning
- On the Emergence of Position Bias in Transformers
- On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention for Long-Context LLM Serving
- On the Generalization Ability of Next-Token-Prediction Pretraining
- On the Guidance of Flow Matching
- On the Impact of Hard Adversarial Instances on Overfitting in Adversarial Training
- On the Impact of Performative Risk Minimization for Binary Random Variables
- On the Importance of Embedding Norms in Self-Supervised Learning
- On the Importance of Gaussianizing Representations
- On the Interplay between Graph Structure and Learning Algorithms in Graph Neural Networks
- On the Learnability of Distribution Classes with Adaptive Adversaries
- On the Local Complexity of Linear Regions in Deep ReLU Networks
- On the Out-of-Distribution Generalization of Self-Supervised Learning
- On the Power of Context-Enhanced Learning in LLMs
- On the Power of Learning-Augmented Search Trees
- On the Private Estimation of Smooth Transport Maps
- On the Provable Separation of Scales in Maximal Update Parameterization
- On the Query Complexity of Verifier-Assisted Language Generation
- On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents
- On the Robustness of Reward Models for Language Model Alignment
- On the Role of Label Noise in the Feature Learning Process
- On the Similarities of Embeddings in Contrastive Learning
- On the Statistical Mechanisms of Distributional Compositional Generalization
- On the Tension between Byzantine Robustness and No-Attack Accuracy in Distributed Learning
- On the Training Convergence of Transformers for In-Context Classification of Gaussian Mixtures
- On the Vulnerability of Applying Retrieval-Augmented Generation within Knowledge-Intensive Application Domains
- On Understanding Attention-Based In-Context Learning for Categorical Data
- On Volume Minimization in Conformal Regression
- On Zero-Initialized Attention: Optimal Prompt and Gating Factor Estimation
- OOD-Chameleon: Is Algorithm Selection for OOD Generalization Learnable?
- Open-Det: An Efficient Learning Framework for Open-Ended Detection
- Open Materials Generation with Stochastic Interpolants
- Open the Eyes of MPNN: Vision Enhances MPNN in Link Prediction
- OpenworldAUC: Towards Unified Evaluation and Optimization for Open-world Prompt Tuning
- Optimal Algorithm for Max-Min Fair Bandit
- Optimal Auction Design in the Joint Advertising
- Optimal Decision Tree Pruning Revisited: Algorithms and Complexity
- Optimal Error Bounds in $\mathcal{W}_2$-Distance with $\sqrt{d}$ Dimension Dependence for Langevin Monte Carlo beyond Log-Concavity
- Optimal Fair Learning Robust to Adversarial Distribution Shift
- Optimal Information Retention for Time-Series Explanations
- Optimal Sensor Scheduling for Continuous-Discrete Kalman Filtering with Auxiliary Dynamics
- Optimal Survey Design for Private Mean Estimation
- Optimal Task Order for Continual Learning of Multiple Tasks
- Optimal Transfer Learning for Missing Not-at-Random Matrix Completion
- Optimal Transport Barycenter via Nonconvex-Concave Minimax Optimization
- Optimal transport-based conformal prediction
- Optimistic Algorithms for Adaptive Estimation of the Average Treatment Effect
- Optimization for Neural Operators can Benefit from Width
- Optimization over Sparse Support-Preserving Sets: Two-Step Projection with Global Convergence Guarantees
- Optimization Proxies using Limited Labeled Data and Training Time -- A Semi-Supervised Bayesian Neural Network Approach
- Optimizing Adaptive Attacks against Watermarks for Language Models
- Optimizing Language Models for Inference Time Objectives using Reinforcement Learning
- Optimizing Large Language Model Training Using FP4 Quantization
- Optimizing Multi-Agent Reasoning through Incomplete Information and Consensus
- Optimizing Noise Distributions for Differential Privacy
- Optimizing Robustness and Accuracy in Mixture of Experts: A Dual-Model Approach
- Optimizing Social Network Interventions via Hypergradient-Based Recommender System Design
- Optimizing Temperature for Language Models with Multi-Sample Inference
- Optimizing Test-Time Compute via Meta Reinforcement Finetuning
- OptMATH: A Scalable Bidirectional Data Synthesis Framework for Optimization Modeling
- Oracle-MoE: Locality-preserving Routing in the Oracle Space for Memory-constrained Large Language Model Inference
- OR-Bench: An Over-Refusal Benchmark for Large Language Models
- OrcaLoca: An LLM Agent Framework for Software Issue Localization
- Orient Anything
- Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models
- Origin Identification for Text-Guided Image-to-Image Diffusion Models
- Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection
- OrthoRank: Token Selection via Sink Token Orthogonality for Efficient LLM inference
- Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads
- Oscillation-Reduced MXFP4 Training for Vision Transformers
- OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction
- Otter: Generating Tests from Issues to Validate SWE Patches
- Outlier-Aware Post-training Quantization for Discrete Graph Diffusion Models
- Outlier Gradient Analysis: Efficiently Identifying Detrimental Training Samples for Deep Learning Models
- Output Alignment: A Fresh Perspective on Length Generalization in LLMs
- Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models
- Overcoming Fake Solutions in Semi-Dual Neural Optimal Transport: A Smoothing Approach for Learning the Optimal Transport Plan
- Overcoming Multi-step Complexity in Multimodal Theory-of-Mind Reasoning: A Scalable Bayesian Planner
- Overcoming Non-monotonicity in Transducer-based Streaming Generation
- Overcoming the Curse of Dimensionality in Reinforcement Learning Through Approximate Factorization
- Overcoming Vocabulary Mismatch: Vocabulary-agnostic Teacher Guided Language Modeling
- Overestimation in LLM Evaluation: A Controlled Large-Scale Study on Data Contamination’s Impact on Machine Translation
- Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
- Overtrained Language Models Are Harder to Fine-Tune
- OV-MER: Towards Open-Vocabulary Multimodal Emotion Recognition
- OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models
- OW-VAP: Visual Attribute Parsing for Open World Object Detection
- PAC-Bayes Analysis for Recalibration in Classification
- PAC Learning with Improvements
- Pairwise Maximum Likelihood For Multi-Class Logistic Regression Model With Multiple Rare Classes
- P(all-atom) Is Unlocking New Path For Protein Design
- PANDAS: Improving Many-shot Jailbreaking via Positive Affirmation, Negative Demonstration, and Adaptive Sampling
- PaperBench: Evaluating AI’s Ability to Replicate AI Research
- ParallelComp: Parallel Long-Context Compressor for Length Extrapolation
- Parallel Simulation for Sampling under Isoperimetry and Score-based Diffusion Models
- Parameter-Efficient Fine-Tuning of State Space Models
- Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
- Parametric Scaling Law of Tuning Bias in Conformal Prediction
- Pareto-frontier Entropy Search with Variational Lower Bound Maximization
- Pareto Merging: Multi-Objective Optimization for Preference-Aware Model Merging
- Pareto-Optimal Fronts for Benchmarking Symbolic Regression Algorithms
- Pareto-Optimality, Smoothness, and Stochasticity in Learning-Augmented One-Max-Search
- PARM: Multi-Objective Test-Time Alignment via Preference-Aware Autoregressive Reward Model
- PARQ: Piecewise-Affine Regularized Quantization
- Parrot: Multilingual Visual Instruction Tuning
- Partially Observable Reinforcement Learning with Memory Traces
- Partition First, Embed Later: Laplacian-Based Feature Partitioning for Refined Embedding and Visualization of High-Dimensional Data
- PASS: Private Attributes Protection with Stochastic Data Substitution
- PatchPilot: A Stable and Cost-Efficient Agentic Patching Framework
- Patch-wise Structural Loss for Time Series Forecasting
- PDE-Controller: LLMs for Autoformalization and Reasoning of PDEs
- PDE-Transformer: Efficient and Versatile Transformers for Physics Simulations
- PDUDT: Provable Decentralized Unlearning under Dynamic Topologies
- PEAKS: Selecting Key Training Examples Incrementally via Prediction Error Anchored by Kernel Similarity
- PEINR: A Physics-enhanced Implicit Neural Representation for High-Fidelity Flow Field Reconstruction
- Penalizing Infeasible Actions and Reward Scaling in Reinforcement Learning with Offline Data
- PENCIL: Long Thoughts with Short Memory
- PepTune: De Novo Generation of Therapeutic Peptides with Multi-Objective-Guided Discrete Diffusion
- Perception in Reflection
- Perceptual-GS: Scene-Adaptive Perceptual Densification for Gaussian Splatting
- Perceptually Constrained Precipitation Nowcasting Model
- Peri-LN: Revisiting Layer Normalization in the Transformer Architecture
- Peripheral Memory for LLMs: Integration of Sequential Memory Banks with Adaptive Querying
- Permutation-based Rank Test in the Presence of Discretization and Application in Causal Discovery with Mixed Data
- Permutation Equivariant Neural Networks for Symmetric Tensors
- Permutation-Free High-Order Interaction Tests
- Persistent Topological Features in Large Language Models
- PertEval-scFM: Benchmarking Single-Cell Foundation Models for Perturbation Effect Prediction
- Pessimism Principle Can Be Effective: Towards a Framework for Zero-Shot Transfer Reinforcement Learning
- PF3plat: Learning Pose-Free Feed-Forward 3D Gaussian Splatting for Novel View Synthesis
- Pfeife: Automatic Pipeline Parallelism for PyTorch
- PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation
- Phase and Amplitude-aware Prompting for Enhancing Adversarial Robustness
- Phase transitions for the existence of unregularized M-estimators in single index models
- Physics Aware Neural Networks for Unsupervised Binding Energy Prediction
- Physics-Informed DeepONets for drift-diffusion on metric graphs: simulation and parameter identification
- Physics-Informed Generative Modeling of Wireless Channels
- Physics-informed Temporal Alignment for Auto-regressive PDE Foundation Models
- Physics-Informed Weakly Supervised Learning for Interatomic Potentials
- PhySpec: Physically Consistent Spectral Reconstruction via Orthogonal Subspace Decomposition and Self-Supervised Meta-Auxiliary Learning
- PICI: Efficient Position-Independent Context Caching for Serving Large Language Models
- PiD: Generalized AI-Generated Images Detection with Pixelwise Decomposition Residuals
- PieClam: A Universal Graph Autoencoder Based on Overlapping Inclusive and Exclusive Communities
- PIGDreamer: Privileged Information Guided World Models for Safe Partially Observable Reinforcement Learning
- PILAF: Optimal Human Preference Sampling for Reward Modeling
- Piloting Structure-Based Drug Design via Modality-Specific Optimal Schedule
- PINNsAgent: Automated PDE Surrogation with Large Language Models
- PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization
- PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models
- Pivoting Factorization: A Compact Meta Low-Rank Representation of Sparsity for Efficient Inference in Large Language Models
- Pixel2Feature Attack (P2FA): Rethinking the Perturbed Space to Enhance Adversarial Transferability
- Pixel-level Certified Explanations via Randomized Smoothing
- Plan-and-Act: Structured Execution of LLM Agents for Long-Horizon Web Tasks
- Plausible Token Amplification for Improving Accuracy of Differentially Private In-Context Learning Based on Implicit Bayesian Inference
- Playmate: Flexible Control of Portrait Animation via 3D-Implicit Space Guided Diffusion
- PlaySlot: Learning Inverse Latent Dynamics for Controllable Object-Centric Video Prediction and Planning
- Point Cloud Dataset Distillation
- Point-Level Topological Representation Learning on Point Clouds
- Pointwise Information Measures as Confidence Estimators in Deep Neural Networks: A Comparative Study
- PoisonBench: Assessing Large Language Model Vulnerability to Poisoned Preference Data
- PoisonedEye: Knowledge Poisoning Attack on Retrieval-Augmented Generation based Large Vision-Language Models
- PokeChamp: an Expert-level Minimax Language Agent for Competitive Pokemon
- Policy Design for Two-sided Platforms with Participation Dynamics
- Policy Filtration for RLHF to Mitigate Noise in Reward Models
- Policy Gradient with Tree Expansion
- Policy Guided Tree Search for Enhanced LLM Reasoning
- Policy-labeled Preference Learning: Is Preference Enough for RLHF?
- Policy Optimization for CMDPs with Bandit Feedback: Learning Stochastic and Adversarial Constraints
- Policy Regret Minimization in Markov Games with Function Approximation
- Policy Regularization on Globally Accessible States in Cross-Dynamics Reinforcement Learning
- Poly2Vec: Polymorphic Fourier-Based Encoding of Geospatial Objects for GeoAI Applications
- Polybasic Speculative Decoding Through a Theoretical Perspective
- PolyConf: Unlocking Polymer Conformation Generation through Hierarchical Generative Models
- Polynomial-Time Approximability of Constrained Reinforcement Learning
- Polynomial Time Learning Augmented Algorithms for NP-hard Permutation Problems
- POQD: Performance-Oriented Query Decomposer for Multi-vector retrieval
- POROver: Improving Safety and Reducing Overrefusal in Large Language Models with Overgeneration and Preference Optimization
- Portable Reward Tuning: Towards Reusable Fine-Tuning across Different Pretrained Models
- Position: A Critical Perspective on The Value in Studying Deep Learning Phenomena
- Position: AI Agents Need Authenticated Delegation
- Position: AI Competitions Provide the Gold Standard for Empirical Rigor in GenAI Evaluation
- Position: AI Evaluation Should Learn from How We Test Humans
- Position: AI Safety Must Embrace an Antifragile Perspective
- Position: AI Safety should prioritize the Future of Work
- Position: AI Scaling: From Up to Down and Out
- Position: AI's growing due process problem
- Position: AI Should Not Be An Imitation Game: Centaur Evaluations
- Positional Attention: Expressivity and Learnability of Algorithmic Computation
- Positional encoding meets persistent homology on graphs
- Position: Algebra Unveils Deep Learning - An Invitation to Neuroalgebraic Geometry
- Position: All Current Generative Fidelity and Diversity Metrics are Flawed
- Position: An Empirically Grounded Identifiability Theory Will Accelerate Self Supervised Learning Research
- Position: A Theory of Deep Learning Must Include Compositional Sparsity
- Position: Beyond Assistance – Reimagining LLMs as Ethical and Adaptive Co-Creators in Mental Health Care
- Position: Build Agent Advocates, Not Platform Agents
- Position: Causal Machine Learning Requires Rigorous Synthetic Experiments for Broader Adoption
- Position: Certified Robustness Does Not (Yet) Imply Model Security
- Position: Challenges and Future Directions of Data-Centric AI Alignment
- Position: Constants are Critical in Regret Bounds for Reinforcement Learning
- Position: Contextual Integrity is Inadequately Applied to Language Models
- Position: Current Model Licensing Practices are Dragging Us into a Quagmire of Legal Noncompliance
- Position: Deep Learning is Not So Mysterious or Different
- Position: Democratic AI is Possible. The Democracy Levels Framework Shows How It Might Work.
- Position: Don't Use the CLT in LLM Evals With Fewer Than a Few Hundred Datapoints
- Position: Editing Large Language Models Poses Serious Safety Risks
- Position: Enough of Scaling LLMs! Let's Focus on Downscaling
- Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge
- Position: Explainable AI Cannot Advance Without Better User Studies
- Position: Formal Mathematical Reasoning—A New Frontier in AI
- Position: Future Research and Challenges Remain Towards AI for Software Engineering
- Position: General Intelligence Requires Reward-based Pretraining
- Position: Generative AI Regulation Can Learn from Social Media Regulation
- Position: Graph Learning Will Lose Relevance Due To Poor Benchmarks
- Position: Graph Matching Systems Deserve Better Benchmarks
- Position: Human Baselines in Model Evaluations Need Rigor and Transparency (With Recommendations & Reporting Checklist)
- Position: Humanity Faces Existential Risk from Gradual Disempowerment
- Position: In-House Evaluation Is Not Enough. Towards Robust Third-Party Evaluation and Flaw Disclosure for General-Purpose AI
- Position: Iterative Online-Offline Joint Optimization is Needed to Manage Complex LLM Copyright Risks
- Position: It Is Time We Test Neural Computation In Vitro
- Position: Language model developers should report train-test overlap
- Position: Lifetime tuning is incompatible with continual reinforcement learning
- Position: LLMs Need a Bayesian Meta-Reasoning Framework for More Robust and Generalizable Reasoning
- Position: LLM Social Simulations Are a Promising Research Method
- Position: Machine Learning Models Have a Supply Chain Problem
- Position: Medical Large Language Model Benchmarks Should Prioritize Construct Validity
- Position: Political Neutrality in AI Is Impossible — But Here Is How to Approximate It
- Position: Principles of Animal Cognition to Improve LLM Evaluations
- Position: Probabilistic Modelling is Sufficient for Causal Inference
- Position: Rethinking Explainable Machine Learning as Applied Statistics
- Position: Rethinking LLM Bias Probing Using Lessons from the Social Sciences
- Position: Retrieval-augmented systems can be dangerous medical communicators
- Position: Scaling LLM Agents Requires Asymptotic Analysis with LLM Primitives
- Position: Societal Impacts Research Requires Benchmarks for Creative Composition Tasks
- Position: Solve Layerwise Linear Models First to Understand Neural Dynamical Phenomena (Neural Collapse, Emergence, Lazy/Rich Regime, and Grokking)
- Position: Spectral GNNs Rely Less on Graph Fourier Basis than Conceived
- Position: Stop treating 'AGI' as the north-star goal of AI research
- Position: Strong Consumer Protection is an Inalienable Defense for AI Safety in the United States
- Position: Supervised Classifiers Answer the Wrong Questions for OOD Detection
- Position: The AI Conference Peer Review Crisis Demands Author Feedback and Reviewer Rewards
- Position: The Artificial Intelligence and Machine Learning Community Should Adopt a More Transparent and Regulated Peer Review Process
- Position: The Categorization of Race in ML is a Flawed Premise
- Position: The Future of Bayesian Prediction Is Prior-Fitted
- Position: The Most Expensive Part of an LLM *should* be its Training Data
- Position: Theory of Mind Benchmarks are Broken for Large Language Models
- Position: The Right to AI
- Position: Truly Self-Improving Agents Require Intrinsic Metacognitive Learning
- Position: Trustworthy AI Agents Require the Integration of Large Language Models and Formal Methods
- Position: Uncertainty Quantification Needs Reassessment for Large Language Model Agents
- Position: We Can’t Understand AI Using our Existing Vocabulary
- Position: We Need An Algorithmic Understanding of Generative AI
- Position: We Need Responsible, Application-Driven (RAD) AI Research
- Position: When Incentives Backfire, Data Stops Being Human
- Position: You Can't Manufacture a NeRF
- Positive-unlabeled AUC Maximization under Covariate Shift
- Posterior Inference with Diffusion Models for High-dimensional Black-box Optimization
- Potemkin Understanding in Large Language Models: Formalizing and Benchmarking Conceptual Comprehension
- Power Mean Estimation in Stochastic Continuous Monte-Carlo Tree Search
- PPDiff: Diffusing in Hybrid Sequence-Structure Space for Protein-Protein Complex Design
- Practical and Near-Optimal Algorithm for Batched Linear Bandits
- Preconditioned Riemannian Gradient Descent Algorithm for Low-Multilinear-Rank Tensor Completion
- Predicting Drug-likeness via Biomedical Knowledge Alignment and EM-like One-Class Boundary Optimization
- Predicting High-precision Depth on Low-Precision Devices Using 2D Hilbert Curves
- Predicting the Susceptibility of Examples to Catastrophic Forgetting
- Prediction-aware Learning in Multi-Agent Systems
- Prediction models that learn to avoid missing values
- Prediction-Powered Adaptive Shrinkage Estimation
- Prediction-Powered E-Values
- Prediction via Shapley Value Regression
- Predictive Consistency Learning with Gradual Label Modeling
- Predictive Data Selection: The Data That Predicts Is the Data That Teaches
- Predictive Performance of Deep Quantum Data Re-uploading Models
- Preference Adaptive and Sequential Text-to-Image Generation
- Preference-CFR: Beyond Nash Equilibrium for Better Game Strategies
- Preference Controllable Reinforcement Learning with Advanced Multi-Objective Optimization
- Preference Learning for AI Alignment: a Causal Perspective
- Preference learning made easy: Everything should be understood through win rate
- Preference Optimization for Combinatorial Optimization Problems
- Pre-Memorization Train Accuracy Reliably Predicts Generalization in LLM Reasoning
- Premise-Augmented Reasoning Chains Improve Error Identification in Math Reasoning with LLMs
- Preserving AUC Fairness in Learning with Noisy Protected Groups
- Pre-Trained Vision-Language Model Selection and Reuse for Downstream Tasks
- Pre-training Auto-regressive Robotic Models with 4D Representations
- Pretraining Generative Flow Networks with Inexpensive Rewards for Molecular Graph Generation
- Pre-Training Graph Contrastive Masked Autoencoders are Strong Distillers for EEG
- Prices, Bids, Values: One ML-Powered Combinatorial Auction to Rule Them All
- Primal-Dual Neural Algorithmic Reasoning
- PRIME: Deep Imbalanced Regression with Proxies
- Primitive Vision: Knowing Where to Look is Key in Math Diagram Understanding
- Primphormer: Efficient Graph Transformers with Primal Representations
- Principal-Agent Bandit Games with Self-Interested and Exploratory Learning Agents
- Principled Algorithms for Optimizing Generalized Metrics in Binary Classification
- Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples
- Prior-Informed Preference Alignment: as Plain Statistical Estimation
- Prior Knowledge Guided Neural Architecture Generation
- Privacy Amplification by Structured Subsampling for Deep Differentially Private Time Series Forecasting
- Privacy Amplification Through Synthetic Data: Insights from Linear Regression
- Privacy Attacks on Image AutoRegressive Models
- Privacy-Preserving Federated Convex Optimization: Balancing Partial Participation and Efficiency via Noise Cancellation
- Privacy-Preserving Large Language Model Inference via GPU-Accelerated Fully Homomorphic Encryption
- Privacy-Shielded Image Compression: Defending Against Exploitation from Vision-Language Pretrained Models
- Private Federated Learning using Preference-Optimized Synthetic Data
- Private Lossless Multiple Release
- Private Model Personalization Revisited
- Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty
- Probabilistic Factorial Experimental Design for Combinatorial Interventions
- Probabilistic Group Mask Guided Discrete Optimization for Incremental Learning
- Probabilistic Interactive 3D Segmentation with Hierarchical Neural Processes
- Probabilistic Verification of Neural Networks using Branch and Bound
- Probably Approximately Global Robustness Certification
- Probing Visual Language Priors in VLMs
- Procurement Auctions via Approximately Optimal Submodular Optimization
- ProDiff: Prototype-Guided Diffusion for Minimal Information Trajectory Imputation
- Programmatic Representations for Agent Learning
- Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
- Progressively Label Enhancement for Large Language Model Alignment
- Progressive Tempering Sampling with Diffusion
- Projection Optimization: A General Framework for Multi-Objective and Multi-Group RLHF
- Projection Pursuit Density Ratio Estimation
- Promoting Ensemble Diversity with Interactive Bayesian Distributional Robustness for Fine-tuning Foundation Models
- Prompt-based Depth Pruning of Large Language Models
- Prompt-to-Leaderboard: Prompt-Adaptive Language Model Evaluations with Neural Coefficient Models
- Propagate and Inject: Revisiting Propagation-Based Feature Imputation for Graphs with Partially Observed Features
- Propagation of Chaos for Mean-Field Langevin Dynamics and its Application to Model Ensemble
- Properties of Wasserstein gradient flows for the Sliced-Wasserstein distance
- Proportional Multiwinner Voting with Dynamic Candidate Sets
- Proposer-Agent-Evaluator (PAE): Autonomous Skill Discovery For Foundation Model Internet Agents
- ProSec: Fortifying Code LLMs with Proactive Security Alignment
- Protein Structure Tokenization: Benchmarking and New Recipe
- PROTOCOL: Partial Optimal Transport-enhanced Contrastive Learning for Imbalanced Multi-view Clustering
- Proto Successor Measure: Representing the Behavior Space of an RL Agent
- Protriever: End-to-End Differentiable Protein Homology Search for Fitness Prediction
- Provable and Practical Online Learning Rate Adaptation with Hypergradient Descent
- Provable Benefit of Random Permutations over Uniform Sampling in Stochastic Coordinate Descent
- Provable Benefits of Unsupervised Pre-training and Transfer Learning via Single-Index Models
- Provable In-Context Vector Arithmetic via Retrieving Task Concepts
- Provable Length Generalization in Sequence Prediction via Spectral Filtering
- Provable Maximum Entropy Manifold Exploration via Diffusion Models
- Provable Policy Gradient for Robust Average-Reward MDPs Beyond Rectangularity
- Provable Zero-Shot Generalization in Offline Reinforcement Learning
- Provably Cost-Sensitive Adversarial Defense via Randomized Smoothing
- Provably Efficient Algorithm for Best Scoring Rule Identification in Online Information Acquisition
- Provably Efficient Exploration in Inverse Constrained Reinforcement Learning
- Provably Efficient RL for Linear MDPs under Instantaneous Safety Constraints in Non-Convex Feature Spaces
- Provably Improving Generalization of Few-shot models with Synthetic Data
- Provably Near-Optimal Federated Ensemble Distillation with Negligible Overhead
- ProxSparse: Regularized Learning of Semi-Structured Sparsity Masks for Pretrained LLMs
- Proxy-FDA: Proxy-based Feature Distribution Alignment for Fine-tuning Vision Foundation Models without Forgetting
- Prune 'n Predict: Optimizing LLM Decision-making with Conformal Prediction
- Pruning for GNNs: Higher Expressiveness, Lower Complexity
- PTTA: Purifying Malicious Samples for Test-Time Model Adaptation
- Putnam-AXIOM: A Functional & Static Benchmark for Measuring Higher Level Mathematical Reasoning in LLMs
- Puzzle: Distillation-Based NAS for Inference-Optimized LLMs
- PyTDC: A multimodal machine learning training, evaluation, and inference platform for biomedical foundation models
- QEM-Bench: Benchmarking Learning-based Quantum Error Mitigation and QEMFormer as a Multi-ranged Context Learning Baseline
- QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search
- QMamba: On First Exploration of Vision Mamba for Image Quality Assessment
- QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration
- QPRL : Learning Optimal Policies with Quasi-Potential Functions for Asymmetric Traversal
- Q-Supervised Contrastive Representation: A State Decoupling Framework for Safe Offline Reinforcement Learning
- QT-DoG: Quantization-Aware Training for Domain Generalization
- Quadratic Differentiable Optimization for the Maximum Independent Set Problem
- Quadratic Upper Bound for Boosting Robustness
- Quadruple Attention in Many-body Systems for Accurate Molecular Property Predictions
- Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models
- QuanONet: Quantum Neural Operator with Application to Differential Equation
- Quantifying Memory Utilization with Effective State-Size
- Quantifying perturbation impacts for large language models
- Quantifying Prediction Stability Under Fine-tuning Multiplicity in Tabular LLMs
- Quantifying Treatment Effects: Estimating Risk Ratios via Observational Studies
- QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache
- Quantum Algorithms for Finite-horizon Markov Decision Processes
- Quantum Optimization via Gradient-Based Hamiltonian Descent
- Quantum Speedup for Hypergraph Sparsification
- Quantum Speedups in Regret Analysis of Infinite Horizon Average-Reward Markov Decision Processes
- QuEst: Enhancing Estimates of Quantile-Based Distributional Measures Using Model Predictions
- QuEST: Quantized Gradient Estimation for Accurate Training of Extremely Low-Bitwidth Large Language Models
- QuRe: Query-Relevant Retrieval through Hard Negative Sampling in Composed Image Retrieval
- QUTE: Quantifying Uncertainty in TinyML models with Early-exit-assisted ensembles for model-monitoring
- Q-VDiT: Towards Accurate Quantization of Video-Generation Diffusion Transformers
- R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts
- R3DM: Enabling Role Discovery and Diversity Through Dynamics Models in Multi-agent Reinforcement Learning
- Radio: Rate–Distortion Optimization for Large Language Model Compression
- RAGGED: Towards Informed Design of Scalable and Stable RAG Systems
- Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing
- Random Feature Representation Boosting
- Randomized Dimensionality Reduction for Euclidean Maximization and Diversity Measures
- Random Policy Evaluation Uncovers Policies of Generative Flow Networks
- Random Registers for Cross-Domain Few-Shot Learning
- Ranked Entropy Minimization for Continual Test-Time Adaptation
- Ranked from Within: Ranking Large Multimodal Models Without Labels
- Ranking with Multiple Oracles: From Weak to Strong Stochastic Transitivity
- RankNovo: A Universal Reranking Approach for Robust De Novo Peptide Sequencing
- Rank-One Modified Value Iteration
- Rapid Overfitting of Multi-Pass SGD in Stochastic Convex Optimization
- Raptor: Scalable Train-Free Embeddings for 3D Medical Volumes Leveraging Pretrained 2D Foundation Models
- RATE: Causal Explainability of Reward Models with Imperfect Counterfactuals
- Rate Constrained Optimized Training of Large Language Models
- R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
- Reaction Graph: Towards Reaction-Level Modeling for Chemical Reactions with 3D Structures
- RealRAG: Retrieval-augmented Realistic Image Generation via Self-reflective Contrastive Learning
- Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment
- Reasoning Limitations of Multimodal Large Language Models. A case study of Bongard Problems
- Reasoning Through Execution: Unifying Process and Outcome Rewards for Code Generation
- RE-Bench: Evaluating Frontier AI R&D Capabilities of Language Model Agents against Human Experts
- Recommendations with Sparse Comparison Data: Provably Fast Convergence for Nonconvex Matrix Factorization
- Reconstructing cell lineage trees from phenotypic features with metric learning
- Rectifying Conformity Scores for Better Conditional Coverage
- RedCrowd: Adaptive Security for LLMs
- Reducing Confounding Bias without Data Splitting for Causal Inference via Optimal Transport
- Reducing Tool Hallucination via Reliability Alignment
- Reducing Variance of Stochastic Optimization for Approximating Nash Equilibria in Normal-Form Games
- Redundancy Undermines the Trustworthiness of Self-Interpretable GNNs
- ReferSplat: Referring Segmentation in 3D Gaussian Splatting
- R*: Efficient Reward Design via Reward Structure Evolution and Parameter Alignment Optimization with Large Language Models
- Refined generalization analysis of the Deep Ritz Method and Physics-Informed Neural Networks
- Refining Adaptive Zeroth-Order Optimization at Ease
- Reflection-Bench: Evaluating Agency in Large Language Models
- Reflection-Window Decoding: Text Generation with Selective Refinement
- Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens
- ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding
- REG: Rectified Gradient Guidance for Conditional Diffusion Models
- Regress, Don't Guess: A Regression-like Loss on Number Tokens for Language Models
- Regression for the Mean: Auto-Evaluation and Inference with Few Labels through Post-hoc Regression
- Regret-Free Reinforcement Learning for Temporal Logic Specifications
- Regularized Langevin Dynamics for Combinatorial Optimization
- Reidentify: Context-Aware Identity Generation for Contextual Multi-Agent Reinforcement Learning
- RE-IMAGINE: Symbolic Benchmark Synthesis for Reasoning Evaluation
- ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning
- REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective
- Reinforced Learning Explicit Circuit Representations for Quantum State Characterization from Local Measurements
- Reinforced Lifelong Editing for Language Models
- Reinforce LLM Reasoning with Multi-Agent Reflection
- Reinforcement Learning Control of a Physical Robot Device for Assisted Human Level Ground Walking without a Simulator
- Reinforcement Learning for Quantum Control under Physical Constraints
- Reinforcement Learning with Adaptive Reward Modeling for Expensive-to-Evaluate Systems
- Reinforcement learning with random time horizons
- Reinforcement Learning with Segment Feedback
- Rejecting Hallucinated State Targets during Planning
- Relating Misfit to Gain in Weak-to-Strong Generalization Beyond the Squared Loss
- Relational Conformal Prediction for Correlated Time Series
- Relational Invariant Learning for Robust Solvation Free Energy Prediction
- Relative Error Fair Clustering in the Weak-Strong Oracle Model
- Relaxing the Equivariance Constraint for 3D Molecule Generation
- RelGNN: Composite Message Passing for Relational Deep Learning
- Reliable Algorithm Selection for Machine Learning-Guided Design
- Reliable and Efficient Amortized Model-based Evaluation
- Rényi Neural Processes
- RePaViT: Scalable Vision Transformer Acceleration via Structural Reparameterization on Feedforward Network Layers
- RepLoRA: Reparameterizing Low-rank Adaptation via the Perspective of Mixture of Experts
- RepoAudit: An Autonomous LLM-Agent for Repository-Level Code Auditing
- Representation Preserving Multiclass Agnostic to Realizable Reduction
- Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing
- Representations Shape Weak-to-Strong Generalization: Theoretical Insights and Empirical Predictions
- Representation Surgery in Model Merging with Probabilistic Modeling
- Representative Language Generation
- Representative Ranking for Deliberation in the Public Sphere
- ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation
- Re-ranking Reasoning Context with Tree Search Makes Large Vision-Language Models Stronger
- ResearchTown: Simulator of Human Research Community
- Residual Matrix Transformers: Scaling the Size of the Residual Stream
- Residual TPP: A unified lightweight approach for event stream data analysis
- ResKoopNet: Learning Koopman Representations for Complex Dynamics with Spectral Residuals
- Resolving Lexical Bias in Model Editing
- ResQ: Mixed-Precision Quantization of Large Language Models with Low-Rank Residuals
- RestoreGrad: Signal Restoration Using Conditional Denoising Diffusion Models with Jointly Learned Prior
- Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach
- Rethink GraphODE Generalization within Coupled Dynamical System
- Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding
- Rethinking Aleatoric and Epistemic Uncertainty
- Rethinking Benign Overfitting in Two-Layer Neural Networks
- Rethinking Causal Ranking: A Balanced Perspective on Uplift Model Evaluation
- Rethinking Chain-of-Thought from the Perspective of Self-Training
- Rethinking Confidence and Thresholds in Pseudolabeling-based SSL
- Rethinking External Slow-Thinking: From Snowball Errors to Probability of Correct Reasoning
- Rethinking Latent Representations in Behavior Cloning: An Information Bottleneck Approach for Robot Manipulation
- Rethinking Point Cloud Data Augmentation: Topologically Consistent Deformation
- Rethinking Prompt Design Space: In-Context Optimization through Evolutionary Self-replication
- Rethinking Score Distilling Sampling for 3D Editing and Generation
- Rethinking the Bias of Foundation Model under Long-tailed Distribution
- Rethinking the Stability-Plasticity Trade-off in Continual Learning from an Architectural Perspective
- Rethinking the Temperature for Federated Heterogeneous Distillation
- Rethinking Time Encoding via Learnable Transformation Functions
- Rethink the Role of Deep Learning towards Large-scale Quantum Systems
- Retraining-free Merging of Sparse Mixture-of-Experts via Hierarchical Clustering
- Retraining with Predicted Hard Labels Provably Increases Model Accuracy
- Retrieval-Augmented Language Model for Knowledge-aware Protein Encoding
- Retrieval-Augmented Perception: High-resolution Image Perception Meets Visual RAG
- Retrieval Augmented Time Series Forecasting
- Retrieval Augmented Zero-Shot Enzyme Generation for Specified Substrate
- Return Capping: Sample Efficient CVaR Policy Gradient Optimisation
- Return of the Latent Space COWBOYS: Re-thinking the use of VAEs for Bayesian Optimisation of Structured Spaces
- Revealing Weaknesses in Text Watermarking Through Self-Information Rewrite Attacks
- ReverB-SNN: Reversing Bit of the Weight and Activation for Spiking Neural Networks
- ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification
- Revisiting Chain-of-Thought in Code Generation: Do Language Models Need to Learn Reasoning before Coding?
- Revisiting Continuity of Image Tokens for Cross-domain Few-shot Learning
- Revisiting Convergence: A Study on Shuffling-Type Gradient Methods
- Revisiting Cooperative Off-Policy Multi-Agent Reinforcement Learning
- Revisiting Differentially Private Algorithms for Decentralized Online Learning
- Revisiting Diffusion Models: From Generative Pre-training to One-Step Generation
- Revisiting Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model
- Revisiting Neural Networks for Few-Shot Learning: A Zero-Cost NAS Perspective
- Revisiting Noise Resilience Strategies in Gesture Recognition: Short-Term Enhancement in sEMG Analysis
- Revisiting Non-Acyclic GFlowNets in Discrete Environments
- Revisiting the Predictability of Social Events
- Revisiting Unbiased Implicit Variational Inference
- Reviving the Cooperation Dynamics in Multimodal Transformer
- Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization
- Reward-Augmented Data Enhances Direct Preference Alignment of LLMs
- Reward-free World Models for Online Imitation Learning
- Reward-Guided Refinement in Diffusion Models With Applications to Protein and DNA Design
- Reward-Guided Speculative Decoding for Efficient LLM Reasoning
- Reward Modeling with Ordinal Feedback: Wisdom of the Crowd
- Reward Translation via Reward Machine in Semi-Alignable MDPs
- Reweighting Local Minima with Tilted SAM
- Rhomboid Tiling clustering for geometric graph deep learning
- Ridgelet Transform and Unified Universality Theorem for Deep and Shallow Joint-Group-Equivariant Machines
- Riemannian Diffusion Adaptation for Distributed Optimization on Manifolds
- Riemann Tensor Neural Networks: Learning Conservative Systems with Physics-Constrained Networks
- Right Now, Wrong Then: Non-Stationary Direct Preference Optimization under Preference Drift
- Right Time to Learn: Promoting Generalization via Bio-inspired Spacing Effect in Knowledge Distillation
- Ringmaster ASGD: The First Asynchronous SGD with Optimal Time Complexity
- R.I.P.: Better Models by Survival of the Fittest Prompts
- RISE: Radius of Influence based Subgraph Extraction for 3D Molecular Graph Explanation
- Risk and cross validation in ridge regression with correlated samples
- Risk-Sensitive Theory of Mind: Coordinating with Agents of Unknown Bias using Cumulative Prospect Theory
- RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning
- Robot-Gated Shared Autonomy with Adaptive Intervention Mechanism
- Robust and Conjugate Spatio-Temporal Gaussian Processes
- Robust Automatic Modulation Classification with Fuzzy Regularization
- Robust Autonomy Emerges from Self-Play
- Robust Conformal Outlier Detection under Contaminated Reference Data
- Robust Consensus Anchor Learning for Efficient Multi-view Subspace Clustering
- RobustLight: Improving Robustness via Diffusion Reinforcement Learning for Traffic Signal Control
- Robust ML Auditing using Prior Knowledge
- Robust Multi-Agent Reinforcement Learning with Stochastic Adversary
- Robust Multi-bit Text Watermark with LLM-based Paraphrasers
- Robust Multimodal Large Language Models Against Modality Conflicts
- Robust Noise Attenuation via Adaptive Pooling of Transformer Outputs
- Robust Offline Reinforcement Learning with Linearly Structured f-Divergence Regularization
- Robust Off-Policy Actor-Critic: Virtual Alternative Training via Symmetric Policy Evaluation
- Robust Reward Alignment via Hypothesis Space Batch Cutting
- Robust Secure Swap: Responsible Face Swap With Persons of Interest Redaction and Provenance Traceability
- Robust Sparsification via Sensitivity
- Robust Spatio-Temporal Centralized Interaction for OOD Learning
- RobustZero: Enhancing MuZero Reinforcement Learning Robustness to State Perturbations
- RocketKV: Accelerating Long-Context LLM Inference via Two-stage KV Cache Compression
- ROME is Forged in Adversity: Robust Distilled Datasets via Information Bottleneck
- ROPO: Robust Preference Optimization for Large Language Models
- ROS: A GNN-based Relax-Optimize-and-Sample Framework for Max-$k$-Cut Problems
- RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models
- rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
- RuleAdapter: Dynamic Rules for training Safety Reward Models in RLHF
- RULEBREAKERS: Challenging Large Language Models at the Crossroads between Formal Logic and Human-like Reasoning
- RUN: Reversible Unfolding Network for Concealed Object Segmentation
- Runtime Analysis of Evolutionary NAS for Multiclass Classification
- RWKVQuant: Quantizing the RWKV Family with Proxy Guided Hybrid of Uniform and Vector Quantization
- RZ-NAS: Enhancing LLM-guided Neural Architecture Search via Reflective Zero-Cost Strategy
- S2-Track: A Simple yet Strong Approach for End-to-End 3D Multi-Object Tracking
- S4S: Solving for a Fast Diffusion Model Solver
- Sable: a Performant, Efficient and Scalable Sequence Model for MARL
- SADA: Stability-guided Adaptive Diffusion Acceleration
- SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability
- SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders
- SAE-V: Interpreting Multimodal Models for Enhanced Alignment
- SafeArena: Evaluating the Safety of Autonomous Web Agents
- SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models
- Safe Delta: Consistently Preserving Safety when Fine-Tuning LLMs on Diverse Datasets
- Safe-EF: Error Feedback for Non-smooth Constrained Optimization
- SAFE: Finding Sparse and Flat Minima to Improve Pruning
- Safely Learning Optimal Auctions: A Testable Learning Framework for Mechanism Design
- SafeMap: Robust HD Map Construction from Incomplete Observations
- SAFER: A Calibrated Risk-Aware Tabular-Language Recommendation Model for Dynamic Treatment Regimes
- Safety Alignment Can Be Not Superficial With Explicit Safety Signals
- SafetyAnalyst: Interpretable, transparent, and steerable safety moderation for AI behavior
- Safety Certificate against Latent Variables with Partially Unidentifiable Dynamics
- Safety-Polarized and Prioritized Reinforcement Learning
- SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization
- SAH-Drive: A Scenario-Aware Hybrid Planner for Closed-Loop Vehicle Trajectory Generation
- SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation
- Sample complexity of branch-length estimation by maximum likelihood
- Sample Complexity of Correlation Detection in the Gaussian Wigner Model
- Sample Complexity of Distributionally Robust Off-Dynamics Reinforcement Learning with Online Interaction
- Sample Efficient Demonstration Selection for In-Context Learning
- Sample-Optimal Agnostic Boosting with Unlabeled Data
- Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification
- Sample-specific Noise Injection for Diffusion-based Adversarial Purification
- Sampling Binary Data by Denoising through Score Functions
- Sampling from Binary Quadratic Distributions via Stochastic Localization
- SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer
- SAND: One-Shot Feature Selection with Additive Noise Distortion
- SAN: Hypothesizing Long-Term Synaptic Development and Neural Engram Mechanism in Scalable Model's Parameter-Efficient Fine-Tuning
- Sanity Checking Causal Representation Learning on a Simple Real-World System
- Sargy: Targeted Human Feedback for LLM Alignment
- Sassha: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation
- Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
- SBGD: Improving Graph Diffusion Generative Model via Stochastic Block Diffusion
- Scaffold with Stochastic Gradients: New Analysis with Linear Speed-Up
- Scalable Approximation Algorithms for $p$-Wasserstein Distance and Its Variants
- Scalable Attribute-Missing Graph Clustering via Neighborhood Differentiation
- Scalable Equilibrium Sampling with Sequential Boltzmann Generators
- Scalable First-order Method for Certifying Optimal k-Sparse GLMs
- Scalable Gaussian Processes with Latent Kronecker Structure
- Scalable Generation of Spatial Transcriptomics from Histology Images via Whole-Slide Flow Matching
- Scalable Language Models with Posterior Inference of Latent Thought Vectors
- Scalable Meta-Learning via Mixed-Mode Differentiation
- Scalable Model Merging with Progressive Layer-wise Distillation
- Scalable Private Partition Selection via Adaptive Weighting
- Scalable Reinforcement Post-Training Beyond Static Human Prompts
- Scalable Sobolev IPM for Probability Measures on a Graph
- Scalably Solving Assistance Games
- Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks
- Scaling Inference-Efficient Language Models
- Scaling Large Motion Models with Million-Level Human Motions
- Scaling Laws for Differentially Private Language Models
- Scaling Laws for Floating-Point Quantization Training
- Scaling Laws for Forgetting during Finetuning with Pretraining Data Injection
- Scaling Laws for Pre-training Agents and World Models
- Scaling Laws for Task-Optimized Models of the Primate Visual Ventral Stream
- Scaling Laws for Upcycling Mixture-of-Experts Language Models
- Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More
- Scaling Probabilistic Circuits via Monarch Matrices
- Scaling sparse feature circuit finding for in-context learning
- Scaling Test-Time Compute Without Verification or RL is Suboptimal
- Scaling Trends in Language Model Robustness
- Scaling Up Intervention Models
- Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning
- Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation
- SCENIR: Visual Semantic Clarity through Unsupervised Scene Graph Retrieval
- SCENT: Robust Spatiotemporal Learning for Continuous Scientific Data via Scalable Conditioned Neural Fields
- Schwarz–Schur Involution and Dirichlet-to-Neumann Condensing: Partial Differential Equations and Sparse Linear Systems Solved 1000x Faster
- sciLaMA: A Single-Cell Representation Learning Framework to Leverage Prior Knowledge from Large Language Models
- SCISSOR: Mitigating Semantic Bias through Cluster-Aware Siamese Networks for Robust Classification
- Score as Action: Fine Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning
- Score-Based Diffusion Policy Compatible with Reinforcement Learning via Optimal Transport
- Score-based Pullback Riemannian Geometry: Extracting the Data Manifold Geometry using Anisotropic Flows
- Score Matching with Missing Data
- ScoreMix: One-Step Generative Model Training Made Simple via Score Estimation of Mixture Distributions
- scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data
- SDE Matching: Scalable and Simulation-Free Training of Latent Stochastic Differential Equations
- SDMG: Smoothing Your Diffusion Models for Powerful Graph Representation Learning
- SDP-CROWN: Efficient Bound Propagation for Neural Network Verification with Tightness of Semidefinite Programming
- SE(3)-Equivariant Diffusion Policy in Spherical Fourier Space
- SEAD: Unsupervised Ensemble of Streaming Anomaly Detectors
- Secant Line Search for Frank-Wolfe Algorithms
- SecEmb: Sparsity-Aware Secure Federated Learning of On-Device Recommender System with Large Embedding
- SECOND: Mitigating Perceptual Hallucination in Vision-Language Models via Selective and Contrastive Decoding
- Securing Equal Share: A Principled Approach for Learning Multiplayer Symmetric Games
- SeedLoRA: A Fusion Approach to Efficient LLM Fine-Tuning
- SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning
- Segment Anyword: Mask Prompt Inversion for Open-Set Grounded Segmentation
- Selective Preference Aggregation
- Selective Prompt Anchoring for Code Generation
- Selective Response Strategies for GenAI
- Self-Bootstrapping for Versatile Test-Time Adaptation
- SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models
- Self-Consistency Preference Optimization
- Self-Consuming Generative Models with Adversarially Curated Data
- Self-cross Feature based Spiking Neural Networks for Efficient Few-shot Learning
- Self-Discriminative Modeling for Anomalous Graph Detection
- Self-Disentanglement and Re-Composition for Cross-Domain Few-Shot Segmentation
- Self-Improving Language Models for Evolutionary Program Synthesis: A Case Study on ARC-AGI
- Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges
- Self-Organizing Visual Prototypes for Non-Parametric Representation Learning
- Self-supervised Adversarial Purification for Graph Neural Networks
- Self-supervised Heterogeneous Graph Neural Network with Optimal Transport
- Self-Supervised Learning of Intertwined Content and Positional Features for Object Detection
- Self-supervised Masked Graph Autoencoder via Structure-aware Curriculum
- Self-Supervised Transformers as Iterative Solution Improvers for Constraint Satisfaction
- Semantics-aware Test-time Adaptation for 3D Human Pose Estimation
- Semantic Shift Estimation via Dual-Projection and Classifier Reconstruction for Exemplar-Free Class-Incremental Learning
- Semi-Supervised Blind Quality Assessment with Confidence-quantifiable Pseudo-label Learning for Authentic Images
- SEMU: Singular Value Decomposition for Efficient Machine Unlearning
- SENSEI: Semantic Exploration Guided by Foundation Models to Learn Versatile World Models
- SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator
- SERENA: A Unified Stochastic Recursive Variance Reduced Gradient Framework for Riemannian Non-Convex Optimization
- Settling the Maximin Share Fairness for Scheduling among Groups of Machines
- Set Valued Predictions For Robust Domain Generalization
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
- SGD Jittering: A Training Strategy for Robust and Accurate Model-Based Architectures
- ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
- SHARP-Distill: A 68× Faster Recommender System with Hypergraph Neural Networks and Language Models
- Sharper Stability and Generalization Analysis of Decentralized SGD
- Sharp Generalization for Nonparametric Regression by Over-Parameterized Neural Networks: A Distribution-Free Analysis in Spherical Covariate
- Sharp Optimality of Simple, Plug-in Estimation of the Fisher Information of a Smoothed Density
- SHE: Streaming-media Hashing Retrieval
- ShieldAgent: Shielding LLM Agents via Verifiable Safety Policy Reasoning
- Shielded Diffusion: Generating Novel and Diverse Images using Sparse Repellency
- SHIELD: Multi-task Multi-distribution Vehicle Routing Solver with Sparsity & Hierarchy in Efficiently Layered Decoder
- Shifting time: Time-series forecasting with Khatri-Rao neural operators
- Shortcut-connected Expert Parallelism for Accelerating Mixture of Experts
- Should Decision-Makers Reveal the Classifiers in Online Strategic Classification?
- Sidechain conditioning and modeling for full-atom protein sequence design with FAMPNN
- Signed Laplacians for Constrained Graph Clustering
- SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning
- Simple Path Structural Encoding for Graph Transformers
- Simple Policy Optimization
- Simple Randomized Rounding for Max-Min Eigenvalue Augmentation
- Simplicity Bias and Optimization Threshold in Two-Layer Networks
- Simplifying DINO by Coding Rate Regularization
- Simultaneous Multi-Robot Motion Planning with Projected Diffusion Models
- Since Faithfulness Fails: The Performance Limits of Neural Causal Discovery
- SING: Spatial Context in Large Language Model for Next-Gen Wearables
- SITCOM: Step-wise Triple-Consistent Diffusion Sampling For Inverse Problems
- SketchDNN: Joint Continuous-Discrete Diffusion for CAD Sketch Generation
- Sketch to Adapt: Fine-Tunable Sketches for Efficient LLM Adaptation
- SKIM: Any-bit Quantization Pushing The Limits of Post-Training Quantization
- SkipGPT: Each Token is One of a Kind
- Skip the Equations: Learning Behavior of Personalized Dynamical Systems Directly From Data
- SKOLR: Structured Koopman Operator Linear RNN for Time-Series Forecasting
- Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation
- SK-VQA: Synthetic Knowledge Generation at Scale for Training Context-Augmented Multimodal LLMs
- Sleeping Reinforcement Learning
- Sliding Puzzles Gym: A Scalable Benchmark for State Representation in Visual Reinforcement Learning
- SlimLLM: Accurate Structured Pruning for Large Language Models
- SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
- Slimming the Fat-Tail: MoF for Adaptive Time Series Modeling
- SLIM: One-shot Quantization and Sparsity with Low-rank Approximation for LLM Weight Compression
- SMART-PC: Skeletal Model Adaptation for Robust Test-Time Training in Point Clouds
- Smoothed Preference Optimization via ReNoise Inversion for Aligning Diffusion Models with Varied Human Preferences
- Smooth Interpolation for Improved Discrete Graph Generative Models
- SNS-Bench: Defining, Building, and Assessing Capabilities of Large Language Models in Social Networking Services
- Socialized Coevolution: Advancing a Better World through Cross-Task Collaboration
- Soft Diffusion Actor-Critic: Efficient Online Reinforcement Learning for Diffusion Policy
- softmax is not enough (for sharp size generalisation)
- SOLD: Slot Object-Centric Latent Dynamics Models for Relational Manipulation Learning from Pixels
- Solving Linear-Gaussian Bayesian Inverse Problems with Decoupled Diffusion Sequential Monte Carlo
- Solving Satisfiability Modulo Counting Exactly with Probabilistic Circuits
- Solving Zero-Sum Convex Markov Games
- SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
- Sorbet: A Neuromorphic Hardware-Compatible Transformer-Based Spiking Language Model
- Sort Before You Prune: Improved Worst-Case Guarantees of the DiskANN Family of Graphs
- Sortformer: A Novel Approach for Permutation-Resolved Speaker Supervision in Speech-to-Text Systems
- Sounding that Object: Interactive Object-Aware Image to Audio Generation
- Soup-of-Experts: Pretraining Specialist Models via Parameters Averaging
- SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model
- SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference
- Sparse Autoencoders, Again?
- Sparse Autoencoders for Hypothesis Generation
- Sparse Causal Discovery with Generative Intervention for Unsupervised Graph Domain Adaptation
- SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
- Sparse-pivot: Dynamic correlation clustering for node insertions
- Sparse Spectral Training and Inference on Euclidean and Hyperbolic Neural Networks
- Sparse Training from Random Initialization: Aligning Lottery Ticket Masks using Weight Symmetry
- Sparse Video-Gen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity
- SparseVLM: Visual Token Sparsification for Efficient Vision Language Models Inference
- Spatial Reasoning with Denoising Models
- SPD: Smoothed Primal-Dual Methods for Nonconvex Optimization with Equilibrium Constraints
- SPD: Sync-Point Drop for efficient tensor parallelism of Large Language Models
- Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions
- SpeCache: Speculative Key-Value Caching for Efficient Generation of LLMs
- Specialization-generalization transition in exemplar-based in-context learning
- Spectral-Aware Reservoir Computing for Fast and Accurate Time Series Classification
- Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding
- Speculative Ensemble: Fast Large Language Model Ensemble via Speculation
- Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation
- Speeding up Policy Simulation in Supply Chain RL
- SPEX: Scaling Feature Interaction Explanations for LLMs
- Spherical-Nested Diffusion Model for Panoramic Image Outpainting
- Spherical Rotation Dimension Reduction with Geometric Loss Functions
- SPHINX: Structural Prediction using Hypergraph Inference Network
- SpikeVideoFormer: An Efficient Spike-Driven Video Transformer with Hamming Attention and $\mathcal{O}(T)$ Complexity
- SpikF: Spiking Fourier Network for Efficient Long-term Prediction
- Splitting & Integrating: Out-of-Distribution Detection via Adversarial Gradient Attribution
- Splitting with Importance-aware Updating for Heterogeneous Federated Learning with Large Language Models
- SPMC: Self-Purifying Federated Backdoor Defense via Margin Contribution
- SPRI: Aligning Large Language Models with Context-Situated Principles
- Spurious Correlations in High Dimensional Regression: The Roles of Regularization, Simplicity Bias and Over-Parameterization
- Square$\chi$PO: Differentially Private and Robust $\chi^2$-Preference Optimization in Offline Direct Alignment
- SSHR: More Secure Generative Steganography with High-Quality Revealed Secret Images
- StaB-ddG: Predicting mutational effects on protein binding from folding energy
- Stability and Generalization Capability of Subgraph Reasoning Models for Inductive Knowledge Graph Completion
- Stabilizing Sample Similarity in Representation via Mitigating Random Consistency
- Stable Fair Graph Representation Learning with Lipschitz Constraint
- Stable Offline Value Function Learning with Bisimulation-based Representations
- Stacey: Promoting Stochastic Steepest Descent via Accelerated $\ell_p$-Smooth Nonconvex Optimization
- Staged and Physics-Grounded Learning Framework with Hyperintensity Prior for Pre-Contrast MRI Synthesis
- STAIR: Improving Safety Alignment with Introspective Reasoning
- STAMP Your Content: Proving Dataset Membership via Watermarked Rephrasings
- Star Attention: Efficient LLM Inference over Long Sequences
- STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization
- Statistical and Computational Guarantees of Kernel Max-Sliced Wasserstein Distances
- Statistical Collusion by Collectives on Learning Platforms
- Statistical Query Hardness of Multiclass Linear Classification with Random Classification Noise
- Statistical Test for Feature Selection Pipelines by Selective Inference
- Stay Hungry, Keep Learning: Sustainable Plasticity for Deep Reinforcement Learning
- Stay-Positive: A Case for Ignoring Real Image Features in Fake Image Detection
- STD-FD: Spatio-Temporal Distribution Fitting Deviation for AIGC Forgery Identification
- Stealing That Free Lunch: Exposing the Limits of Dyna-Style Reinforcement Learning
- Stealix: Model Stealing via Prompt Evolution
- StealthInk: A Multi-bit and Stealthy Watermark for Large Language Models
- Steerable Transformers for Volumetric Data
- Steering Protein Language Models
- Step-DAD: Semi-Amortized Policy-Based Bayesian Experimental Design
- Stochastic Control for Fine-tuning Diffusion Models: Optimality, Regularity, and Convergence
- Stochastic Deep Restoration Priors for Imaging Inverse Problems
- Stochastic Encodings for Active Feature Acquisition
- Stochastic Forward–Backward Deconvolution: Training Diffusion Models with Finite Noisy Datasets
- Stochastic Layer-Wise Shuffle for Improving Vision Mamba Training
- Stochastic Online Conformal Prediction with Semi-Bandit Feedback
- Stochastic Poisson Surface Reconstruction with One Solve using Geometric Gaussian Processes
- Stochastic Smoothed Primal-Dual Algorithms for Nonconvex Optimization with Linear Inequality Constraints
- SToFM: a Multi-scale Foundation Model for Spatial Transcriptomics
- Strategic A/B testing via Maximum Probability-driven Two-armed Bandit
- Strategic Planning: A Top-Down Approach to Option Generation
- Strategy Coopetition Explains the Transience of Emergent In-Context Learning
- Stray Intrusive Outliers-Based Feature Selection on Intra-Class Asymmetric Instance Distribution or Multiple High-Density Clusters
- Stream-level Flow Matching with Gaussian Processes
- Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM
- Strengthen Out-of-Distribution Detection Capability with Progressive Self-Knowledge Distillation
- Strong and Weak Identifiability of Optimization-based Causal Discovery
- Stronger Neyman Regret Guarantees for Adaptive Experimental Design
- Structured Preconditioners in Adaptive Optimization: A Unified Analysis
- Structure-Guided Large Language Models for Text-to-SQL Generation
- Structure-informed Risk Minimization for Robust Ensemble Learning
- Structure Is All You Need: Structural Representation Learning on Hyper-Relational Knowledge Graphs
- Subgoal-Guided Policy Heuristic Search with Learned Subgoals
- Subgroups Matter for Robust Bias Mitigation
- Subobject-level Image Tokenization
- Sub-Sequential Physics-Informed Learning with State Space Model
- Subspace Optimization for Large Language Models with Convergence Guarantees
- SUICA: Learning Super-high Dimensional Sparse Implicit Neural Representations for Spatial Transcriptomics
- Suitability Filter: A Statistical Framework for Model Evaluation in Real-World Deployment Settings
- Sum-of-Parts: Self-Attributing Neural Networks with End-to-End Learning of Feature Groups
- Sundial: A Family of Highly Capable Time Series Foundation Models
- Super Deep Contrastive Information Bottleneck for Multi-modal Clustering
- Supervised Contrastive Learning from Weakly-Labeled Audio Segments for Musical Version Matching
- Surrogate Prompt Learning: Towards Efficient and Diverse Prompt Learning for Vision-Language Models
- Survival analysis via density estimation
- SWAN: SGD with Normalization and Whitening Enables Stateless LLM Training
- SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?
- SWIFTCODE: Enhancing Code Generation in Large Language Models through Efficiency-Aware Fine-tuning
- Symmetric Reinforcement Learning Loss for Robust Learning on Diverse Tasks and Model Scales
- Symmetry-Aware GFlowNets
- Symmetry-Driven Discovery of Dynamical Variables in Molecular Simulations
- SyncMind: Measuring Agent Out-of-Sync Recovery in Collaborative Software Engineering
- SynEVO: A neuro-inspired spatiotemporal evolutional framework for cross-domain adaptation
- Synonymous Variational Inference for Perceptual Image Compression
- Synthesizing Images on Perceptual Boundaries of ANNs for Uncovering and Manipulating Human Perceptual Variability
- Synthesizing Privacy-Preserving Text Data via Finetuning *without* Finetuning Billion-Scale LLMs
- Synthetic Face Datasets Generation via Latent Space Exploration from Brownian Identity Diffusion
- Synthetic Text Generation for Training Large Language Models via Gradient Matching
- System-Aware Unlearning Algorithms: Use Lesser, Forget Faster
- TabFlex: Scaling Tabular Learning to Millions with Linear Attention
- TabFSBench: Tabular Benchmark for Feature Shifts in Open Environment
- TabICL: A Tabular Foundation Model for In-Context Learning on Large Data
- TabPFN Unleashed: A Scalable and Effective Solution to Tabular Classification Problems
- TabSDS: a lightweight, fully non-parametric, and model free approach for generating synthetic tabular data
- Tackling Dimensional Collapse toward Comprehensive Universal Domain Adaptation
- Tackling View-Dependent Semantics in 3D Language Gaussian Splatting
- Taming Diffusion for Dataset Distillation with High Representativeness
- Taming Knowledge Conflicts in Language Models
- Taming Rectified Flow for Inversion and Editing
- TANGO: Clustering with Typicality-Aware Nonlocal Mode-Seeking and Graph-Cut Optimization
- Target Concrete Score Matching: A Holistic Framework for Discrete Diffusion
- Targeted control of fast prototyping through domain-specific interface
- Targeted Low-rank Refinement: Enhancing Sparse Language Model with Precision
- Targeted Unlearning with Single Layer Unlearning Gradient
- TAROT: Targeted Data Selection via Optimal Transport
- Task-agnostic Pre-training and Task-guided Fine-tuning for Versatile Diffusion Planner
- Task-Aware Virtual Training: Enhancing Generalization in Meta-Reinforcement Learning for Out-of-Distribution Tasks
- Task-Gated Multi-Expert Collaboration Network for Degraded Multi-Modal Image Fusion
- Task Generalization With AutoRegressive Compositional Structure
- TCP-Diffusion: A Multi-modal Diffusion Model for Global Tropical Cyclone Precipitation Forecasting with Change Awareness
- Teaching Language Models to Critique via Reinforcement Learning
- Teaching Physical Awareness to LLMs through Sounds
- Teaching Transformers Causal Reasoning through Axiomatic Training
- TeDS: Joint Learning of Diachronic and Synchronic Perspectives in Quaternion Space for Temporal Knowledge Graph Completion
- Telling Peer Direct Effects from Indirect Effects in Observational Network Data
- TeLoGraF: Temporal Logic Planning via Graph-encoded Flow Matching
- Temperature-Annealed Boltzmann Generators
- Temporal Difference Flows
- Temporal Distance-aware Transition Augmentation for Offline Model-based Reinforcement Learning
- Temporal Misalignment in ANN-SNN Conversion and Its Mitigation via Probabilistic Spiking Neurons
- Temporal Query Network for Efficient Multivariate Time Series Forecasting
- Tensor Decomposition Based Memory-Efficient Incremental Learning
- Tensorized Multi-View Multi-Label Classification via Laplace Tensor Rank
- Tensor Product Neural Networks for Functional ANOVA Model
- Tensor-Var: Efficient Four-Dimensional Variational Data Assimilation
- TerraBytes: Towards global datasets and models for Earth Observation
- Testing Conditional Mean Independence Using Generative Neural Networks
- Testing the limits of fine-tuning to improve reasoning in vision language models
- Test-Time Adaptation for Online Vision-Language Navigation with Feedback-based Reinforcement Learning
- Test-time Adaptation on Graphs via Adaptive Subgraph-based Selection and Regularized Prototypes
- Test-Time Adaptation with Binary Feedback
- Test-time Adapted Reinforcement Learning with Action Entropy Regularization
- Test-time Correlation Alignment
- Test-Time Graph Neural Dataset Search With Generative Projection
- Test-Time Learning for Large Language Models
- Test-Time Multimodal Backdoor Detection by Contrastive Prompting
- Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
- Test-Time Selective Adaptation for Uni-Modal Distribution Shift in Multi-Modal Data
- Test-Time Training Provably Improves Transformers as In-context Learners
- TextCenGen: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation
- Text-to-CAD Generation Through Infusing Visual Feedback in Large Language Models
- Text-to-LoRA: Instant Transformer Adaption
- Textual Unlearning Gives a False Sense of Unlearning
- Textural or Textual: How Vision-Language Models Read Text in Images
- TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization
- The 1st Workshop on Vector Databases
- The 2nd Workshop on Reliable and Responsible Foundation Models
- The Batch Complexity of Bandit Pure Exploration
- The Best of Both Worlds: Bridging Quality and Diversity in Data Selection with Bipartite Graph
- The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning
- The Butterfly Effect: Neural Network Training Trajectories Are Highly Sensitive to Initial Conditions
- The Canary’s Echo: Auditing Privacy Risks of LLM-Generated Synthetic Text
- The Case for Learned Provenance-based System Behavior Baseline
- The Complexity of Learning Sparse Superposed Features with Feedback
- The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep Reinforcement Learning
- The dark side of the forces: assessing non-conservative force models for atomistic machine learning
- The Devil Is in the Details: Tackling Unimodal Spurious Correlations for Generalizable Multimodal Reward Models
- The Diffusion Duality
- The Disparate Benefits of Deep Ensembles
- The Double-Ellipsoid Geometry of CLIP
- The Efficiency of Guidance in Diffusion Models for General Data Distribution
- The Elicitation Game: Evaluating Capability Elicitation Techniques
- The Emperor's New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for LLM Benchmark Data Contamination
- The Empirical Mean is Minimax Optimal for Local Glivenko-Cantelli
- The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking
- The Four Color Theorem for Cell Instance Segmentation
- The Generalized Skew Spectrum of Graphs
- The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence
- The Global Convergence Time of Stochastic Gradient Descent in Non-Convex Landscapes: Sharp Estimates through Large Deviations
- The Harder Path: Last Iterate Convergence for Uncoupled Learning in Zero-Sum Games with Bandit Feedback
- The Hidden Dimensions of LLM Alignment: A Multi-Dimensional Analysis of Orthogonal Safety Directions
- The Hidden Joules: Evaluating the Energy Consumption of Vision Backbones for Progress Towards More Efficient Model Inference
- The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models Via Visual Information Steering
- The Illusion of Role Separation: Hidden Shortcuts in LLM Role Learning (and How to Fix Them)
- The Impact of Memorization on Trustworthy Foundation Models
- The impact of on-policy parallelized data collection on network plasticity in deep reinforcement learning
- The impact of uncertainty on regularized learning in games
- The Limits of Predicting Agents from Behaviour
- The Limits of Tractable Marginalization
- The Lock-in Hypothesis: Stagnation by Algorithm
- The Logical Implication Steering Method for Conditional Interventions on Transformer Generation
- The Missing Alignment Link of In-context Learning on Sequences
- The Noisy Laplacian: a threshold phenomenon for non-linear dimension reduction
- The Number of Trials Matters in Infinite-Horizon General-Utility Markov Decision Processes
- Theoretical Guarantees for Robust Federated Model Evaluation
- Theoretical guarantees on the best-of-n alignment policy
- Theoretical Limitations of Ensembles in the Age of Overparameterization
- Theoretically Unmasking Inference Attacks Against LDP-Protected Clients in Federated Vision Models
- Theoretical Performance Guarantees for Partial Domain Adaptation via Partial Optimal Transport
- The Panaceas for Improving Low-Rank Decomposition in Communication-Efficient Federated Learning
- The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret
- The Polynomial Stein Discrepancy for Assessing Moment Convergence
- The Power of Random Features and the Limits of Distribution-Free Gradient Descent
- The Price of Freedom: Exploring Tradeoffs in Equivariant Tensor Product Operations
- The Price of Linear Time: Error Analysis of Structured Kernel Interpolation
- The Relationship Between No-Regret Learning and Online Conformal Prediction
- The Ripple Effect: On Unforeseen Complications of Backdoor Attacks
- Thermalizer: Stable autoregressive neural emulation of spatiotemporal chaos
- The Role of Randomness in Stability
- The Role of Sparsity for Length Generalization in LLMs
- The Sample Complexity of Online Strategic Decision Making with Information Asymmetry and Knowledge Transportability
- The Second Workshop on Long-Context Foundation Models
- The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training
- The Sparse-Plus-Low-Rank Quasi-Newton Method for Entropic Regularized Optimal Transport
- The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training
- The Surprising Effectiveness of Test-Time Training for Few-Shot Learning
- The Synergy of LLMs & RL Unlocks Offline Learning of Generalizable Language-Conditioned Policies with Low-fidelity Data
- The Underlying Logic of Language Models
- The underlying structures of self-attention: symmetry, directionality, and emergent dynamics in Transformer training
- The Underlying Universal Statistical Structure of Natural Datasets
- The Value of Prediction in Identifying the Worst-Off
- Thickness-aware E(3)-Equivariant Mesh Neural Networks
- Thinking LLMs: General Instruction Following with Thought Generation
- Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization
- Think Twice, Act Once: A Co-Evolution Framework of LLM and RL for Large-Scale Decision Making
- Three-dimensional Trajectory Prediction with 3DMoTraj Dataset
- Tight and Fast Bounds for Multi-Label Learning
- Tight and reliable conformal prediction
- Tightening Causal Bounds via Covariate-Aware Optimal Transport
- Time-Aware World Model for Adaptive Prediction and Control
- TimeBase: The Power of Minimalism in Efficient Long-term Time Series Forecasting
- TimeBridge: Non-Stationarity Matters for Long-term Time Series Forecasting
- TimeDART: A Diffusion Autoregressive Transformer for Self-Supervised Time Series Representation
- TimeFilter: Patch-Specific Spatial-Temporal Graph Filtration for Time Series Forecasting
- TimePoint: Accelerated Time Series Alignment via Self-Supervised Keypoint and Descriptor Learning
- TimePro: Efficient Multivariate Long-term Time Series Forecasting with Variable- and Time-Aware Hyper-state
- Time Series Representations with Hard-Coded Invariances
- TimeStacker: A Novel Framework with Multilevel Observation for Capturing Nonstationary Patterns in Time Series Forecasting
- TimeStep Master: Asymmetrical Mixture of Timestep LoRA Experts for Versatile and Efficient Diffusion Models in Vision
- Time to Spike? Understanding the Representational Power of Spiking Neural Networks in Discrete Time
- Time-VLM: Exploring Multimodal Vision-Language Models for Augmented Time Series Forecasting
- Timing: Temporality-Aware Integrated Gradients for Time Series Explanation
- TINED: GNNs-to-MLPs by Teacher Injection and Dirichlet Energy Distillation
- TinyMIG: Transferring Generalization from Vision Foundation Models to Single-Domain Medical Imaging
- Tiny Titans: The next wave of On-Device Learning for Foundation Models (TTODLer-FM)
- TLLC: Transfer Learning-based Label Completion for Crowdsourcing
- TMetaNet: Topological Meta-Learning Framework for Dynamic Link Prediction
- To Each Metric Its Decoding: Post-Hoc Optimal Decision Rules of Probabilistic Hierarchical Classifiers
- Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
- Token Cleaning: Fine-Grained Data Selection for LLM Supervised Fine-Tuning
- Token Coordinated Prompt Attention is Needed for Visual Prompting
- Tokenization Workshop (TokShop)
- Tokenized Bandit for LLM Decoding and Alignment
- Token Signature: Predicting Chain-of-Thought Gains with Token Decoding Feature in Large Language Models
- ToMA: Token Merging with Attention For Diffusion Models
- Tool Unlearning for Tool-Augmented LLMs
- TopInG: Topologically Interpretable Graph Learning via Persistent Rationale Filtration
- TOPLOC: A Locality Sensitive Hashing Scheme for Trustless Verifiable Inference
- Topological Signatures of Adversaries in Multimodal Alignments
- Topology-Aware Dynamic Reweighting for Node Classification under Distribution Shifts
- Topology-aware Neural Flux Prediction Guided by Physics
- TopoTune: A Framework for Generalized Combinatorial Complex Neural Networks
- To Steer or Not to Steer? Mechanistic Error Reduction with Abstention for Language Models
- Toward a Unified Theory of Gradient Descent under Generalized Smoothness
- Toward Data-centric Directed Graph Learning: An Entropy-driven Approach
- Toward Efficient Kernel-Based Solvers for Nonlinear PDEs
- Toward Interpretable LDA Topic Models with Strong Guarantees in Logarithmic Parallel Time
- Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage
- Towards a Formal Theory of Representational Compositionality
- Towards a General Time Series Forecasting Model with Unified Representation and Adaptive Transfer
- Towards a Mechanistic Explanation of Diffusion Model Generalization
- Towards an Explainable Comparison of Feature Embeddings
- Towards Attributions of Input Variables in a Coalition
- Towards a Unified Framework of Clustering-based Anomaly Detection
- Towards Black-Box Membership Inference Attack for Diffusion Models
- Towards characterizing the value of edge embeddings in Graph Neural Networks
- Towards Cost-Effective Reward Guided Text Generation
- Towards Efficient Online Tuning of VLM Agents via Counterfactual Soft Reinforcement Learning
- Towards Escaping from Class Dependency Modeling for Multi-Dimensional Classification
- Towards flexible perception with visual memory
- Towards Global-level Mechanistic Interpretability: A Perspective of Modular Circuits of Large Language Models
- Towards Large Language Models with Greater Activation Sparsity
- Towards Learning Generalities Across Graphs via Task-Trees
- Towards Learning to Complete Anything in Lidar
- Towards Lifelong Model Editing via Simulating Ideal Editor
- Towards LLM Unlearning Resilient to Relearning Attacks: A Sharpness-Aware Minimization Perspective and Beyond
- Towards Memorization Estimation: Fast, Formal and Free
- Towards Practical Defect-Focused Automated Code Review
- Towards Rationale-Answer Alignment of LVLMs via Self-Rationale Calibration
- Towards Robust Influence Functions with Flat Validation Minima
- Towards Robustness and Explainability of Automatic Algorithm Selection
- Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models
- Towards the Causal Complete Cause of Multi-Modal Representation Learning
- Towards the Efficient Inference by Incorporating Automated Computational Phenotypes under Covariate Shift
- Towards Theoretical Understanding of Sequential Decision Making with Preference Feedback
- Towards Trustworthy Distributed Learning with Untrusted Parties
- Towards Understanding Catastrophic Forgetting in Two-layer Convolutional Neural Networks
- Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis
- Towards Understanding Parametric Generalized Category Discovery on Graphs
- Towards Universal Offline Black-Box Optimization via Learning String Embedding Space
- Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
- TRACE Back from the Future: A Probabilistic Reasoning Approach to Controllable Language Generation
- TraceGrad: a Framework Learning Expressive SO(3)-equivariant Non-linear Representations for Electronic-Structure Hamiltonian Prediction
- Tracking Most Significant Shifts in Infinite-Armed Bandits
- Tracking The Best Expert Privately
- Tractable Transformers for Flexible Conditional Generation
- TraffiX-VideoQA: A Benchmark for Unified Spatio-Temporal Video Understanding in Traffic Scenes
- Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions
- Training a Generally Curious Agent
- Training Consistency Models with Variational Noise Coupling
- Training Deep Learning Models with Norm-Constrained LMOs
- Training Diffusion-based Generative Models with Limited Data
- Training Dynamics of In-Context Learning in Linear Attention
- Training Flexible Models of Genetic Variant Effects from Functional Annotations using Accelerated Linear Algebra
- Training High Performance Spiking Neural Networks by Temporal Model Calibration
- Training Neural Networks at Any Scale
- Training Software Engineering Agents and Verifiers with SWE-Gym
- Training Your Agent to Explore via In-Context Adaptation
- Trajectory Inference with Smooth Schrödinger Bridges
- Trajectory World Models for Heterogeneous Environments
- Transfer Learning for Nonparametric Contextual Dynamic Pricing
- Transfer Q-Learning with Composite MDP Structures
- Transformative or Conservative? Conservation laws for ResNets and Transformers
- Transformer-Based Spatial-Temporal Counterfactual Outcomes Estimation
- Transformers are Deep Optimizers: Provable In-Context Learning for Deep Model Training
- Transolver++: An Accurate Neural Solver for PDEs on Million-Scale Geometries
- TransPL: Pseudo-Labeling via Code Transitions for Time Series Adaptation
- TreeLoRA: Efficient Continual Learning via Layer-Wise LoRAs Guided by a Hierarchical Gradient-Similarity Tree
- Tree-Sliced Wasserstein Distance: A Geometric Perspective
- Tree-Sliced Wasserstein Distance with Nonlinear Projection
- Triple-Optimistic Learning for Stochastic Contextual Bandits with General Constraints
- Trusted Multi-View Classification with Expert Knowledge Constraints
- Trust-Region Twisted Policy Improvement
- TRUST-VLM: Thorough Red-Teaming for Uncovering Safety Threats in Vision-Language Models
- Trustworthy Machine Learning through Data-Specific Indistinguishability
- TruthFlow: Truthful LLM Generation via Representation Flow Correction
- TS-SNN: Temporal Shift Module for Spiking Neural Networks
- TtBA: Two-third Bridge Approach for Decision-Based Adversarial Attack
- TTFSFormer: A TTFS-based Lossless Conversion of Spiking Transformer
- Tuning LLM Judge Design Decisions for 1/1000 of the Cost
- Tuning Sequential Monte Carlo Samplers via Greedy Incremental Divergence Minimization
- Tutorial on Mechanistic Interpretability for Language Models
- Two Tickets are Better than One: Fair and Accurate Hiring Under Strategic Stochastic Manipulations
- TypyBench: Evaluating LLM Type Inference for Untyped Python Repositories
- UDora: A Unified Red Teaming Framework Against LLM Agents by Dynamically Leveraging Their Own Reasoning
- UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models
- UI-Vision: Desktop-centric GUI Benchmark for Visual Perception and Interaction
- Ultra Lowrate Image Compression with Semantic Residual Coding and Compression-aware Diffusion
- Ultra-Resolution Adaptation with Ease
- UltraTWD: Optimizing Ultrametric Trees for Tree-Wasserstein Distance
- Unbiased and Economic LLM Evaluation via Synthetic Feedback
- Unbiased Evaluation of Large Language Models from a Causal Perspective
- Unbiased Recommender Learning from Implicit Feedback via Progressive Proximal Transport
- UncertainSAM: Fast and Efficient Uncertainty Quantification of the Segment Anything Model
- Uncertainty-Based Extensible Codebook for Discrete Federated Learning in Heterogeneous Data Silos
- Uncertainty Estimation for Heterophilic Graphs Through the Lens of Information Theory
- Uncertainty Quantification for LLM-Based Survey Simulations
- Unconstrained Robust Online Convex Optimization
- Underestimated Privacy Risks for Minority Populations in Large Language Model Unlearning
- Understanding and Improving Length Generalization in Recurrent Models
- Understanding and Mitigating Memorization in Diffusion Models for Tabular Data
- Understanding and Mitigating Memorization in Generative Models via Sharpness of Probability Landscapes
- Understanding and Mitigating Miscalibration in Prompt Tuning for Vision-Language Models
- Understanding Bias Reinforcement in LLM Agents Debate
- Understanding Chain-of-Thought in LLMs through Information Theory
- Understanding Complexity in VideoQA via Visual Program Generation
- Understanding Fixed Predictions via Confined Regions
- Understanding Generalization in Quantum Machine Learning with Margins
- Understanding High-Dimensional Bayesian Optimization
- Understanding Input Selectivity in Mamba: Impact on Approximation Power, Memorization, and Associative Recall Capacity
- Understanding Mode Connectivity via Parameter Space Symmetry
- Understanding Model Ensemble in Transferable Adversarial Attack
- Understanding Model Reprogramming for CLIP via Decoupling Visual Prompts
- Understanding Multimodal LLMs Under Distribution Shifts: An Information-Theoretic Approach
- Understanding Nonlinear Implicit Bias via Region Counts in Input Space
- Understanding Overadaptation in Supervised Fine-Tuning: The Role of Ensemble Methods
- Understanding Sharpness Dynamics in NN Training with a Minimalist Example: The Effects of Dataset Difficulty, Depth, Stochasticity, and More
- Understanding Synthetic Context Extension via Retrieval Heads
- Understanding the Accuracy-Communication Trade-off in Personalized Federated Learning
- Understanding the difficulties in posterior predictive estimation
- Understanding the Emergence of Multimodal Representation Alignment
- Understanding the Forgetting of (Replay-based) Continual Learning via Feature Learning: Angle Matters
- Understanding the Kronecker Matrix-Vector Complexity of Linear Algebra
- Understanding the Limits of Deep Tabular Methods with Temporal Shift
- Understanding the Limits of Lifelong Knowledge Editing in LLMs
- Understanding the Logic of Direct Preference Alignment through Logic
- Understanding the Skill Gap in Recurrent Models: The Role of the Gather-and-Aggregate Mechanism
- Understanding the Unfairness in Network Quantization
- UnHiPPO: Uncertainty-aware Initialization for State Space Models
- UniDB: A Unified Diffusion Bridge Framework via Stochastic Optimal Control
- Unifews: You Need Fewer Operations for Efficient Graph Neural Networks
- Unified Analysis of Continuous Weak Features Learning with Applications to Learning from Missing Data
- Unified Breakdown Analysis for Byzantine Robust Gossip
- Unified K-Means Clustering with Label-Guided Manifold Learning
- Unified Screening for Multiple Diseases
- Uniform Mean Estimation via Median-of-Means for Heavy-Tailed Distributions
- Unifying 2D and 3D Vision-Language Understanding
- Unifying Knowledge from Diverse Datasets to Enhance Spatial-Temporal Modeling: A Granularity-Adaptive Geographical Embedding Approach
- Unifying Specialized Visual Encoders for Video Language Models
- UniMate: A Unified Model for Mechanical Metamaterial Generation, Property Prediction, and Condition Confirmation
- UniMC: Taming Diffusion Transformer for Unified Keypoint-Guided Multi-Class Image Generation
- UniMoMo: Unified Generative Modeling of 3D Molecules for De Novo Binder Design
- UniSim: A Unified Simulator for Time-Coarsened Dynamics of Biomolecules
- Unisolver: PDE-Conditional Transformers Are Universal Neural PDE Solvers
- Unisoma: A Unified Transformer-based Solver for Multi-Solid Systems
- UnitFlow: Synthesizing Software Engineering Data in a Test-Driven Manner
- Universal Approximation of Mean-Field Models via Transformers
- Universal Approximation Theorem of Deep Q-Networks
- Universal Length Generalization with Turing Programs
- Universal Neural Optimal Transport
- Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment
- Unlocking Post-hoc Dataset Inference with Synthetic Data
- Unlocking the Capabilities of Vision-Language Models for Generalizable Deepfake Detection
- Unlocking the Potential of Classic GNNs for Graph-level Tasks: Simple Architectures Meet Excellence
- Unlocking the Power of Rehearsal in Continual Learning: A Theoretical Perspective
- Unlocking the Power of SAM 2 for Few-Shot Segmentation
- Unnatural Languages Are Not Bugs but Features for LLMs
- Unpaired Point Cloud Completion via Unbalanced Optimal Transport
- Unraveling the Interplay between Carryover Effects and Reward Autocorrelations in Switchback Experiments
- Unsupervised Blind Speech Separation with a Diffusion Prior
- Unsupervised Learning for Class Distribution Mismatch
- Unveiling AI's Blind Spots: An Oracle for In-Domain, Out-of-Domain, and Adversarial Errors
- Unveiling Markov heads in Pretrained Language Models for Offline Reinforcement Learning
- Upcycling Text-to-Image Diffusion Models for Multi-Task Capabilities
- Update Your Transformer to the Latest Release: Re-Basin of Task Vectors
- UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent
- Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting
- Validating Mechanistic Interpretations: An Axiomatic Approach
- Value-Based Deep RL Scales Predictably
- Variance as a Catalyst: Efficient and Transferable Semantic Erasure Adversarial Attack for Customized Diffusion Models
- Variance-Reduced Forward-Reflected-Backward Splitting Methods for Nonmonotone Generalized Equations
- Variational Control for Guidance in Diffusion Models
- Variational Counterfactual Intervention Planning to Achieve Target Outcomes
- Variational Learning of Fractional Posteriors
- Variational Phylogenetic Inference with Products over Bipartitions
- Variational Rectified Flow Matching
- Vector Grimoire: Codebook-based Shape Generation under Raster Image Supervision
- VerbalTS: Generating Time Series from Texts
- Verification Learning: Make Unsupervised Neuro-Symbolic System Feasible
- VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data
- Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach
- VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models
- Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations
- VideoRoPE: What Makes for Good Video Rotary Position Embedding?
- video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
- VinePPO: Refining Credit Assignment in RL Training of LLMs
- Vintix: Action Model via In-Context Reinforcement Learning
- VIP: Vision Instructed Pre-training for Robotic Manipulation
- Vision Graph Prompting via Semantic Low-Rank Decomposition
- Vision-Language Models Create Cross-Modal Task Representations
- VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters
- VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models
- Visual Abstraction: A Plug-and-Play Approach for Text-Visual Retrieval
- Visual and Domain Knowledge for Professional-level Graph-of-Thought Medical Reasoning
- Visual Attention Never Fades: Selective Progressive Attention ReCalibration for Detailed Image Captioning in Multimodal Large Language Models
- Visual Autoregressive Modeling for Image Super-Resolution
- Visual Generation Without Guidance
- Visual Graph Arena: Evaluating AI's Visual Conceptualization
- ViTally Consistent: Scaling Biological Representation Learning for Cell Microscopy
- Volume-Aware Distance for Robust Similarity Learning
- Volume Optimality in Conformal Prediction with Structured Prediction Sets
- Voronoi-grid-based Pareto Front Learning and Its Application to Collaborative Federated Learning
- VTGaussian-SLAM: RGBD SLAM for Large Scale Scenes with Splatting View-Tied 3D Gaussians
- Vulnerability-Aware Alignment: Mitigating Uneven Forgetting in Harmful Fine-Tuning
- Wait-Less Offline Tuning and Re-solving for Online Decision Making
- Wasserstein Flow Matching: Generative Modeling Over Families of Distributions
- Wasserstein Policy Optimization
- Watch Out Your Album! On Unintentional Privacy Memorization in Multi-Modal Large Language Models
- WATCH: Weighted Adaptive Testing for Changepoint Hypotheses via Weighted Conformal Martingales
- WAVE: Weighted Autoregressive Varying Gate for Time Series Forecasting
- Weakly Supervised Anomaly Detection via Dual-Tailed Kernel
- Weakly-Supervised Contrastive Learning for Imprecise Class Labels
- Weak-to-Strong Generalization Even in Random Feature Networks, Provably
- WebOrganizer: Constructing Domains Enhances Pre-Training Data Curation
- Weight-Aware Fine-Tuning for Multi-faceted Efficiency across Parameters, Representations, Compute and Memory
- Weight matrices compression based on PDB model in deep neural networks
- Weisfeiler and Leman Go Gambling: Why Expressive Lottery Tickets Win
- WGFormer: An SE(3)-Transformer Driven by Wasserstein Gradient Flows for Molecular Ground-State Conformation Prediction
- What can large language models do for sustainable food?
- What If We Recaption Billions of Web Images with LLaMA-3?
- What Limits Bidirectional Model's Generative Capabilities? A Uni-Bi-Directional Mixture-of-Expert Method For Bidirectional Fine-tuning
- What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark of Essential Virtual Agent Capabilities
- What makes a good feedforward computational graph?
- What Makes an Ensemble Hard to Interpret? A Theoretical Approach
- What Makes In-context Learning Effective for Mathematical Reasoning: A Theoretical Analysis
- When and How Does CLIP Enable Domain and Compositional Generalization?
- When Bad Data Leads to Good Models
- When Can Proxies Improve the Sample Complexity of Preference Learning?
- When Data-Free Knowledge Distillation Meets Non-Transferable Teacher: Escaping Out-of-Distribution Trap is All You Need
- When Diffusion Models Memorize: Inductive Biases in Probability Flow of Minimum-Norm Shallow Neural Nets
- When do neural networks learn world models?
- When Dynamic Data Selection Meets Data Augmentation: Achieving Enhanced Training Acceleration
- When Entropy Misleads Policy Optimization
- When Every Millisecond Counts: Real-Time Anomaly Detection via the Multimodal Asynchronous Hybrid Network
- When Model Knowledge meets Diffusion Model: Diffusion-assisted Data-free Image Synthesis with Alignment of Domain and Class
- When to Forget? Complexity Trade-offs in Machine Unlearning
- When to retrain a machine learning model
- When, Where and Why to Average Weights?
- When Will It Fail?: Anomaly to Prompt for Forecasting Future Anomalies in Time Series
- Where is the Truth? The Risk of Getting Confounded in a Continual World
- Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems
- Which Attention Heads Matter for In-Context Learning?
- Whitened CLIP as a Likelihood Surrogate of Images and Captions
- Whoever Started the Interference Should End It: Guiding Data-Free Model Merging via Task Vectors
- "Who experiences large model decay and why?" A Hierarchical Framework for Diagnosing Heterogeneous Performance Drift
- Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?
- Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas
- "Why There Is a Tumor?": Safeguard Tumor Segmentation and Detection with Trustworthy Rationales
- Widening the Network Mitigates the Impact of Data Heterogeneity on FedAvg
- WildChat-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training
- WILTing Trees: Interpreting the Distance Between MPNN Embeddings
- Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale
- Winner-takes-all for Multivariate Probabilistic Time Series Forecasting
- WMAdapter: Adding WaterMark Control to Latent Diffusion Models
- WMarkGPT: Watermarked Image Understanding via Multimodal Large Language Models
- Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning
- WOMD-Reasoning: A Large-Scale Dataset for Interaction Reasoning in Driving
- Workshop on Computer Use Agents
- Workshop on Multi-modal Foundation Models and Large Language Models for Life Sciences
- Workshop on Technical AI Governance
- World Model Implanting for Test-time Adaptation of Embodied Agents
- World models are necessary for zero-shot generalization in agents
- WorldSimBench: Towards Video Generation Models as World Simulators
- Wrapped Gaussian on the manifold of Symmetric Positive Definite Matrices
- WyckoffDiff – A Generative Diffusion Model for Crystal Symmetry
- Wyckoff Transformer: Generation of Symmetric Crystals
- XAttention: Unlocking the Power of Block Sparse Attention with Antidiagonal Scoring
- XAttnMark: Learning Robust Audio Watermarking with Cross-Attention
- X-Hacking: The Threat of Misguided AutoML
- xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference
- X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP
- You Always Recognize Me (YARM): Robust Texture Synthesis Against Multi-View Corruption
- You Get What You Give: Reciprocally Fair Federated Learning
- Zebra: In-Context Generative Pretraining for Solving Parametric PDEs
- ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning
- ZeroFlow: Overcoming Catastrophic Forgetting is Easier than You Think
- Zero-Inflated Bandits
- Zero-Shot Adaptation of Parameter-Efficient Fine-Tuning in Diffusion Models
- Zero-Shot Cyclic Peptide Design via Composable Geometric Constraints
- Zero-Shot Generalization of GNNs over Distinct Attribute Domains
- Zero Shot Generalization of Vision-Based RL Without Data Augmentation
- Zero-shot Meta-learning for Tabular Prediction Tasks with Adversarially Pre-trained Transformer
- Zero-Shot Offline Imitation Learning via Optimal Transport
- ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality
- µnit Scaling: Simple and Scalable FP8 LLM Training