Search All 2024 Events
 

54 Results

Poster
Thu 2:30 Benchmarking Deletion Metrics with the Principled Explanations
Yipei Wang · Xiaoqian Wang
Poster
Thu 4:30 CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
Alex Gu · Baptiste Roziere · Hugh Leather · Armando Solar-Lezama · Gabriel Synnaeve · Sida Wang
Poster
Wed 2:30 Position: Benchmarking is Limited in Reinforcement Learning Research
Scott Jordan · Adam White · Bruno da Silva · Martha White · Philip Thomas
Poster
Thu 4:30 FightLadder: A Benchmark for Competitive Multi-Agent Reinforcement Learning
Wenzhe Li · Zihan Ding · Seth Karten · Chi Jin
Poster
Tue 2:30 OODRobustBench: a Benchmark and Large-Scale Analysis of Adversarial Robustness under Distribution Shift
Lin Li · Yifei Wang · Chawin Sitawarin · Michael Spratling
Poster
Thu 2:30 LCA-on-the-Line: Benchmarking Out of Distribution Generalization with Class Taxonomies
Jia Shi · Gautam Rajendrakumar Gare · Jinjin Tian · Siqi Chai · Zhiqiu Lin · Arun Balajee Vasudevan · Di Feng · Francesco Ferroni · Shu Kong
Poster
Wed 2:30 Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark
Yihua Zhang · Pingzhi Li · Junyuan Hong · Jiaxiang Li · Yimeng Zhang · Wenqing Zheng · Pin-Yu Chen · Jason Lee · Wotao Yin · Mingyi Hong · Zhangyang “Atlas” Wang · Sijia Liu · Tianlong Chen
Poster
Wed 4:30 CurBench: Curriculum Learning Benchmark
Yuwei Zhou · Zirui Pan · Xin Wang · Hong Chen · Haoyang Li · Yanwen Huang · Zhixiao Xiong · Fangzhou Xiong · Peiyang Xu · Shengnan Liu · Wenwu Zhu
Poster
Wed 2:30 MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
Kaining Ying · Fanqing Meng · Jin Wang · Zhiqian Li · Han Lin · Yue Yang · Hao Zhang · Wenbo Zhang · Yuqi Lin · Shuo Liu · Jiayi Lei · Quanfeng Lu · Runjian Chen · Peng Xu · Renrui Zhang · Haozhe Zhang · Peng Gao · Yali Wang · Yu Qiao · Ping Luo · Kaipeng Zhang · Wenqi Shao
Poster
Thu 4:30 Inherent Trade-Offs between Diversity and Stability in Multi-Task Benchmarks
Guanhua Zhang · Moritz Hardt
Poster
Wed 4:30 TravelPlanner: A Benchmark for Real-World Planning with Language Agents
Jian Xie · Kai Zhang · Jiangjie Chen · Tinghui Zhu · Renze Lou · Yuandong Tian · Yanghua Xiao · Yu Su
Oral
Thu 7:45 MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark
Dongping Chen · Ruoxi Chen · Shilin Zhang · Yaochen Wang · Yinuo Liu · Huichi Zhou · Qihui Zhang · Yao Wan · Pan Zhou · Lichao Sun