Oral
Thu Jul 17 03:30 PM -- 03:45 PM (PDT) @ West Exhibition Hall C
EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents
[OpenReview]
Oral
Thu Jul 17 03:45 PM -- 04:00 PM (PDT) @ West Exhibition Hall C
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?
[OpenReview]
Oral
Thu Jul 17 04:00 PM -- 04:15 PM (PDT) @ West Exhibition Hall C
CodeIO: Condensing Reasoning Patterns via Code Input-Output Prediction
[OpenReview]
Oral
Thu Jul 17 04:15 PM -- 04:30 PM (PDT) @ West Exhibition Hall C
ITBench: Evaluating AI Agents across Diverse Real-World IT Automation Tasks
[OpenReview]