

(4 events)
Oral
Thu Jul 17 03:30 PM -- 03:45 PM (PDT) @ West Exhibition Hall C
EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents
Rui Yang · Hanyang(Jeremy) Chen · Junyu Zhang · Mark Zhao · Cheng Qian · Kangrui Wang · Qineng Wang · Teja Koripella · Marziyeh Movahedi · Manling Li · Heng Ji · Huan Zhang · Tong Zhang
Oral
Thu Jul 17 03:45 PM -- 04:00 PM (PDT) @ West Exhibition Hall C
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?
Samuel Miserendino · Michele Wang · Tejal Patwardhan · Johannes Heidecke
Oral
Thu Jul 17 04:00 PM -- 04:15 PM (PDT) @ West Exhibition Hall C
CodeIO: Condensing Reasoning Patterns via Code Input-Output Prediction
Junlong Li · Daya Guo · Dejian Yang · Runxin Xu · Yu Wu · Junxian He
Oral
Thu Jul 17 04:15 PM -- 04:30 PM (PDT) @ West Exhibition Hall C
ITBench: Evaluating AI Agents across Diverse Real-World IT Automation Tasks
Saurabh Jha · Rohan Arora · Yuji Watanabe · Takumi Yanagawa · Yinfang Chen · Jackson Clark · Bhavya Bhavya · Mudit Verma · Harshit Kumar · Hirokuni Kitahara · Noah Zheutlin · Saki Takano · Divya Pathak · Felix George · Xinbo Wu · Bekir Turkkan · Gerard Vanloo · Michael Nidd · Ting Dai · Oishik Chatterjee · Pranjal Gupta · Suranjana Samanta · Pooja Aggarwal · Rong Lee · Jae-wook Ahn · Debanjana Kar · Amit Paradkar · Yu Deng · Pratibha Moogi · Prateeti Mohapatra · Naoki Abe · Chandrasekhar Narayanaswami · Tianyin Xu · Lav Varshney · Ruchi Mahindru · Anca Sailer · Laura Shwartz · Daby Sow · Nicholas Fuller · Ruchir Puri