Oral
Tue Jul 07 10:00 AM -- 10:15 AM (KST)
Benchmarking at the Edge of Comprehension
In Oral 1B
[OpenReview]
Oral
Tue Jul 07 10:15 AM -- 10:30 AM (KST)
daVinci-Dev: Agent-native Mid-training for Software Engineering
In Oral 1B
[OpenReview]
Oral
Tue Jul 07 10:30 AM -- 10:45 AM (KST)
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections
In Oral 1B
[OpenReview]
Oral
Tue Jul 07 10:45 AM -- 11:00 AM (KST)
VenusBench-Mobile: A Challenging and User-Centric Benchmark for Mobile GUI Agents with Capability Diagnostics
In Oral 1B
[OpenReview]