95 Results
Poster | Wed 2:30 | A Fine-grained Analysis of Fitted Q-evaluation: Beyond Parametric Models | Jiayi Wang · Zhengling Qi · Raymond K. W. Wong
Oral | Wed 7:30 | Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation | Gauthier Guinet · Behrooz Tehrani · Anoop Deoras · Laurent Callot
Poster | Thu 4:30 | Stability Evaluation through Distributional Perturbation Analysis | Jose Blanchet · Peng Cui · Jiajin Li · Jiashuo Liu
Expo Talk Panel | | Automated Evaluation of LLM responses | P Aditya Sreekar · Sahil Verma · Surnash Chopra
Expo Talk Panel | Sun 5:30 | Automated Evaluation of LLM responses | Abhishek Persad · Akash Gupta
Oral | Wed 2:15 | Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks | Linyuan Gong · Sida Wang · Mostafa Elhoushi · Alvin Cheung
Poster | Thu 4:30 | Evaluating Instrument Validity using the Principle of Independent Mechanisms | Patrick F. Burauel
Poster | Wed 2:30 | Beyond ELBOs: A Large-Scale Evaluation of Variational Methods for Sampling | Denis Blessing · Xiaogang Jia · Johannes Esslinger · Francisco Vargas · Gerhard Neumann
Poster | Wed 2:30 | Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks | Linyuan Gong · Sida Wang · Mostafa Elhoushi · Alvin Cheung
Poster | Wed 4:30 | Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation | Gauthier Guinet · Behrooz Tehrani · Anoop Deoras · Laurent Callot
Poster | Thu 2:30 | InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks | Xueyu Hu · Ziyu Zhao · Shuang Wei · Ziwei Chai · Qianli Ma · Guoyin Wang · Xuwu Wang · Jing Su · Jingjing Xu · Ming Zhu · Yao Cheng · Jianbo Yuan · Jiwei Li · Kun Kuang · Yang Yang · Hongxia Yang · Fei Wu
Poster | Thu 2:30 | Kernel-Based Evaluation of Conditional Biological Sequence Models | Pierre Glaser · Steffan Paul · Alissa M. Hummer · Charlotte Deane · Debora Marks · Alan Amin