AOEB: Benchmarking Agent-Oriented Multimodal Embeddings
Abstract
LLM agents powered by retrieval-augmented generation (RAG) are increasingly prevalent across research and applications. Embedding models play a critical role in these systems, particularly in embedding-based retrieval. However, current embedding benchmarks, such as MTEB, remain focused on general-purpose scenarios and fail to align with the diverse and evolving needs of agentic applications. To close this gap, we introduce the Agent-Oriented Embedding Benchmark (AOEB), a comprehensive evaluation suite dedicated to agent-centric retrieval for embedding models. AOEB is characterized by two key features: (1) Multi-Task, covering five essential retrieval capabilities for LLM agents, including code, tool, reasoning, and memory retrieval; and (2) Multi-Modal, providing evaluation with both textual and visual data for each task category. We evaluate representative embedding models on AOEB and observe that they exhibit distinct strengths across different agent-oriented retrieval tasks. By curating AOEB, we aim to steer the embedding community toward more practically oriented research directions and to foster further progress.