CLAM-Bench: Benchmarking LLM Agents for Library-Scale Cross-Architecture Migration
Abstract
Cross-architecture migration of high-performance libraries determines ecosystem readiness on emerging hardware. The challenge is twofold: disentangling library-scale dependencies and migrating performance-critical kernels written with ISA-specific SIMD intrinsics, which often forces a trade-off between migration speed and peak performance. While LLM-based agents offer a promising approach, existing benchmarks are confined to function-level tasks or scalar code, and thus fail to assess agents’ capabilities and limitations in realistic, library-scale migration. We present CLAM-Bench (Cross-architecture Library-scale Agent Migration benchmark), featuring 85 critical kernels from widely used libraries, including OpenCV, libjpeg, and NCNN. It supports comprehensive evaluation of compilability, correctness, and performance across three major transitions: ARM→RISC-V, x86→ARM, and ARM→LoongArch. Evaluating 12 state-of-the-art agent-LLM combinations on CLAM-Bench reveals that, lacking library-level navigation and hardware-aware optimization, agents regress to superficial pattern matching, yielding only 20.88% correctness and a 0.83x speedup for libjpeg. Motivated by these findings, we further propose FSCM, a multi-agent framework incorporating hardware-aware global reconfiguration and performance optimization. FSCM improves OpenCV correctness to 71%. The benchmark and code are available at https://anonymous.4open.science/r/clam_bench-D8EB/.