Slash the Sink: Sharpening Structural Attention Inside LLMs
Abstract
Large Language Models (LLMs) show remarkable semantic understanding but often struggle with structural understanding when processing graph topologies in serialized form. Existing solutions rely on training external graph-based adapters or on fine-tuning, both of which incur high costs and sacrifice generalizability. In this work, we investigate the internal mechanisms of LLMs and present a critical finding: LLMs spontaneously reconstruct a graph's topology internally, as evidenced by a distinct "sawtooth" pattern in their attention maps that structurally aligns with the token-level adjacency matrix. However, this intrinsic structural understanding is diluted by the attention sink. We theoretically formalize this dilution as a representation bottleneck stemming from a fundamental conflict: the model's anisotropic bias, essential for language tasks, suppresses the isotropic information flow required by graph topology. To address this, we propose a training-free solution named StructuraL Attention SHarpening (Slash), which amplifies this internal structural understanding via plug-and-play attention redistribution. Experiments on pure graph tasks and molecular prediction validate that Slash delivers significant and consistent performance gains across diverse LLMs.
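The abstract does not spell out the exact redistribution rule, so the sketch below is only a minimal illustration of the general idea it describes: training-free suppression of the attention-sink column followed by row renormalization of post-softmax attention weights. The function name slash_redistribute and the parameters alpha and sink_idx are hypothetical placeholders, not the paper's actual API.

```python
import torch

def slash_redistribute(attn: torch.Tensor, sink_idx: int = 0, alpha: float = 0.5) -> torch.Tensor:
    """Hypothetical sketch of sink-suppressing attention redistribution.

    attn:     post-softmax attention weights, shape (..., seq_len, seq_len),
              where each row sums to 1.
    sink_idx: column index of the presumed sink token (often position 0).
    alpha:    fraction of sink attention to keep (0 removes the sink entirely).
    """
    attn = attn.clone()
    # Shrink the attention mass flowing into the sink token.
    attn[..., sink_idx] = attn[..., sink_idx] * alpha
    # Renormalize each row, redistributing the freed mass to the
    # remaining (structurally informative) positions.
    return attn / attn.sum(dim=-1, keepdim=True)

# Usage example with random scores (batch, heads, seq, seq):
scores = torch.randn(1, 8, 16, 16)
attn = torch.softmax(scores, dim=-1)
sharpened = slash_redistribute(attn, alpha=0.3)
```

Because the transform operates on attention weights alone, it could in principle be hooked into any decoder layer at inference time without retraining, which matches the plug-and-play framing above.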