When LLMs Encounter Open-world Graph Learning: A Fresh View on Unlabeled Data Uncertainty
Abstract
Recently, large language models (LLMs) have driven a systematic shift in the graph ML community through the adoption of text-attributed graphs (TAGs). Although a variety of frameworks have been developed, most fail to properly address the challenge of data uncertainty in open-world environments. A representative source of such uncertainty is the limited availability of labels in large-scale datasets due to high annotation costs, where unlabeled nodes may belong to either known classes or novel, unknown classes. While node-level out-of-distribution detection and conventional open-world graph learning attempt to tackle this problem, two core limitations remain: ① Insufficient methods — existing approaches typically optimize semantics or topology in isolation for unknown-class rejection, failing to effectively integrate textual and structural information in TAGs; ② Incomplete pipelines — most studies conduct only idealized analyses, such as assuming a predefined number of unknown classes, which restricts practical utility for model updates and long-term deployment. To overcome these issues, we introduce the Open-world Graph Assistant (OGA), an LLM-based framework. OGA first performs unknown-class rejection via adaptive label traceability (ALT), harmoniously combining semantic and topological cues, and then applies the graph label annotator (GLA) for unknown-class annotation, allowing unlabeled nodes to contribute to model training. In essence, OGA offers a new pipeline that fully automates the handling of unlabeled nodes in open-world environments, and we establish a systematic benchmark covering four key aspects to validate its effectiveness and practicality through extensive experiments.
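The two-stage flow the abstract describes — rejection of unknown-class nodes (ALT, fusing semantic and topological evidence) followed by annotation of the rejected nodes (GLA) so they can rejoin training — can be caricatured as a minimal sketch. All function names, the score fusion rule, and the placeholder labeling below are illustrative assumptions, not OGA's actual method or API:

```python
def alt_reject(semantic_conf, known_neighbor_frac, threshold=0.5):
    """ALT sketch: fuse a semantic confidence (standing in for an LLM
    text encoder's score that the node fits a known class) with a
    topological cue (fraction of neighbors already carrying known-class
    labels). Returns True if the node is rejected as unknown-class."""
    fused = 0.5 * semantic_conf + 0.5 * known_neighbor_frac
    return fused < threshold


def gla_annotate(node_text, known_labels):
    """GLA sketch: stand-in for an LLM call that proposes a label for a
    rejected node; here we simply derive a placeholder novel-class tag."""
    return f"novel:{node_text.split()[0].lower()}"


def oga_pipeline(nodes, known_labels, threshold=0.5):
    """Route each unlabeled node: nodes surviving ALT remain candidates
    for standard known-class classification, while rejected nodes get a
    GLA annotation so they can still contribute to model training."""
    annotated = {}
    for node_id, (text, sem_conf, nb_frac) in nodes.items():
        if alt_reject(sem_conf, nb_frac, threshold):
            annotated[node_id] = gla_annotate(text, known_labels)
        else:
            annotated[node_id] = "known-candidate"
    return annotated
```

For example, a node with high semantic confidence and many known-class neighbors is kept, while one scoring low on both cues is rejected and annotated:

```python
nodes = {
    "n1": ("Quantum computing survey", 0.9, 0.8),
    "n2": ("Underwater basket weaving", 0.2, 0.1),
}
oga_pipeline(nodes, ["cs.AI", "cs.LG"])
```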