SCOPE and SCION: Benchmark and Method for Ontology Induction and Fusion from Text
Abstract
Ontologies (schemas) are a key bottleneck for schema-grounded information extraction (IE) and knowledge graph construction: manual ontology engineering is expensive, and schemas quickly fragment or drift across domains. We introduce SCOPE (Schema Construction and Ontology Induction Pipeline Evaluation), a benchmark for train-only ontology/schema induction, with optional ontology fusion, directly from raw corpora. SCOPE normalizes 24 public IE sources (15 relation extraction (RE) and 9 event extraction (EE) sources, in Chinese and English) into machine-readable gold schema graphs and provides train-only induction corpora through a standardized text corpus release. We propose SCION (Structural mining and Contracted semantic Induction for Ontology constructiON and fusion), a controllable pipeline that (i) mines a candidate space of concepts, relations, and events from text, (ii) performs LLM-assisted naming, merging, and filtering under a strict JSON contract with evidence pointers, and (iii) can fuse the result with a fixed base ontology package using conservative alignment with provenance tracking. On the SCOPE core suite, SCION improves ontology-level similarity over official/manual schemas, a Text2Onto-style baseline, and LLM-only induction baselines under Literal, Fuzzy, Continuous, and Graph F1. Together, SCOPE and SCION enable reproducible and auditable evaluation of end-to-end ontology induction and fusion.
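To make the ontology-level similarity metrics more concrete, the following is a minimal sketch of how a literal and a fuzzy F1 between an induced and a gold set of schema labels might be computed. The abstract does not define these metrics, so everything here is an assumption made for illustration: the function names `literal_f1` and `fuzzy_f1`, the label normalization, the use of `difflib.SequenceMatcher` as the string similarity, the greedy one-to-one matching, and the 0.8 threshold are hypothetical and are not taken from SCOPE.

```python
from difflib import SequenceMatcher


def normalize(label: str) -> str:
    """Lowercase, treat underscores as spaces, collapse whitespace (assumed rule)."""
    return " ".join(label.lower().replace("_", " ").split())


def _f1(true_positives: int, n_pred: int, n_gold: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    if true_positives == 0 or n_pred == 0 or n_gold == 0:
        return 0.0
    precision = true_positives / n_pred
    recall = true_positives / n_gold
    return 2 * precision * recall / (precision + recall)


def literal_f1(pred: list[str], gold: list[str]) -> float:
    """F1 over exact matches between normalized label sets."""
    p = {normalize(x) for x in pred}
    g = {normalize(x) for x in gold}
    return _f1(len(p & g), len(p), len(g))


def fuzzy_f1(pred: list[str], gold: list[str], threshold: float = 0.8) -> float:
    """F1 where labels match if string similarity >= threshold (assumed value).

    Pairs are matched greedily, best similarity first, and each label is
    used at most once, so the matching is one-to-one.
    """
    p = sorted({normalize(x) for x in pred})
    g = sorted({normalize(x) for x in gold})
    scored = sorted(
        ((SequenceMatcher(None, a, b).ratio(), a, b) for a in p for b in g),
        reverse=True,
    )
    used_pred, used_gold = set(), set()
    for score, a, b in scored:
        if score < threshold:
            break  # remaining pairs are all below the threshold
        if a in used_pred or b in used_gold:
            continue
        used_pred.add(a)
        used_gold.add(b)
    return _f1(len(used_pred), len(p), len(g))
```

Under these assumptions, an induced label such as `birthplace` would not count against a gold label `birth place` under the literal variant but would under the fuzzy one, which is the kind of surface variation a fuzzy metric is meant to tolerate.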