Language Models as Nodes: Constructing a High-Level Neural Network
Abstract
The structural organization of language models plays a crucial role in the inference process of large language models (LLMs), which occurs both iteratively within a single model for test-time scaling and interactively across multiple models for collaborative intelligence. While current systems primarily facilitate such interaction through natural language, this paper proposes constructing a high-level neural network, termed LMNet, by treating pre-trained LLMs as optimizable nodes connected via continuous dense vectors. Our approach eliminates the unnecessary embedding and de-embedding steps when one LLM connects to another, enabling more efficient information transfer, a fully differentiable optimization path, and the exploration of capabilities beyond human heuristics. We place stripped LLMs as vertices and optimizable seq2seq modules as edges to construct LMNet, a directed graph structurally similar to an MLP, and perform end-to-end gradient descent for efficient optimization. As two exemplar applications, we show that the proposed architecture can effectively improve an LLM's general intelligence and customize LLMs with limited data. We also provide a detailed discussion and analysis of the emergent behavior of this high-level network.
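To make the construction concrete, below is a minimal PyTorch sketch of the LMNet idea, assuming the simplest chain topology: frozen "stripped" LLMs (embedding and de-embedding layers removed, so each vertex maps hidden states to hidden states) serve as vertices, and small trainable seq2seq adapters serve as edges. The class names EdgeAdapter and LMNet, the adapter design, and all dimensions are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class EdgeAdapter(nn.Module):
    """Trainable seq2seq edge: maps one vertex's output hidden states into the
    next vertex's input space (illustrative design, not the paper's module)."""
    def __init__(self, d_src: int, d_dst: int, n_heads: int = 8, n_layers: int = 2):
        super().__init__()
        self.proj = nn.Linear(d_src, d_dst)
        layer = nn.TransformerEncoderLayer(d_dst, n_heads, batch_first=True)
        self.seq2seq = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, d_src) -> (batch, seq_len, d_dst)
        return self.seq2seq(self.proj(h))

class LMNet(nn.Module):
    """Chain-shaped LMNet: frozen stripped LLMs as vertices, EdgeAdapters as
    the only trainable parameters. The general case routes over an MLP-like
    directed graph; a chain is the degenerate one-node-per-layer instance."""
    def __init__(self, vertices: list[nn.Module], edges: list[nn.Module]):
        super().__init__()
        assert len(edges) == len(vertices) - 1
        self.vertices = nn.ModuleList(vertices)
        self.edges = nn.ModuleList(edges)
        for v in self.vertices:            # pre-trained vertices stay frozen
            v.requires_grad_(False)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Continuous dense vectors flow between vertices, with no
        # de-embedding to tokens and re-embedding in between.
        h = self.vertices[0](h)
        for edge, vertex in zip(self.edges, self.vertices[1:]):
            h = vertex(edge(h))
        return h

# Stand-ins for stripped LLMs; in practice each would wrap a pre-trained
# transformer stack with its embedding and LM-head layers removed.
vertices = [nn.Identity(), nn.Identity()]
net = LMNet(vertices, [EdgeAdapter(d_src=512, d_dst=512)])

# Only edge parameters receive gradient updates; the frozen vertices still
# propagate gradients through their activations, so one loss at the output
# trains every edge end to end.
trainable = [p for p in net.parameters() if p.requires_grad]
opt = torch.optim.AdamW(trainable, lr=1e-4)
```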