Hypergraph as Language

About

Large language models (LLMs) have recently shown strong potential in modeling relational structures. However, existing approaches remain fundamentally graph-centric: they focus on processing pairwise graph structures into tokens that LLMs can understand. In contrast, many real-world relational patterns do not naturally conform to the pairwise-edge assumption, and are better modeled as high-order associations in hypergraphs. For hypergraph structures, existing methods often fail to preserve the native semantics that multiple objects are jointly connected by the same high-order relation, limiting their ability to exploit complex structures. To address this limitation, we put forth the "Hypergraph as Language" perspective and propose Hyper-Align, a hypergraph-native alignment framework for large language models. Hyper-Align compiles the query-object-centered hypergraph context into hypergraph tokens directly consumable by a base LLM. Specifically, we introduce Hypergraph Incidence Detail Template with Overview (HIDT-O), which serializes high-order association structures into a fixed-shape hybrid template combining local incidence details and overview-level summaries. We then design a Hypergraph Incidence Projector (HIP), which maps native high-order incidence structures into the LLM token space through explicit semantic-structural decoupling and bidirectional message passing between vertices and hyperedges. We further define a concrete Hypergraph-as-Language input protocol, which jointly feeds hypergraph tokens and textual prompts into a frozen base LLM, supporting both vertex-level and hyperedge-level tasks under a unified question-answering paradigm. To systematically evaluate different methods in hypergraph structural modeling, we introduce HyperAlign-Bench. Extensive experiments show that Hyper-Align significantly outperforms existing methods across in-domain and zero-shot evaluations.
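As a rough illustration of "bidirectional message passing between vertices and hyperedges", the sketch below runs one generic two-stage pass on a toy hypergraph: vertex features are averaged into each hyperedge, then hyperedge features are averaged back into each vertex. This is not the HIP module described above. The mean aggregator, the function names (`mean`, `two_stage_pass`), and the toy data are all assumptions chosen for illustration only.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def two_stage_pass(
    hyperedges: Dict[str, List[str]],
    vertex_feats: Dict[str, Vector],
) -> Tuple[Dict[str, Vector], Dict[str, Vector]]:
    """One vertex -> hyperedge -> vertex round with mean aggregation.

    Hypothetical helper: each hyperedge is assumed to list at least one vertex.
    """
    # Stage 1 (vertex -> hyperedge): a hyperedge summarizes all of its members jointly.
    edge_feats = {
        edge: mean([vertex_feats[v] for v in members])
        for edge, members in hyperedges.items()
    }

    # Build the reverse incidence map: which hyperedges contain each vertex.
    incident: Dict[str, List[str]] = {v: [] for v in vertex_feats}
    for edge, members in hyperedges.items():
        for v in members:
            incident[v].append(edge)

    # Stage 2 (hyperedge -> vertex): a vertex averages the hyperedges it belongs to.
    # Isolated vertices keep a copy of their original features.
    new_vertex_feats = {
        v: mean([edge_feats[e] for e in edges]) if edges else list(vertex_feats[v])
        for v, edges in incident.items()
    }
    return edge_feats, new_vertex_feats


# Toy hypergraph (made-up data): e1 joins three vertices at once, e2 joins two.
hyperedges = {"e1": ["a", "b", "c"], "e2": ["c", "d"]}
vertex_feats = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [1.0, 1.0],
    "d": [0.0, 0.0],
}

edge_feats, new_vertex_feats = two_stage_pass(hyperedges, vertex_feats)
```

Here vertex `c` sits in both hyperedges, so its updated feature mixes information from `a`, `b`, and `d` through the two shared relations, without expanding either hyperedge into pairwise edges.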

Mengqi Lei, Guohuan Xie, Shihui Ying, Shaoyi Du, Jun-Hai Yong, Siqi Li, Yue Gao • 2026

Related benchmarks

Task	Dataset	Result
Node Classification	IMDB	--	211
Graph Classification	ogbn-arxiv (test)	Accuracy 78.2	28
Vertex Classification	Arxiv-HG In-domain (test)	Accuracy 76.9	18
Hyperedge Classification	Cora-CC	Accuracy 75.7	9
Hyperedge Classification	Pubmed	Accuracy 77.6	9
Hyperedge Classification	DBLP	Accuracy 64.6	9
Hyperedge Classification	IMDB	Accuracy 44.9	9
Vertex Classification	Cora-CC	Accuracy 74.8	9
Vertex Classification	Pubmed	Accuracy 77.5	9
Vertex Classification	DBLP	Accuracy 67.2	9

