TabReX: Tabular Referenceless eXplainable Evaluation

About

Evaluating the quality of tables generated by large language models (LLMs) remains an open challenge: existing metrics either flatten tables into text, ignoring structure, or rely on fixed references that limit generalization. We present TabReX, a referenceless, property-driven framework for evaluating tabular generation via graph-based reasoning. TabReX converts both the source text and the generated table into canonical knowledge graphs, aligns them through an LLM-guided matching process, and computes interpretable, rubric-aware scores that quantify structural and factual fidelity. The resulting metric offers controllable trade-offs between sensitivity and specificity, yielding human-aligned judgments and cell-level error traces. To systematically assess metric robustness, we introduce TabReX-Bench, a large-scale benchmark spanning six domains and twelve planner-driven perturbation types across three difficulty tiers. Empirical results show that TabReX achieves the highest correlation with expert rankings, remains stable under harder perturbations, and enables fine-grained model-vs-prompt analysis, establishing a new paradigm for trustworthy, explainable evaluation of structured generation systems.
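The pipeline described above (tables to graph triples, alignment, fidelity scoring with cell-level traces) can be illustrated with a deliberately simplified sketch. The real framework uses LLM-guided graph matching and rubric-aware scoring; here, exact matching on hypothetical (row, column, value) triples stands in for that alignment step, and all function names are illustrative, not from the paper's code.

```python
# Illustrative sketch only: TabReX performs LLM-guided alignment of
# knowledge graphs; exact triple matching below is a toy stand-in.

def table_to_triples(table):
    """Flatten a table (list of row dicts) into (row_id, column, value) triples."""
    triples = set()
    for i, row in enumerate(table):
        for col, val in row.items():
            triples.add((i, col, str(val)))
    return triples

def fidelity_scores(source_triples, generated_table):
    """Precision/recall-style fidelity scores plus a cell-level error trace."""
    gen = table_to_triples(generated_table)
    matched = source_triples & gen
    precision = len(matched) / len(gen) if gen else 0.0
    recall = len(matched) / len(source_triples) if source_triples else 0.0
    # Cell-level error trace: generated cells with no support in the source.
    errors = sorted(gen - source_triples)
    return {"precision": precision, "recall": recall, "errors": errors}

# Toy example: one hallucinated cell ("Kyoto" instead of "Tokyo").
source = {(0, "country", "France"), (0, "capital", "Paris"),
          (1, "country", "Japan"), (1, "capital", "Tokyo")}
generated = [{"country": "France", "capital": "Paris"},
             {"country": "Japan", "capital": "Kyoto"}]
print(fidelity_scores(source, generated))
# {'precision': 0.75, 'recall': 0.75, 'errors': [(1, 'capital', 'Kyoto')]}
```

The error trace is the point: rather than a single opaque score, each unsupported cell is surfaced explicitly, mirroring the cell-level explanations the abstract describes.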

Tejas Anvekar, Juhna Park, Aparna Garimella, Vivek Gupta • 2025

Related benchmarks

| Task                                   | Dataset                            | Metric             | Result | Rank |
|----------------------------------------|------------------------------------|--------------------|--------|------|
| Metric Correlation Analysis            | Synthetic perturbation sets (test) | Spearman's rho (S) | 74.51  | 17   |
| Evaluation Metric Correlation Analysis | Real-world text-to-table generation | Spearman's rho    | 0.39   | 9    |
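The results above report Spearman's rho, the rank correlation between metric scores and expert rankings. For tie-free data it reduces to a closed form; the sketch below implements that form with hypothetical metric scores, purely to show what the reported numbers measure.

```python
# Spearman's rho for tie-free samples, via the closed-form shortcut
# rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), where d_i is the
# difference between the two ranks of item i.

def spearman_rho(xs, ys):
    """Spearman's rank correlation, assuming no ties in either sample."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = rank + 1  # ranks start at 1
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical metric scores vs. expert ranks for five systems
# (expert rank 1 = best, so ranks are negated to align direction).
metric = [0.91, 0.75, 0.62, 0.40, 0.33]
expert = [1, 2, 4, 3, 5]
print(spearman_rho(metric, [-r for r in expert]))  # 0.9
```

A rho near 1 means the metric orders systems almost exactly as the experts do; the benchmark rows above report this agreement on synthetic perturbation sets and on real-world text-to-table generation.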

Other info

GitHub
