TabReX: Tabular Referenceless eXplainable Evaluation

About

Evaluating the quality of tables generated by large language models (LLMs) remains an open challenge: existing metrics either flatten tables into text, ignoring structure, or rely on fixed references that limit generalization. We present TabReX, a referenceless, property-driven framework for evaluating tabular generation via graph-based reasoning. TabReX converts both the source text and the generated table into canonical knowledge graphs, aligns them through an LLM-guided matching process, and computes interpretable, rubric-aware scores that quantify structural and factual fidelity. The resulting metric offers controllable trade-offs between sensitivity and specificity, yielding human-aligned judgments and cell-level error traces. To systematically assess metric robustness, we introduce TabReX-Bench, a large-scale benchmark spanning six domains and twelve planner-driven perturbation types across three difficulty tiers. Empirical results show that TabReX achieves the highest correlation with expert rankings, remains stable under harder perturbations, and enables fine-grained model-vs-prompt analysis, establishing a new paradigm for trustworthy, explainable evaluation of structured generation systems.
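The pipeline described above (tables to graph triples, alignment, fidelity scoring with cell-level traces) can be illustrated with a deliberately simplified sketch. The real framework uses LLM-guided graph matching and rubric-aware scoring; here, exact matching on hypothetical (row, column, value) triples stands in for that alignment step, and all function names are illustrative, not from the paper's code.

```python
# Illustrative sketch only: TabReX performs LLM-guided alignment of
# knowledge graphs; exact triple matching below is a toy stand-in.

def table_to_triples(table):
    """Flatten a table (list of row dicts) into (row_id, column, value) triples."""
    triples = set()
    for i, row in enumerate(table):
        for col, val in row.items():
            triples.add((i, col, str(val)))
    return triples

def fidelity_scores(source_triples, generated_table):
    """Precision/recall-style fidelity scores plus a cell-level error trace."""
    gen = table_to_triples(generated_table)
    matched = source_triples & gen
    precision = len(matched) / len(gen) if gen else 0.0
    recall = len(matched) / len(source_triples) if source_triples else 0.0
    # Cell-level error trace: generated cells with no support in the source.
    errors = sorted(gen - source_triples)
    return {"precision": precision, "recall": recall, "errors": errors}

# Toy example: one hallucinated cell ("Kyoto" instead of "Tokyo").
source = {(0, "country", "France"), (0, "capital", "Paris"),
          (1, "country", "Japan"), (1, "capital", "Tokyo")}
generated = [{"country": "France", "capital": "Paris"},
             {"country": "Japan", "capital": "Kyoto"}]
print(fidelity_scores(source, generated))
# {'precision': 0.75, 'recall': 0.75, 'errors': [(1, 'capital', 'Kyoto')]}
```

The error trace is the point: rather than a single opaque score, each unsupported cell is surfaced explicitly, mirroring the cell-level explanations the abstract describes.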

Tejas Anvekar, Juhna Park, Aparna Garimella, Vivek Gupta • 2025

Related benchmarks

| Task                                   | Dataset                            | Metric             | Result | Rank |
|----------------------------------------|------------------------------------|--------------------|--------|------|
| Metric Correlation Analysis            | Synthetic perturbation sets (test) | Spearman's rho (S) | 74.51  | 17   |
| Evaluation Metric Correlation Analysis | Real-world text-to-table generation | Spearman's rho    | 0.39   | 9    |
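The results above report Spearman's rho, the rank correlation between metric scores and expert rankings. For tie-free data it reduces to a closed form; the sketch below implements that form with hypothetical metric scores, purely to show what the reported numbers measure.

```python
# Spearman's rho for tie-free samples, via the closed-form shortcut
# rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), where d_i is the
# difference between the two ranks of item i.

def spearman_rho(xs, ys):
    """Spearman's rank correlation, assuming no ties in either sample."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = rank + 1  # ranks start at 1
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical metric scores vs. expert ranks for five systems
# (expert rank 1 = best, so ranks are negated to align direction).
metric = [0.91, 0.75, 0.62, 0.40, 0.33]
expert = [1, 2, 4, 3, 5]
print(spearman_rho(metric, [-r for r in expert]))  # 0.9
```

A rho near 1 means the metric orders systems almost exactly as the experts do; the benchmark rows above report this agreement on synthetic perturbation sets and on real-world text-to-table generation.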

Other info

GitHub
