
Human Consistency Evaluation on Q-Reasoning (test)

51.4 ROUGE-1 Score

Proposed Human-Like Reasoning Framework (detailed)

[Chart: ROUGE-1 Score] Dec 18, 2025
Updated 3mo ago

Evaluation Results

Method    ROUGE-1 Score    Date       Links
          51.4             2025.12
          51.2             2025.12
          48.7             2025.12
          44.3             2025.12
          31.8             2025.12
          27.9