
Compositional Reasoning on Harder-set

Strict Accuracy: 51.3

Qwen2.5-7B

[Chart: Strict Accuracy; date shown: May 26, 2026]
Updated 7d ago

Evaluation Results

Method  Links
2026.05  51.3  47.6  50.4  50.4
2026.05  47.9  100   47.9  47.9
2026.05  46.7  71.4  46.7  40.2
2026.05  3.4   14.3  2.8   3.4
2026.05  0     -     -     0
2026.05  0     76    76    0