
Compositional Generalization on Evaluation Dataset (Unseen Average)

Score: 42.86

Mistral-7B

[Chart: Score; Jan 29, 2026]
Updated 4d ago

Evaluation Results

Date      Score   (unlabeled)
2026.01   42.86   -0.1948
2026.01   42.46   -0.1578
2026.01   42.03   -0.0092
2026.01   41.76   -0.1573
2026.01   41.13   -0.2118
2026.01   41.08   -0.0088
2026.01   40.41    0.0099
2026.01   40.25   -0.1817
2026.01   40.04   -0.2203
2026.01   39.57   -0.2096
2026.01   39.1    -0.2031
2026.01   38.18   -0.0162
2026.01   36.37   -0.0685
2026.01   35.95    0.0022
2026.01   32.92   -0.1433
2026.01   32.69    0.0213
2026.01   31.88   -0.085
2026.01   20.67   -0.0064