Cultural Reasoning on CulturalBench-Hard (CB-H) (test)

Accuracy: 46.98

C-Mining

[Chart omitted] Apr 17, 2026
Updated 1mo ago

Evaluation Results

Date      Accuracy
2026.04   46.98
2026.04   44.62
2026.04   40.78
2026.04   40.13
2026.04   38.99
2026.04   37.44
2026.04   34.75
2026.04   34.1
2026.04   30.18
2026.04   26.67