
CulturalBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Cultural Alignment | CulturalBench Hard | Accuracy: 63.54 | 24 |
| Cultural Alignment | CulturalBench Easy | Accuracy: 90.8 | 24 |
| Cultural Reasoning | CulturalBench | Mean@3: 89.16 | 23 |
| Cultural Reasoning | CulturalBench-Hard (CB-H) (test) | Accuracy: 46.98 | 10 |
| Context Understanding | CulturalBench | Easy Accuracy: 0.904 | 10 |