Cultural Reasoning on CulturalBench-Hard (CB-H) (test)
[Chart: Accuracy. Top result: C-Mining, 46.98 (Apr 17, 2026).]
Evaluation Results
| Method | Settings | Date | Accuracy |
|---|---|---|---|
| C-Mining | Backbone=Qwen3-32B, In... | 2026.04 | 46.98 |
| Qwen3-32B | Backbone=Qwen3-32B, In... | 2026.04 | 44.62 |
| C-Mining | Backbone=Qwen2.5-7B, I... | 2026.04 | 40.78 |
| CultureBank | Backbone=Qwen2.5-7B, I... | 2026.04 | 40.13 |
| CultureLLM | Backbone=Qwen2.5-7B, I... | 2026.04 | 38.99 |
| Llama3.1-8B | Backbone=Llama3.1-8B,... | 2026.04 | 37.44 |
| Qwen2.5-7B | Backbone=Qwen2.5-7B, I... | 2026.04 | 34.75 |
| GLM4-9B | Backbone=GLM4-9B, Inst... | 2026.04 | 34.1 |
| CultureSPA | Backbone=Qwen2.5-7B, I... | 2026.04 | 30.18 |
| Ministral3-8B | Backbone=Ministral3-8B... | 2026.04 | 26.67 |