Cultural commonsense reasoning on CultureAtlas (All Culture)
[Chart: best Precision 95.8 (GPT-4), Jan 7, 2026. Metrics shown: Precision, Recall, F1-score.]
Evaluation Results
| Method | Parameters | Date | Precision | Recall | F1-score |
| --- | --- | --- | --- | --- | --- |
| GPT-4 | | 2026.01 | 95.8 | 90.6 | 93.1 |
| CALM | | 2026.01 | 93.6 | 87.7 | 89.1 |
| LLaMA-2 | 7B | 2026.01 | 84.2 | 42.1 | 56.1 |
| Vicuna | 7B | 2026.01 | 79.6 | 56.8 | 66.3 |
| Vicuna | 13B | 2026.01 | 67.4 | 81.2 | 73.7 |
| LLaMA-2 | 13B | 2026.01 | 63.6 | 77.1 | 69.7 |
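
The F1-score figures in the results above can be sanity-checked against the listed Precision and Recall. The sketch below assumes F1 is the plain harmonic mean of the two reported numbers; the leaderboard does not state how its F1 was computed, and the function name `f1_score` is a hypothetical helper, not part of any benchmark tooling.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, on the same scale as the inputs.

    Assumption: this is the standard single-pair F1 definition; the
    leaderboard may aggregate differently (for example, per-class averaging).
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the GPT-4 and LLaMA-2 (7B) rows.
gpt4_f1 = f1_score(95.8, 90.6)
llama2_7b_f1 = f1_score(84.2, 42.1)

print(round(gpt4_f1, 1), round(llama2_7b_f1, 1))
```

Under this assumption most rows reproduce their listed F1-score to one decimal place. The CALM row does not (93.6 and 87.7 give roughly 90.6, while 89.1 is listed), which may mean that entry was aggregated in a different way.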