Cultural Commonsense Reasoning on CultureAtlas High Resource
[Chart: Precision over time; top result 95.9 (GPT-4); date shown: Jan 7, 2026; available metrics: Precision, Recall, F1-score]
Updated 4d ago
Evaluation Results
Method    Parameters  Date     Precision  Recall  F1-score
GPT-4     -           2026.01  95.9       91.4    93.6
CALM      -           2026.01  95         90.9    92.4
LLaMA-2   7B          2026.01  86.8       45.6    59.8
Vicuna    7B          2026.01  77.3       47.2    58.6
Vicuna    13B         2026.01  68.9       81      74.5
LLaMA-2   13B         2026.01  56.1       80.9    66.3
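
As a sanity check on the results above, the F1-score column can be recomputed from Precision and Recall. The sketch below assumes the standard definition (F1 as the harmonic mean of precision and recall); the page does not say how its F1 values were computed, so this is an assumption rather than the benchmark's own method. The names `f1_score`, `ROWS`, and `check_rows` are illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (method, precision, recall, reported F1), copied from the results rows.
ROWS = [
    ("GPT-4", 95.9, 91.4, 93.6),
    ("CALM", 95.0, 90.9, 92.4),
    ("LLaMA-2 (7B)", 86.8, 45.6, 59.8),
    ("Vicuna (7B)", 77.3, 47.2, 58.6),
    ("Vicuna (13B)", 68.9, 81.0, 74.5),
    ("LLaMA-2 (13B)", 56.1, 80.9, 66.3),
]


def check_rows(rows, tol=0.05):
    """Return (method, recomputed F1, reported F1) for rows that disagree.

    The recomputed value is rounded to one decimal place, matching the
    precision of the reported numbers, before it is compared.
    """
    mismatches = []
    for method, precision, recall, reported in rows:
        recomputed = round(f1_score(precision, recall), 1)
        if abs(recomputed - reported) > tol:
            mismatches.append((method, recomputed, reported))
    return mismatches


if __name__ == "__main__":
    for method, recomputed, reported in check_rows(ROWS):
        print(f"{method}: recomputed F1 {recomputed} vs reported {reported}")
```

Under this assumption, five of the six rows reproduce the reported F1-score to one decimal place. The CALM row does not (about 92.9 recomputed against 92.4 reported), which may reflect a different averaging scheme or rounding in the source rather than an error.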