Commonsense Reasoning on CommonsenseQA (LLMcritic Metrics)
[Chart: LLMcritic Calls by date. Marked point: VecCISC + HAC, 15.54 LLMcritic Calls, May 8, 2026. Metrics shown: LLMcritic Calls, Reduction (%).]
Updated 22d ago
Evaluation Results
| Method | Configuration | Links | LLMcritic Calls | Reduction (%) |
|---|---|---|---|---|
| VecCISC + HAC | Budget=20, Model=Mistr... | 2026.05 | 15.54 | -22.31 |
| VecCISC + KMeans | Budget=20, Backbone=Mi... | 2026.05 | 15.54 | -22.31 |
| VecCISC + HAC | Budget=20, Model=Qwen2... | 2026.05 | 15.23 | -23.83 |
| VecCISC + KMeans | Budget=20, Backbone=GP... | 2026.05 | 13.81 | -30.95 |
| VecCISC + HAC | Budget=20, Model=Llama... | 2026.05 | 12.7 | -36.5 |
| VecCISC + KMeans | Budget=20, Backbone=Ll... | 2026.05 | 12.7 | -36.5 |
| VecCISC + KMeans | Budget=20, Backbone=Qw... | 2026.05 | 12.45 | -37.76 |
| VecCISC + HAC | Budget=20, Model=GPT-4... | 2026.05 | 11.89 | -40.56 |
| VecCISC + HAC | Budget=20, Model=Llama... | 2026.05 | 11.79 | -41.02 |
| VecCISC + KMeans | Budget=20, Backbone=Ll... | 2026.05 | 11.79 | -41.02 |
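The page does not say how Reduction (%) is computed. The listed values look consistent with the change in LLMcritic Calls relative to the Budget=20 setting, that is (calls - 20) / 20 × 100. That reading is an inference from the numbers, not a documented definition. A minimal sketch that checks it against the listed rows follows; `BUDGET` and `reduction_pct` are hypothetical names introduced only for illustration.

```python
# Assumption (not stated on the page): "Reduction (%)" is measured
# against the fixed budget of 20 LLMcritic calls.
BUDGET = 20.0


def reduction_pct(calls: float, budget: float = BUDGET) -> float:
    """Percentage change in calls relative to the budget (negative = fewer calls)."""
    return (calls - budget) / budget * 100.0


# Distinct (LLMcritic Calls, listed Reduction (%)) pairs copied from the results.
listed_rows = [
    (15.54, -22.31),
    (15.23, -23.83),
    (13.81, -30.95),
    (12.7, -36.5),
    (12.45, -37.76),
    (11.89, -40.56),
    (11.79, -41.02),
]

# Gap between the recomputed and the listed reduction for each row.
gaps = [abs(reduction_pct(calls) - listed) for calls, listed in listed_rows]

for (calls, listed), gap in zip(listed_rows, gaps):
    print(f"{calls:>6} calls: computed {reduction_pct(calls):7.2f}%, "
          f"listed {listed:7.2f}%, gap {gap:.2f}")
```

Under this assumption every recomputed value falls within a few hundredths of a percentage point of the listed one. The small gaps may come from rounding of the call counts before display.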