Expert-level Science Reasoning on GPQA
[Chart: LLMcritic Calls and Reduction (%). Plotted point: VecCISC + HAC, 18.81 LLMcritic Calls, May 8, 2026.]
Updated 23d ago
Evaluation Results

Method         Configuration                 Date     LLMcritic Calls  Reduction (%)
VecCISC + HAC  Budget=20, Model=Mistr...     2026.05  18.81            -5.94
VecCISC + HAC  Budget=20, Model=Llama...     2026.05  17.95            -10.24
VecCISC + HAC  Budget=20, Model=Qwen2...     2026.05  16.41            -17.93
VecCISC + HAC  Budget=20, Model=Llama...     2026.05  16               -20.02
VecCISC + HAC  Budget=20, Model=GPT-4...     2026.05  10.56            -47.19