Question Answering on Cert-QA
[Chart: GenericJudge Score. Top result: GPT-5, 4.49 (Apr 7, 2026). Metrics: GenericJudge Score, Token Usage (M), Latency (s).]
Updated 11d ago
Evaluation Results

| Method | Configuration | Date | GenericJudge Score | Token Usage (M) | Latency (s) |
|---|---|---|---|---|---|
| GPT-5 | Optimization=Standard... | 2026.04 | 4.49 | 3.1 | 8.1 |
| GPT-5 + HYVE | Optimization=HYVE pipe... | 2026.04 | 4.49 | 3.1 | 7.34 |
| GPT-4.1 + HYVE | Optimization=HYVE pipe... | 2026.04 | 4.38 | 1.1 | 2.91 |
| GPT-4.1 | Optimization=Standard... | 2026.04 | 4.36 | 1.1 | 3.21 |