Oral Argument Simulation on Oral Argument Evaluation Set 1.0 (test)
[Chart: Overall Score over time. Highlighted point: gpt4o, Overall Score 8, Mar 5, 2026. Selectable metrics: Overall Score, Adversarial Realism Score, Human Evaluation Score, Issue Coverage Score, Question Type Diversity Score, Fallacy Detection Score.]
Updated 1mo ago
Evaluation Results
| Method | Links | Date | Overall Score | Adversarial Realism Score | Human Evaluation Score | Issue Coverage Score | Question Type Diversity Score | Fallacy Detection Score |
|---|---|---|---|---|---|---|---|---|
| gpt4o | Implementation Strateg... | 2026.03 | 8 | 7 | 5 | 7 | 7 | 8 |
| gpt-oss-120b | Implementation Strateg... | 2026.03 | 7 | 7 | 8 | 6 | 5 | 7 |
| gpt-oss-120b | Implementation Strateg... | 2026.03 | 5 | 6 | 7 | 1 | 6 | 6 |
| Qwen3-32B | Implementation Strateg... | 2026.03 | 5 | 5 | 6 | 3 | 8 | 4 |
| gpt4o | Implementation Strateg... | 2026.03 | 4 | 4 | 3 | 8 | 3 | 5 |
| Llama-3.3-70B-Instruct | Implementation Strateg... | 2026.03 | 3 | 3 | 2 | 4 | 3 | 2 |
| gemini-2.5-pro | Implementation Strateg... | 2026.03 | 2 | 2 | 1 | 5 | 1 | 3 |
| gemini-2.5-pro | Implementation Strateg... | 2026.03 | 1 | 1 | 4 | 2 | 2 | 1 |