Legal Reasoning on LegalBench (accuracy)
[Chart: Accuracy over time. Labeled point: evaluation-instructed prompt optimization, 90 (Nov 25, 2025).]
Updated 1d ago
Evaluation Results
| Method | Configuration | Date | Accuracy |
|---|---|---|---|
| evaluation-instructed prompt optimization | Backbone=GPT-4o, Optim... | 2025.11 | 90 |
| Pro-Refine | Backbone=GPT-4o, Optim... | 2025.11 | 86 |
| TextGrad | Backbone=GPT-4o, Optim... | 2025.11 | 84 |
| APE | Backbone=GPT-4o, Optim... | 2025.11 | 84 |
| LLM only | Backbone=GPT-4o, Optim... | 2025.11 | 83 |
| Self-Refine | Backbone=GPT-4o, Optim... | 2025.11 | 81 |
| evaluation-instructed prompt optimization | Backbone=LLaMA-3, Opti... | 2025.11 | 70 |
| evaluation-instructed prompt optimization | Backbone=LLaMA-3.1, Op... | 2025.11 | 69 |
| Self-Refine | Backbone=LLaMA-3, Opti... | 2025.11 | 63 |
| Pro-Refine | Backbone=LLaMA-3, Opti... | 2025.11 | 63 |
| Self-Refine | Backbone=LLaMA-3.1, Op... | 2025.11 | 63 |
| Pro-Refine | Backbone=LLaMA-3.1, Op... | 2025.11 | 63 |
| APE | Backbone=LLaMA-3.1, Op... | 2025.11 | 61 |
| TextGrad | Backbone=LLaMA-3, Opti... | 2025.11 | 58 |
| TextGrad | Backbone=LLaMA-3.1, Op... | 2025.11 | 58 |
| LLM only | Backbone=LLaMA-3.1, Op... | 2025.11 | 56 |
| LLM only | Backbone=LLaMA-3, Opti... | 2025.11 | 55 |
| APE | Backbone=LLaMA-3, Opti... | 2025.11 | 55 |