Constraint Satisfaction on Short-generation task
[Chart: Accuracy over time. Top result: ICFA, 74.3 Accuracy (Dec 16, 2025). Available metrics: Accuracy, Latency (ms), Hallucination Rate.]
Updated 3mo ago
Evaluation Results
Method       Configuration           Date     Accuracy  Latency (ms)  Hallucination Rate
ICFA         Samples=16 (eff.)       2025.12  74.3      135           6.8
PPO (RLHF)   Samples > 10^5 (train)  2025.12  68.5      140           10.2
Best-of-N    N=100, Samples=100      2025.12  62.1      850           12
Beam Search  k=5, Samples=N/A        2025.12  45.2      120           18.5