Program Generation on MultiHiertt
[Figure: Program Accuracy chart; highlighted point: TaNOS, 82.9 (Apr 23, 2026). Updated 1mo ago.]
Evaluation Results

| Method | Configuration | Date | Program Accuracy |
| --- | --- | --- | --- |
| TaNOS | Supervision budget=100... | 2026.04 | 82.9 |
| TaNOS w/o SSL | Supervision budget=100... | 2026.04 | 82.65 |
| TaNOS | Supervision budget=10%... | 2026.04 | 70.63 |
| TaNOS w/o SSL | Supervision budget=10%... | 2026.04 | 68.28 |
| TaNOS w/o Sketch | Supervision budget=100... | 2026.04 | 63.44 |
| SFT | Supervision budget=100... | 2026.04 | 61.96 |
| SFT + SSL | Supervision budget=100... | 2026.04 | 61.96 |
| SFT + SSL | Supervision budget=10%... | 2026.04 | 55.27 |
| TaNOS w/o Sketch | Supervision budget=10%... | 2026.04 | 54.77 |
| SFT | Supervision budget=10%... | 2026.04 | 51.43 |