STRING on ShellOps

49.1 LLM Judge Accuracy

A3

Updated 2mo ago

Evaluation Results

Method	Links	LLM Judge Accuracy
A3 2026.05		49.1
A3 2026.05		49
LATS 2026.05		28.8
ReACT 2026.05		28.3
GSPO 2026.05		27.7
GiGPO 2026.05		27
HGPO 2026.05		23.9
RetroAgent 2026.05		20
rStar 2026.05		18.6