SOTA STRING benchmarks and papers with code | WizwandBenchmarks
| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| TableBench | A3 | LLM Judge Accuracy 32 | | 9 | 23d ago |
| ShellOps | A3 | LLM Judge Accuracy 49.1 | | 9 | 23d ago |
| EHRCon | A3 | LLM Judge Accuracy 67.4 | | 9 | 23d ago |
| DataBench | A3 | LLM Judge Accuracy 77.9 | | 9 | 23d ago |
| AB-DB | A3 | LLM Judge Accuracy 28.7 | | 9 | 23d ago |
| AB-OS | GSPO | LLM Judge Accuracy 61.4 | | 9 | 23d ago |