| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Faithfulness detection | In-domain Step-level Benchmark Agent | FF180.2 | 10 |
| Faithfulness detection | In-domain Step-level Benchmark Knowledge | FF183.4 | 10 |
| Faithfulness detection | In-domain Step-level Benchmark Reasoning | FF184.5 | 10 |