Online auditing on Who&When
[Chart: Step Accuracy by date. Top result: AgentForesight-7B, 57.69 Step Accuracy (May 9, 2026). Selectable metrics: Step Accuracy, Agent Accuracy, ASS.]
Evaluation Results
Method               Date     Step Accuracy  Agent Accuracy  ASS
AgentForesight-7B    2026.05  57.69          73.08           1.62
GPT-4.1              2026.05  38.1           66.67           2.38
DeepSeek-V4-Flash    2026.05  37.21          65.12           2.35
Qwen2.5-7B-Instruct  2026.05  36.59          58.54           2.41
Gemini-3-Flash       2026.05  32.56          53.49           2.47
Qwen3-8B             2026.05  29.41          55.88           2.79
Llama3.2-3B          2026.05  28.57          47.62           2.57
Gemma3-4B            2026.05  6.98           18.6            3.09