Financial workflow task execution on RealFin benchmark
[Chart: Accuracy by date. Highlighted point: GA, accuracy 65, Apr 18, 2026. Other selectable metrics: Input Tokens (k), Output Tokens (k), Total Tokens (k), Efficiency.]
Evaluation Results

| Method | Model | Date | Accuracy | Input Tokens (k) | Output Tokens (k) | Total Tokens (k) | Efficiency |
|---|---|---|---|---|---|---|---|
| GA | Claude Sonnet 4.6 | 2026.04 | 65 | 102 | 12 | 114 | 5.7 |
| Claude Code | Claude Opus 4.6 | 2026.04 | 60 | 290 | 17 | 307 | 1.95 |
| Codex | GPT-5.4 | 2026.04 | 60 | 838 | 54 | 892 | 0.67 |
| Claude Code | Claude Sonnet 4.6 | 2026.04 | 55 | 226 | 12 | 238 | 2.31 |
| OpenClaw | Claude Sonnet 4.6 | 2026.04 | 35 | 249 | 2 | 251 | 1.39 |
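The page does not state how Efficiency is computed. Every row is consistent with Total Tokens (k) = Input + Output and with Efficiency = Accuracy ÷ (Total Tokens (k) ÷ 10), i.e. accuracy points per 10k total tokens, rounded to two decimals. The Python sketch below checks that assumed relationship against the transcribed rows; the formula and the helper names `efficiency` and `check_rows` are assumptions inferred from the numbers, not the benchmark's published definition.

```python
# Rows transcribed from the results table:
# (method, model, accuracy, input_k, output_k, reported_total_k, reported_efficiency)
ROWS = [
    ("GA", "Claude Sonnet 4.6", 65, 102, 12, 114, 5.7),
    ("Claude Code", "Claude Opus 4.6", 60, 290, 17, 307, 1.95),
    ("Codex", "GPT-5.4", 60, 838, 54, 892, 0.67),
    ("Claude Code", "Claude Sonnet 4.6", 55, 226, 12, 238, 2.31),
    ("OpenClaw", "Claude Sonnet 4.6", 35, 249, 2, 251, 1.39),
]


def efficiency(accuracy: float, total_k: float) -> float:
    """ASSUMED definition (inferred, not documented): accuracy points per 10k total tokens."""
    return accuracy / (total_k / 10)


def check_rows(rows):
    """For each row, test whether the reported totals and efficiency match the assumed formulas."""
    results = []
    for method, model, acc, inp, out, total, eff in rows:
        total_ok = inp + out == total                      # Total = Input + Output
        eff_ok = round(efficiency(acc, total), 2) == eff   # reported to two decimals
        results.append((method, model, total_ok, eff_ok))
    return results


for method, model, total_ok, eff_ok in check_rows(ROWS):
    print(f"{method} ({model}): total matches={total_ok}, efficiency matches={eff_ok}")
```

Under this reading, GA with Claude Sonnet 4.6 is both the most accurate and the most token-efficient entry, while Codex with GPT-5.4 ties for second on accuracy but uses the most tokens.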