Text-to-SQL on BIRD-147 2023 (test)
[Chart: SRR by date. Highlighted point: APEX MAS, 97.3 SRR, Apr 2, 2026. Selectable metrics: SRR, NSR, NSP, NSF, EX.]
Updated 16d ago
Evaluation Results
| Method | Details | Date | SRR | NSR | NSP | NSF | EX |
|---|---|---|---|---|---|---|---|
| APEX MAS | Base Model=GPT-4o, Tim... | 2026.04 | 97.3 | 99.4 | 39 | 52.8 | 70.7 |
| APEX MAS | Base Model=Sonnet, Tim... | 2026.04 | 93.9 | 98.3 | 47 | 60.8 | 69.4 |
| MAS Compiler | Time (s)=64, Cost ($)=... | 2026.04 | 79.6 | 93.6 | 94.8 | 93.7 | 67.4 |
| Adaptive Skill | Time (s)=55, Cost ($)=... | 2026.04 | 77.6 | 92.7 | 95.3 | 93.4 | 68 |
| Tools Only | Time (s)=27, Cost ($)=... | 2026.04 | 75.5 | 92.5 | 94.7 | 92.9 | 66 |
| Knowledge Only | Time (s)=12, Cost ($)=... | 2026.04 | 74.2 | 92.5 | 94.9 | 93.1 | 66.7 |
| Claude Code Raw | Time (s)=48, Cost ($)=... | 2026.04 | 71.8 | 91.8 | 90 | 90.5 | 59.2 |