Multi-tool calling on SEAL-Tools
[Chart: TSA by date. Top entry: JTPRO, 86.5 (Apr 20, 2026). Metric tabs: TSA, SFA, OSR.]
Updated 1mo ago
Evaluation Results
Method  Backbone  Date     TSA    SFA    OSR
JTPRO   GPT-5     2026.04  86.5   65.2   33.6
GEPA    GPT-5     2026.04  85.2   61.3   31.1
Base    GPT-5     2026.04  84.5   56.3   28.8
JTPRO   o3-mini   2026.04  83.9   60.1   30.1
JTPRO   GPT-4o    2026.04  82.3   53.51  27.5
Base    o3-mini   2026.04  82.2   52.3   26.3
GEPA    GPT-4o    2026.04  82.11  43.51  24.51
GEPA    o3-mini   2026.04  82.1   56.9   27.5
Base    GPT-4o    2026.04  81     40.4   23