Data Analysis on Data Analysis
[Chart: Correctness over time, Jan 27, 2026 to May 7, 2026. Current best: LATTE, 96. Available metrics: Correctness, Code Style & Visualization, Robustness & Data Safety, Efficiency, Tokens (K), Latency (min).]
Updated 26d ago
Evaluation Results
Method         Date     Correctness  Code Style & Visualization  Robustness & Data Safety  Efficiency  Tokens (K)  Latency (min)
LATTE          2026.05  96           -                           -                         -           122         3.2
Leader-Worker  2026.05  94           -                           -                         -           257         4.9
Decentralized  2026.05  93           -                           -                         -           271         2.9
Static         2026.05  88           -                           -                         -           403         6.2
MetaGPT        2026.05  75           -                           -                         -           390         8.7
FrugalGPT      2026.01  37.8         26.8                        16.3                      10          -           -
CASTER         2026.01  37.1         26.3                        16.4                      10          -           -