Cross-Scale and Cross-Domain Generalization on SpreadsheetBench and WikiTQ
[Chart: Average Score by date. Top entry: Human-Written, 31.57. Date shown: Mar 26, 2026. Updated 23d ago.]
Evaluation Results
Method         Configuration               Date      Average Score
Human-Written  Skill User=Qwen3.5-35B...   2026.03   31.57
Parametric     Skill User=Qwen3.5-35B...   2026.03   20.8
No Skill       Skill User=Qwen3.5-35B...   2026.03   18.35
Trace2Skill    Skill Author=Qwen3.5-3...   2026.03   18.26
Trace2Skill    Skill Author=Qwen3.5-1...   2026.03   17.62
Trace2Skill    Skill Author=Qwen3.5-1...   2026.03   14.96
Trace2Skill    Skill Author=Qwen3.5-3...   2026.03   14.78
Trace2Skill    Skill Author=Qwen3.5-3...   2026.03   11.69
Trace2Skill    Skill Author=Qwen3.5-3...   2026.03   9.85
Trace2Skill    Skill Author=Qwen3.5-1...   2026.03   9.19
Trace2Skill    Skill Author=Qwen3.5-1...   2026.03   9.18
Trace2Skill    Skill Author=Qwen3.5-1...   2026.03   7.04
Trace2Skill    Skill Author=Qwen3.5-3...   2026.03   4.54
Trace2Skill    Skill Author=Qwen3.5-3...   2026.03   4.47
Trace2Skill    Skill Author=Qwen3.5-1...   2026.03   -0.9