Language Model Evaluation on AdaptEval
[Chart: ROUGE-Lsum; highlighted point: SCALENET (Layer-wise), 0.2733, Feb 10, 2026]
Evaluation Results
Method                 Configuration              Links    ROUGE-Lsum
SCALENET (Layer-wise)  LLM Backbone=Llama70B-...  2026.02  0.2733
Base Model             LLM Backbone=Llama70B-...  2026.02  0.2327
Naive TTA              LLM Backbone=Llama70B-...  2026.02  0.2325
SCALENET (Layer-wise)  LLM Backbone=Llama70B-...  2026.02  0.2237
SCALENET (Step-wise)   LLM Backbone=Llama70B-...  2026.02  0.2223
SCALENET (Step-wise)   LLM Backbone=Llama70B-...  2026.02  0.2068
SCALENET (Layer-wise)  LLM Backbone=Qwen32B,...   2026.02  0.1043
Naive TTA              LLM Backbone=Qwen32B,...   2026.02  0.1004
Naive TTA              LLM Backbone=Qwen32B,...   2026.02  0.0992
SCALENET (Step-wise)   LLM Backbone=Qwen32B,...   2026.02  0.099
Base Model             LLM Backbone=Qwen32B,...   2026.02  0.0987
SCALENET (Layer-wise)  LLM Backbone=Qwen32B,...   2026.02  0.0983
SCALENET (Step-wise)   LLM Backbone=Qwen32B,...   2026.02  0.0982
Naive TTA              LLM Backbone=Llama70B-...  2026.02  0.0029