Tool Use on Causal and Downstream Robustness Ablation Suite Averaged over 4 models
[Chart: Tool Hit@1Δ; best result HETA at 4.1, Apr 14, 2026]
Evaluation Results
| Method | Method Variant | Date | Tool Hit@1Δ |
|---|---|---|---|
| HETA | Full | 2026.04 | 4.1 |
| HETA | LR+WIN | 2026.04 | 3.8 |
| HETA | w/o Hes... | 2026.04 | 2.7 |
| HETA | w/o KL | 2026.04 | 2.3 |
| ReAGent | Standard | 2026.04 | 2.1 |
| HETA | w/o Tra... | 2026.04 | 1.9 |
| SEA-CoT | Standard | 2026.04 | 1.8 |
| Progressive Inference | Standard | 2026.04 | 1.6 |
| fAML | Standard | 2026.04 | 1.5 |
| ContextCite | Standard | 2026.04 | 1.4 |
| TDD-backward | Standard | 2026.04 | 1.3 |
| Peering (PML) | Standard | 2026.04 | 1.2 |
| Integrated Gradients | Standard | 2026.04 | 1.1 |
| Attention Rollout | Standard | 2026.04 | 0.6 |