Physical Commonsense Reasoning on PIQA (Relative Improvement Metrics)
Best Average Relative Improvement: 1.89 (TBDF)
[Chart: Average Relative Improvement, with an alternate Inferior/Superior Counts view; dated Jan 29, 2026]
Evaluation Results

Method  Configuration              Date     Average Relative Improvement  Inferior/Superior Counts
TBDF    Filtering Mode=General...  2026.01  1.89                          -
TBDF    Filtering Mode=FW-EDU,...  2026.01  1.35                          -
CB      Filtering Mode=FW-EDU,...  2026.01  -0.54                         -