| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| HellaSwag | COLLATE | Accuracy | 99.21 | 1,896 | 22d ago |
| WinoGrande | GPTQ | Accuracy | 7,364 | 1,442 | 18h ago |
| PIQA | | Accuracy | 94.9 | 757 | 16d ago |
| HellaSwag | MSRS | HellaSwag Accuracy | 87.4 | 711 | 17h ago |
| Winogrande | | Accuracy | 85.3 | 453 | 5d ago |
| CSQA | Token-ICS | Accuracy | 96 | 366 | 3mo ago |
| Common Sense Reasoning Tasks | Dual LoRA | Avg Score | 93 | 321 | 25d ago |
| ARC Challenge | Self-Debias Offline | Accuracy | 93.8 | 243 | 20d ago |
| Commonsense Reasoning (BoolQ, PIQA, SIQA, HellaS., WinoG., ARC-e, ARC-c, OBQA) (test) | PaLM (540B) | BoolQ Accuracy | 88 | 238 | 4d ago |
| Commonsense Reasoning (BoolQ, PIQA, SIQA, HellaS., WinoG., ARC-e, ARC-c, OBQA) | LoRA-GA | BoolQ Accuracy | 89.69 | 223 | 21h ago |
| ARC-C | | Accuracy | 96.3 | 215 | 19d ago |
| PIQA | BaLoRA | Accuracy | 89.99 | 213 | 20h ago |
| StrategyQA | REBALANCE | Accuracy | 95.7 | 208 | 19d ago |
| CSQA | | CSQA Accuracy | 91.2 | 195 | 7d ago |
| OBQA | HydraLoRA | Accuracy | 89.2 | 187 | 14d ago |
| SIQA | In-Squeeze | Accuracy | 89.85 | 168 | 20h ago |
| SocialIQA | | Accuracy | 88.1 | 158 | 7d ago |
| ARC-E | Self-consistency | Accuracy | 96.4 | 152 | 14d ago |
| Wino | | Accuracy | 86.55 | 146 | 21d ago |
| CommonSenseQA | | Accuracy | 91.2 | 136 | 3mo ago |
| StrategyQA (test) | SGE | Accuracy | 83.49 | 119 | 4d ago |
| CSQA (test) | KEAR | Accuracy | 89.4 | 111 | 3mo ago |
| CommonsenseQA | DENOISE | Accuracy (pass@1) | 86.6 | 108 | 29d ago |
| OpenBookQA | | Accuracy | 91.2 | 108 | 21d ago |
| Winogrande | FLAP | Accuracy | 76.09 | 103 | 21d ago |