| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| HellaSwag | COLLATE | Accuracy 99.21 | 1,891 | 10d ago | |
| WinoGrande | GPTQ | Accuracy 7,364 | 1,085 | 2d ago | |
| PIQA | | Accuracy 94.9 | 751 | 8d ago | |
| Winogrande | | Accuracy 85.3 | 372 | 4d ago | |
| CSQA | Token-ICS | Accuracy 96 | 366 | 1mo ago | |
| HellaSwag | LLaMA-2 70B | HellaSwag Accuracy 84 | 350 | 4d ago | |
| Common Sense Reasoning Tasks | Dual LoRA | Avg Score 93 | 316 | 4d ago | |
| Commonsense Reasoning (BoolQ, PIQA, SIQA, HellaS., WinoG., ARC-e, ARC-c, OBQA) (test) | PaLM (540B) | BoolQ Accuracy 88 | 202 | 3d ago | |
| ARC Challenge | Self-Debias Offline | Accuracy 93.8 | 190 | 8d ago | |
| StrategyQA | REBALANCE | Accuracy 95.7 | 174 | 22d ago | |
| ARC-C | | Accuracy 96.3 | 172 | 3d ago | |
| CommonSenseQA | | Accuracy 91.2 | 136 | 1mo ago | |
| Commonsense Reasoning (BoolQ, PIQA, SIQA, HellaS., WinoG., ARC-e, ARC-c, OBQA) | FFA-LoRA | BoolQ Accuracy 82.88 | 129 | 8d ago | |
| CSQA | | CSQA Accuracy 91.2 | 126 | 9d ago | |
| OBQA | HydraLoRA | Accuracy 89.2 | 117 | 8d ago | |
| SocialIQA | | Accuracy 88.1 | 116 | 1mo ago | |
| CSQA (test) | KEAR | Accuracy 89.4 | 111 | 1mo ago | |
| ARC-E | Self-consistency | Accuracy 96.4 | 106 | 11d ago | |
| SIQA | In-Squeeze | Accuracy 89.85 | 106 | 10d ago | |
| Wino | | Accuracy 77.4 | 102 | 9d ago | |
| WinoGrande (val) | | Accuracy 73.88 | 87 | 1mo ago | |
| StrategyQA (test) | SGE | Accuracy 83.49 | 81 | 4d ago | |
| Winogrande | FLAP | Accuracy 76.09 | 78 | 10d ago | |
| Average 7 Commonsense Reasoning Tasks | | Avg Accuracy 72.04 | 72 | 15d ago | |
| OpenBookQA | | Accuracy 91 | 71 | 8d ago | |