| CLOTH | DCDG + ASDE, DDDE | Hardest Accuracy 73.25 | | 18 | 14d ago |
| MCQ (test) | Text2text (RAP-T5) | P@1 22.39 | | 17 | 3mo ago |
| Sciq (test) | Text2text (RAP-T5) | Precision@1 24.3 | | 15 | 3mo ago |
| MedQA | GPT-3(few-shot-COT-k-NN) | P@1 21.05 | | 12 | 1mo ago |
| ARC Challenge | GPT-3(few-shot-random) | P@1 21.08 | | 12 | 1mo ago |
| ARC Easy | GPT-3(few-shot-k-NN) | P@1 24.24 | | 12 | 1mo ago |
| MCQL | GPT-3(few-shot-COT-k-NN) | P@1 36.17 | | 12 | 1mo ago |
| SciQ | GPT-3(few-shot-COT-k-NN) | P@1 25.5 | | 12 | 1mo ago |
| MCQ | GPT-3(few-shot-COT-k-NN) | P@1 30.5 | | 12 | 1mo ago |
| Discrete_40 | MCTS-guided reasoning reconstruction framework | Plausibility 3.25 | | 10 | 1mo ago |
| Human Evaluation Set (test) | GPT-3 | Relevance 4.14 | | 7 | 1mo ago |
| RACE | | Accuracy 88.1 | | 7 | 3mo ago |
| CLOTH (test) | DCDG | Invalid Ratio (Easy) 0.1 | | 6 | 14d ago |
| CLOTH original (test) | CDGP | F1@10 15.37 | | 6 | 14d ago |
| RACE (test) | BDG_PM | BLEU-1 39.81 | | 6 | 3mo ago |
| CLOTH Hard | Qwen 2.5 7B | Invalid Ratio 4.2 | | 5 | 14d ago |
| CLOTH Easy | Qwen 2.5 7B | Invalid Ratio 0 | | 5 | 14d ago |
| MCQ dataset | RAP-T5 | Relevance 4.45 | | 5 | 3mo ago |
| CLOTH Hard Augmented (test) | DCDG with ASDE + DDDE | F1@10 41.98 | | 4 | 14d ago |
| CLOTH Easy augmented (test) | DCDG with ASDE + DDDE | F1@10 26.64 | | 4 | 14d ago |
| D-GEN Mathematics | D-GEN | Fluency 5 | | 1 | 3mo ago |
| D-GEN Struct-to-Text | D-GEN | Fluency 4.88 | | 1 | 3mo ago |
| D-GEN Summarization | D-GEN | Fluency 4.96 | | 1 | 3mo ago |
| D-GEN Translation | D-GEN | Fluency 4.91 | | 1 | 3mo ago |
| D-GEN RC + CS | D-GEN | Fluency 4.99 | | 1 | 3mo ago |