| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| Task Category A | Mistral-7B-Instruct-v0.3 | Functionally Diverse Responses Count | 3.83 | 35 | 1mo ago |
| Functional Diversity Prompt Set Categories A-H 1.0 (test) | GRPO w/DARLING | Score Category A | 4.26 | 21 | 1mo ago |
| Task Category B | gpt-4o | Functionally Diverse Responses Count | 5 | 6 | 1mo ago |
| Task Category D | Mistral-7B-Instruct-v0.3 | Functionally Diverse Responses | 4.44 | 4 | 1mo ago |
| Task Category C | gpt-4o | Functionally Diverse Responses Count | 5 | 4 | 1mo ago |
| Task Category E | claude-3.5-sonnet | Functionally Diverse Responses | 4.94 | 3 | 1mo ago |
| Task Category H | Mistral-7B-Instruct-v0.3 | Functionally Diverse Responses | 4.42 | 2 | 1mo ago |
| Task Category G | claude-3.5-sonnet | Functionally Diverse Responses Count | 4.62 | 2 | 1mo ago |
| Task Category F | Llama-3.1-8B-Instruct | Functionally Diverse Responses Count | 4.13 | 2 | 1mo ago |