Multi-task Language Understanding on MMLU (ACC, F1)
[Chart: MMLU Accuracy and MMLU F1 Score over time (x-axis: Dec 30, 2023). Highlighted entry: DIRECT QUERY, MMLU Accuracy 92.]
Updated 9d ago
Evaluation Results
| Method | Settings | Date | MMLU Accuracy | MMLU F1 Score |
|---|---|---|---|---|
| DIRECT QUERY | PRIVATE=NO, Backbone=G... | 2023.12 | 92 | 92 |
| DIRECT QUERY | PRIVATE=NO, Backbone=G... | 2023.12 | 89 | 71 |
| CONFUSIONPROMPT | PRIVATE=YES, Backbone=... | 2023.12 | 89 | 90 |
| CONFUSIONPROMPT | PRIVATE=YES, Backbone=... | 2023.12 | 83 | 82 |
| DIRECT QUERY | PRIVATE=NO, Backbone=G... | 2023.12 | 74 | 74 |
| CONFUSIONPROMPT | PRIVATE=YES, Backbone=... | 2023.12 | 71 | 71 |
| VICUNA-13B | PRIVATE=YES | 2023.12 | 60 | 60 |
| PARAPHRASER | PRIVATE=YES, Backbone=... | 2023.12 | 59 | 58 |
| PARAPHRASER | PRIVATE=YES, Backbone=... | 2023.12 | 58 | 57 |
| LLAMA2-7B | PRIVATE=YES | 2023.12 | 56 | 54 |
| TEXT2TEXT | PRIVATE=YES, Backbone=... | 2023.12 | 48 | 47 |
| PARAPHRASER | PRIVATE=YES, Backbone=... | 2023.12 | 47 | 46 |
| TEXT2TEXT | PRIVATE=YES, Backbone=... | 2023.12 | 45 | 45 |
| TEXT2TEXT | PRIVATE=YES, Backbone=... | 2023.12 | 37 | 36 |