Multi-Category Quality Scoring on MT-Bench
[Figure: MT-Bench score by category (Writing, Roleplay, Reasoning, Math, Coding, Extraction, STEM, Humanities, Overall Score). Highlighted point: 16-bit Baseline, Writing, 9.25, Nov 15, 2024.]
Updated 1mo ago
Evaluation Results
Method           Details                    Date     Writing  Roleplay  Reasoning  Math  Coding  Extraction  STEM  Humanities  Overall Score
16-bit Baseline  Data Format=16-bit Bas...  2024.11  9.25     7.2       4.65       2.55  3.3     5.55        8.93  9.58        6.38
AMXFP4           Data Format=AMXFP4, Mo...  2024.11  8.2      5.98      4.5        2.5   3.05    5.16        7.7   8.7         5.73
MXFP4            Data Format=MXFP4, Mod...  2024.11  7.2      7.03      3.95       1.7   1.7     4.35        7.53  8.53        5.25
MXFP4-PoT        Data Format=MXFP4-PoT,...  2024.11  4.3      4.05      2.35       1.9   1.25    1.55        5.23  5.15        3.22
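
The page does not say how Overall Score is derived. A minimal sketch, assuming it is the unweighted mean of the eight category scores listed above: the check below recomputes that mean for each method and compares it with the reported value. The names `scores`, `reported`, and `category_mean` are illustrative and not part of the source.

```python
# Category scores copied from the results above, in the order:
# Writing, Roleplay, Reasoning, Math, Coding, Extraction, STEM, Humanities.
scores = {
    "16-bit Baseline": [9.25, 7.2, 4.65, 2.55, 3.3, 5.55, 8.93, 9.58],
    "AMXFP4": [8.2, 5.98, 4.5, 2.5, 3.05, 5.16, 7.7, 8.7],
    "MXFP4": [7.2, 7.03, 3.95, 1.7, 1.7, 4.35, 7.53, 8.53],
    "MXFP4-PoT": [4.3, 4.05, 2.35, 1.9, 1.25, 1.55, 5.23, 5.15],
}

# Overall Score values as reported on the page.
reported = {
    "16-bit Baseline": 6.38,
    "AMXFP4": 5.73,
    "MXFP4": 5.25,
    "MXFP4-PoT": 3.22,
}


def category_mean(values):
    """Unweighted mean of the category scores (an assumption, see lead-in)."""
    return sum(values) / len(values)


for method, values in scores.items():
    gap = abs(category_mean(values) - reported[method])
    # The displayed category scores are already rounded, so allow a small gap.
    status = "consistent" if gap <= 0.01 else "differs"
    print(f"{method}: mean={category_mean(values):.3f} "
          f"reported={reported[method]} ({status})")
```

Under this assumption every recomputed mean lands within 0.01 of the reported Overall Score; the small gaps are what you would expect if the page rounds the category scores to two decimals before displaying them.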