Constraint-following Instruction Evaluation on IFEval
Best Average Score: 54.4 (LLAMA3-INSTRUCT w/ ULTRAFEEDBACK, Oct 22, 2024)
Evaluation Results
| Method | Backbone | Date | Average Score |
|---|---|---|---|
| LLAMA3-INSTRUCT w/ ULTRAFEEDBACK | LLAMA3-INSTRUCT | 2024.10 | 54.4 |
| LLAMA3-INSTRUCT w/ SSO_DPO | LLAMA3-INSTRUCT | 2024.10 | 53.4 |
| LLAMA3-INSTRUCT w/ PBAA_DPO | LLAMA3-INSTRUCT | 2024.10 | 53.2 |
| LLAMA3-INSTRUCT | LLAMA3-INSTRUCT | 2024.10 | 53 |
| QWEN2-INSTRUCT w/ ULTRAFEEDBACK | QWEN2-INSTRUCT | 2024.10 | 51.5 |
| QWEN2-INSTRUCT | QWEN2-INSTRUCT | 2024.10 | 51.4 |
| QWEN2-INSTRUCT w/ SSO_DPO | QWEN2-INSTRUCT | 2024.10 | 51.4 |
| QWEN2-INSTRUCT w/ PBAA_DPO | QWEN2-INSTRUCT | 2024.10 | 50.9 |
| LLAMA3-SFT w/ SSO_DPO | LLAMA3-SFT | 2024.10 | 50.3 |
| LLAMA3-SFT w/ PBAA_DPO | LLAMA3-SFT | 2024.10 | 47.8 |
| QWEN2-SFT w/ SSO_DPO | QWEN2-SFT | 2024.10 | 45.7 |
| LLAMA3-SFT w/ ULTRAFEEDBACK | LLAMA3-SFT | 2024.10 | 43.6 |
| QWEN2-SFT w/ PBAA_DPO | QWEN2-SFT | 2024.10 | 43.6 |
| QWEN2-SFT w/ ULTRAFEEDBACK | QWEN2-SFT | 2024.10 | 40.4 |
| LLAMA3-SFT | LLAMA3-SFT | 2024.10 | 24.9 |
| QWEN2-SFT | QWEN2-SFT | 2024.10 | 19.8 |