Script Confusion Mitigation on FLEURS sr-latn (test)
Figure: Accuracy (Normalized Edit Similarity) over time. Best result: steer, 96 (Jan 6, 2026).
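The page reports accuracy as "Normalized Edit Similarity" but does not define it. The sketch below assumes a common definition: character-level Levenshtein distance between hypothesis and reference, normalized by the longer string's length, subtracted from 1, and scaled to 0–100. It also assumes per-utterance scores are averaged over the test set. The exact normalization and aggregation used by this benchmark may differ. The helper names (`levenshtein`, `normalized_edit_similarity`, `corpus_score`) are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance; insertion, deletion, substitution each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def normalized_edit_similarity(hyp: str, ref: str) -> float:
    """Assumed metric: 100 * (1 - distance / max length); two empty strings score 100."""
    longest = max(len(hyp), len(ref))
    if longest == 0:
        return 100.0
    return 100.0 * (1.0 - levenshtein(hyp, ref) / longest)


def corpus_score(hyps: list[str], refs: list[str]) -> float:
    """Assumed aggregation: mean of per-utterance similarities."""
    if len(hyps) != len(refs) or not refs:
        raise ValueError("need non-empty hypothesis/reference lists of equal length")
    total = sum(normalized_edit_similarity(h, r) for h, r in zip(hyps, refs))
    return total / len(refs)


if __name__ == "__main__":
    # Same words, Latin script: perfect match.
    print(normalized_edit_similarity("dobar dan", "dobar dan"))  # 100.0
    # Same words, Cyrillic script: only the space matches, so 8 of 9 chars differ.
    print(round(normalized_edit_similarity("добар дан", "dobar dan"), 2))  # 11.11
```

The second call shows why script confusion is costly under a character-level metric for sr-latn: a transcript that is lexically correct but written in Cyrillic shares almost no characters with the Latin-script reference.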
Evaluation Results
| Method | Model Size | Date | Accuracy (Normalized Edit Similarity) |
|---|---|---|---|
| steer | large | 2026.01 | 96 |
| steer | v2 | 2026.01 | 96 |
| steer | v3 | 2026.01 | 96 |
| prompt | large | 2026.01 | 95 |
| prompt | v2 | 2026.01 | 95 |
| prompt | v3 | 2026.01 | 95 |
| no-prompt | large | 2026.01 | 93 |
| no-prompt | v3 | 2026.01 | 93 |
| prompt | medium | 2026.01 | 93 |
| steer | medium | 2026.01 | 93 |
| no-prompt | small | 2026.01 | 90 |
| prompt | small | 2026.01 | 90 |
| steer | small | 2026.01 | 89 |
| no-prompt | base | 2026.01 | 81 |
| no-prompt | v2 | 2026.01 | 81 |
| prompt | base | 2026.01 | 81 |
| steer | base | 2026.01 | 80 |
| no-prompt | tiny | 2026.01 | 66 |
| prompt | tiny | 2026.01 | 64 |
| steer | tiny | 2026.01 | 64 |
| no-prompt | medium | 2026.01 | 63 |
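To compare the mitigation methods within each model size, the short script below transcribes the evaluation results above into a dictionary and computes each method's gain over the no-prompt baseline. It uses only the scores listed on this page; the dictionary layout and the `gains_over_baseline` helper are illustrative, not part of the benchmark.

```python
# Scores transcribed from the evaluation results: {model_size: {method: score}}.
SCORES = {
    "tiny":   {"no-prompt": 66, "prompt": 64, "steer": 64},
    "base":   {"no-prompt": 81, "prompt": 81, "steer": 80},
    "small":  {"no-prompt": 90, "prompt": 90, "steer": 89},
    "medium": {"no-prompt": 63, "prompt": 93, "steer": 93},
    "large":  {"no-prompt": 93, "prompt": 95, "steer": 96},
    "v2":     {"no-prompt": 81, "prompt": 95, "steer": 96},
    "v3":     {"no-prompt": 93, "prompt": 95, "steer": 96},
}


def gains_over_baseline(scores, baseline="no-prompt"):
    """Score difference of every non-baseline method relative to the baseline, per model size."""
    return {
        size: {m: s - row[baseline] for m, s in row.items() if m != baseline}
        for size, row in scores.items()
    }


GAINS = gains_over_baseline(SCORES)

if __name__ == "__main__":
    for size, row in GAINS.items():
        print(f"{size:>6}: " + ", ".join(f"{m} {g:+d}" for m, g in row.items()))
```

On these numbers, the largest gains appear where the no-prompt baseline is weak (medium at 63, v2 at 81), while for tiny, base, and small both prompt and steer land at or slightly below the baseline.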