
OmniScore

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Subjective Rubric-based Scoring | OmniScore overall (test) | MAE | 0.78 | 5 |
| Multi-task Scoring | OmniScore (Evaluation Set) | Average MA | 0.99 | 5 |
| Translation | OmniScore Evaluation Set | MAE | 0.68 | 5 |
| Summarization | OmniScore Evaluation Set | MAE | 0.91 | 5 |
| Question Answering | OmniScore Evaluation Set | MAE | 0.64 | 5 |
| Paraphrase | OmniScore Evaluation Set | MAE | 0.86 | 5 |
| Headline Generation | OmniScore Evaluation Set | MAE | 0.6 | 5 |
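Most rows above report MAE (mean absolute error), where a lower value means predicted scores sit closer to the reference scores. As a rough illustration of what that number measures, here is a minimal sketch assuming MAE is computed as the plain mean of absolute differences between a scoring model's outputs and human reference scores. The function name, the 1 to 5 scale, and the sample scores are hypothetical and do not come from the OmniScore evaluation itself.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Return the average absolute difference between paired scores."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if len(predicted) == 0:
        raise ValueError("at least one score pair is required")
    # Sum |prediction - reference| over all pairs, then divide by the pair count.
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Hypothetical example: a scoring model's outputs vs. human rubric scores (assumed 1-5 scale).
model_scores = [4.0, 3.0, 5.0, 2.0]
human_scores = [4.5, 2.0, 4.0, 2.0]
print(mean_absolute_error(model_scores, human_scores))  # 0.625
```

Here the absolute differences are 0.5, 1.0, 1.0, and 0.0, so the MAE is 2.5 / 4 = 0.625: on average the model's score is about 0.6 points away from the human score.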