
Summary

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Preference Profile Estimation | Summary | Misprediction Rate: 0.002 | 24 |
| Text Summarization | Summary | LLM-as-judge Score: 44.4 | 13 |
| Summarization | Summary | Score: 46.4 | 13 |
| Summarization | Summary (test) | Score: 41.31 | 5 |