L-Eval

Benchmarks

Task Name                            Dataset Name               SOTA Result                Trend
Long-context language understanding  L-Eval                     Coursera 58.28             26
Long-context language understanding  L-Eval (test)              Coursera 58.28             26
Long-context Summarization           L-Eval Sum                 QMS 22.66                  13
Long-context Question Answering      L-Eval QA                  NQ 80.73                   13
Long-context evaluation              L-Eval                     Close Score 68.8           13
Closed-ended Task Evaluation         L-Eval closed-ended tasks  Coursera Score 41.86       12
Prompt Compression                   L-Eval (test)              Coursera QA Accuracy 64.4  5
Long-context Question Answering      L-Eval                     Coursera QA 30.2           4