
Matched Quality Evaluation Suite

Benchmarks

Task Name                       Dataset Name                             SOTA Result               Trend
Long-context language modeling  Matched Quality Evaluation Suite 128K    Q Score (Dense): 80.12    3
Long-context language modeling  Matched Quality Evaluation Suite (32K)   Q* Score (Dense): 79.39   3
Long-context language modeling  Matched Quality Evaluation Suite 8K      Q* Score (Dense): 81.35   3