
BenchPress

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Language Model Evaluation | BenchPress short-context (test) | Accuracy: 68.84 | 131 |
| Context Compression Evaluation | BenchPress suite macro-averaged across all datasets | Macro-averaged F1: 74.33 | 130 |
| Context Compression | BenchPress short-context (test) | EM (4x Single Context): 56.41 | 21 |