
No Mean Feat: Simple, Strong Baselines for Context Compression

About

Context compression reduces Transformer inference costs by replacing lengthy inputs with shorter pre-computed representations. It carries significant benefits for retrieval-augmented generation (RAG) and has attracted growing research attention. However, progress remains difficult to measure due to inconsistent evaluations and baselines. We design a standard, easy-to-reproduce evaluation suite for context compression, BenchPress, along with simple, high-performance baselines for English reading comprehension. BenchPress supports benchmarking across model scales, datasets, compression ratios, and short (<1K tokens) to mid-range (<8K tokens) contexts. While the suite is applicable to any compression paradigm, our baselines target soft context compression. We establish two simple baselines that strongly outperform the widely used causal compression-token approach: mean pooling and a bidirectional compression-token variant. Our results show the benefit of bidirectional attention when computing compressed representations, and that simple pooling is an expressive compression operator.
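
As a rough illustration of mean pooling as a compression operator, the sketch below averages non-overlapping windows of token vectors so that a compression ratio of 4 turns every four vectors into one. The function name `mean_pool_compress`, the windowing scheme, and the toy vectors are assumptions made for illustration; this is not the authors' implementation, which computes the representations with a trained model.

```python
from typing import List

Vector = List[float]


def mean_pool_compress(hidden_states: List[Vector], ratio: int) -> List[Vector]:
    """Average each consecutive window of `ratio` token vectors into one vector.

    Hypothetical helper: a trailing window shorter than `ratio` is averaged
    over however many vectors it actually contains.
    """
    if ratio < 1:
        raise ValueError("ratio must be a positive integer")
    compressed: List[Vector] = []
    for start in range(0, len(hidden_states), ratio):
        window = hidden_states[start:start + ratio]
        dim = len(window[0])
        # Per-dimension mean over the vectors in this window.
        compressed.append(
            [sum(vec[d] for vec in window) / len(window) for d in range(dim)]
        )
    return compressed


# Toy input: five 2-dimensional "token representations" (made-up values).
states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
pooled = mean_pool_compress(states, ratio=4)
print(pooled)  # [[4.0, 5.0], [9.0, 10.0]]
```

With five input vectors and a ratio of 4, the output has two vectors: the mean of the first four and the leftover fifth vector on its own.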

Yair Feldman, Yoav Artzi • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Language Model Evaluation | BenchPress short-context (test) | Accuracy 65.91 | 131 |
| Context Compression Evaluation | BenchPress suite macro-averaged across all datasets | Macro-averaged F1 71.66 | 130 |
| Context Compression | BenchPress short-context (test) | EM (4x Single Context) 56.41 | 21 |
| Multi-Doc Question Answering | LongBench-E Multi-Doc QA | F1 Score 45.9 | 17 |
| Single-Doc Question Answering | LongBench-E Single-Doc QA | F1 Score 39.7 | 17 |
