
Context Compression Evaluation on BenchPress suite macro-averaged across all datasets

74.33 Macro-averaged F1

Qwen3-8B

[Chart: macro-averaged F1 by date; y-axis ticks 40.2596, 49.1048, 57.95, 66.7952; x-axis date Oct 23, 2025]
Updated 22d ago
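
The headline number is described as macro-averaged across all datasets. Macro-averaging normally means an unweighted mean of the per-dataset scores, so each dataset counts equally regardless of its size. The sketch below illustrates that calculation only; the function name `macro_average` and the dataset names and scores are hypothetical and are not taken from this leaderboard, which does not list per-dataset results here.

```python
from statistics import mean


def macro_average(per_dataset_f1: dict[str, float]) -> float:
    """Return the unweighted mean of per-dataset F1 scores.

    Every dataset contributes equally, whatever its number of examples.
    """
    if not per_dataset_f1:
        raise ValueError("need at least one dataset score")
    return mean(per_dataset_f1.values())


# Hypothetical per-dataset F1 scores, for illustration only.
example_scores = {"dataset_a": 80.0, "dataset_b": 60.0, "dataset_c": 70.0}
print(macro_average(example_scores))  # prints 70.0
```

A micro-average would instead pool all examples before scoring, so large datasets would dominate the result; the macro-average avoids that by design.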

Evaluation Results

Date      Macro-averaged F1   Method / Links
2025.10   74.33               -
2025.10   73.44               -
2025.10   71.96               -
2025.10   71.66               -
2025.10   70.55               -
2025.10   70.39               -
2025.10   69.93               -
2025.10   69.57               -
2025.10   69.36               -
2025.10   69.33               -
2025.10   69.2                -
2025.10   68.15               -
2025.10   68.09               -
2025.10   67.03               -
2025.10   66.72               -
2025.10   66.43               -
2025.10   65.9                -
2025.10   65.82               -
2025.10   65.36               -
2025.10   65.24               -
2025.10   64.88               -
2025.10   64.76               -
2025.10   64.67               -
2025.10   64.17               -
2025.10   63.85               -
2025.10   63.35               -
2025.10   63.01               -
2025.10   62.98               -
2025.10   62.81               -
2025.10   62.6                -
2025.10   62.53               -
2025.10   62.18               -
2025.10   62.08               -
2025.10   62.04               -
2025.10   61.79               -
2025.10   61.72               -
2025.10   61.39               -
2025.10   61.17               -
2025.10   61.04               -
2025.10   60.56               -
2025.10   60.48               -
2025.10   60.27               -
2025.10   58.43               -
2025.10   58.41               -
2025.10   58.36               -
2025.10   57.91               -
2025.10   57.73               -
2025.10   57.68               -
2025.10   57.52               -
2025.10   57.03               -
2025.10   56.39               -
2025.10   56.31               -
2025.10   56.21               -
2025.10   55.59               -
2025.10   55.43               -
2025.10   55.22               -
2025.10   55.07               -
2025.10   54.7                -
2025.10   54.47               -
2025.10   54.4                -
2025.10   54.28               -
2025.10   54.11               -
2025.10   53.74               -
2025.10   51.85               -
2025.10   51.56               -
2025.10   51.53               -
2025.10   51.3                -
2025.10   50.9                -
2025.10   50.06               -
2025.10   49.83               -
2025.10   49.37               -
2025.10   49.2                -
2025.10   48.68               -
2025.10   47.9                -
2025.10   47.64               -
2025.10   47.62               -
2025.10   47.59               -
2025.10   47.51               -
2025.10   47.47               -
2025.10   47.28               -
2025.10   46.97               -
2025.10   46.96               -
2025.10   46.93               -
2025.10   45.92               -
2025.10   44.98               -
2025.10   44.82               -
2025.10   44.76               -
2025.10   44.73               -
2025.10   44.46               -
2025.10   43.71               -
2025.10   43.62               -
2025.10   43.17               -
2025.10   43.08               -
2025.10   42.66               -
2025.10   42.59               -
2025.10   42.52               -
2025.10   42.49               -
2025.10   42.4                -
2025.10   41.61               -
2025.10   41.57               -
Showing 100 of 130 rows