
Many-shot in-context learning on long-context benchmarks

Top ICL Performance (8k Context): 74.2 (FlexPrefill)

[Chart: ICL Performance (8k Context) by date, Mar 18, 2026]

Evaluation Results

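| Method | Date | ICL (8k Context) | Score 2 (unlabeled) | Score 3 (unlabeled) | Score 4 (unlabeled) | Score 5 (unlabeled) |
|---|---|---|---|---|---|---|
| FlexPrefill | 2026.03 | 74.2 | 78.5 | 82.3 | 84.2 | 85.2 |
|  | 2026.03 | 73.6 | 79.4 | 81.7 | 84.5 | 87.4 |
|  | 2026.03 | 71.4 | 79.6 | 83.5 | 87 | 88.6 |
|  | 2026.03 | 71.2 | 75.8 | 80.2 | 83.7 | 85.1 |
|  | 2026.03 | 70.8 | 74.7 | 79.1 | 82.8 | 85.4 |
|  | 2026.03 | 70.2 | 75 | 78 | 81.2 | 81.8 |
|  | 2026.03 | 70.1 | 75.1 | 78.4 | 78.5 | 71.5 |
|  | 2026.03 | 70 | 75.6 | 78.7 | 81.2 | 82.4 |
|  | 2026.03 | 66.6 | 71 | 74.2 | 78.2 | 78.6 |
|  | 2026.03 | 65.5 | 68.9 | 70.1 | 68.8 | 9.3 |
|  | 2026.03 | 65.4 | 68.5 | 70.5 | 72.9 | 76.2 |
|  | 2026.03 | 64.4 | 67.3 | 74.4 | 80.2 | 85.7 |
|  | 2026.03 | 61.8 | 67 | 71.5 | 76.7 | 77 |
|  | 2026.03 | 61.5 | 65 | 65.4 | 65.3 | 65 |
|  | 2026.03 | 54.7 | 64.8 | 69.8 | 74.2 | 76.3 |
|  | 2026.03 | 54.6 | 65.8 | 72.3 | 77 | 80.3 |
|  | 2026.03 | 54.5 | 65.2 | 69.4 | 68.1 | 43.6 |
|  | 2026.03 | 54.4 | 65.2 | 70.2 | 73.6 | 74.4 |
|  | 2026.03 | 54.2 | 62.4 | 68.8 | 73.8 | 72.8 |
|  | 2026.03 | 49.5 | 60.2 | 66.2 | 72.3 | 77.4 |
|  | 2026.03 | 39.2 | 53 | 63 | 68.9 | 72 |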