Long-context language tasks (MC, QA, Sum) on ∞Bench

78.6 MC Accuracy

RAG

Updated 4mo ago

Evaluation Results

Method	Links	MC Accuracy	QA	Sum
RAG 2026.01		78.6	29.3	25.1
RR+Judge(y) 2026.01		72.5	30.7	31.7
Base 2026.01		72	30	31.2
RID(y) 2026.01		71.6	34.7	29.7
RID+C(y) 2026.01		70.7	31.3	30.2
RID+Q(y) 2026.01		70.7	30.8	30.5
RAO(y) 2026.01		70.3	38.2	31
RAO(y) 2026.01		65.5	33.6	28
RID(y) 2026.01		64.2	29.3	26
RR+Judge(y) 2026.01		63.8	28.2	28.8
RID+C(y) 2026.01		63.3	27.1	27.7
RID+Q(y) 2026.01		62.9	27.1	28.1
Base 2026.01		62	27.6	27.3