
OOLONG

Benchmarks

Task name               | Dataset name          | Metric      | SOTA result | Trend
------------------------|-----------------------|-------------|-------------|------
Long-context reasoning  | OOLONG                | Accuracy    | 68.4        | 37
Long-context reasoning  | OOLONG trec_coarse    | Score       | 86.6        | 28
Long-context reasoning  | OOLONG                | Latency (s) | 7.1         | 27
Long-context reasoning  | Oolong-Synth          | Accuracy    | 78.41       | 11
Long-context QA         | Oolong Real           | Score       | 37.46       | 9
Long-context QA         | Oolong Synthetic      | Score       | 71.75       | 8