OOLONG-REAL

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Long-context reasoning | OOLONG-REAL Average 650 samples | Average Reward: 0.32 | 4
Long-context reasoning | OOLONG-REAL 650 samples (55K bucket) | Average Reward: 45.4 | 2