
OOL-Pairs

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Long-context reasoning | OOL-Pairs | Latency (s): 5.1 | 27 |
| Long-context reasoning (Pairs) | OOL-Pairs | Accuracy: 64.3 | 27 |