Long-context reasoning on LongReason
[Chart: Score by date. Top entry: LONGTRACERL-GRPO, Score 86.9, May 29, 2026.]
Evaluation Results
Method            Backbone          Date     Score
----------------  ----------------  -------  -----
LONGTRACERL-GRPO  Qwen3-30B-A3B...  2026.05  86.9
DocQA             Qwen3-30B-A3B...  2026.05  86.4
LONGTRACERL       Qwen3-30B-A3B...  2026.05  85.4
LoongRL           Qwen3-30B-A3B...  2026.05  84.9
Base              Qwen3-30B-A3B...  2026.05  84.2
LongRLVR          Qwen3-30B-A3B...  2026.05  84.2
LONGTRACERL       Qwen3-4B-Thin...  2026.05  83.8
LongRLVR          Qwen3-4B-Thin...  2026.05  80.7
DocQA             Qwen3-4B-Thin...  2026.05  79.9
LoongRL           Qwen3-4B-Thin...  2026.05  78.7
LONGTRACERL-GRPO  Qwen3-4B-Thin...  2026.05  78.7
Base              Qwen3-4B-Thin...  2026.05  78.5
LONGTRACERL-GRPO  DeepSeek-R1-0...  2026.05  75.4
LONGTRACERL       DeepSeek-R1-0...  2026.05  75.2
Base              DeepSeek-R1-0...  2026.05  74.1
DocQA             DeepSeek-R1-0...  2026.05  73.7
LoongRL           DeepSeek-R1-0...  2026.05  73.3
LongRLVR          DeepSeek-R1-0...  2026.05  73.2