Long-context Reasoning on FRAMES
Top score: 83.5 (DocQA)
[Chart: Score by date; labeled point May 29, 2026]
Updated 2d ago
Evaluation Results
Method             Setting                    Date      Score
DocQA              Backbone=Qwen3-30B-A3B...  2026.05   83.5
LoongRL            Backbone=Qwen3-30B-A3B...  2026.05   81.9
LONGTRACERL        Backbone=Qwen3-30B-A3B...  2026.05   81.9
LongRLVR           Backbone=Qwen3-30B-A3B...  2026.05   81.7
Base               Backbone=Qwen3-30B-A3B...  2026.05   80.7
LONGTRACERL-GRPO   Backbone=Qwen3-30B-A3B...  2026.05   79.6
LONGTRACERL        Backbone=Qwen3-4B-Thin...  2026.05   79.5
LongRLVR           Backbone=Qwen3-4B-Thin...  2026.05   78.5
DocQA              Backbone=Qwen3-4B-Thin...  2026.05   78.3
Base               Backbone=Qwen3-4B-Thin...  2026.05   76.7
LONGTRACERL-GRPO   Backbone=Qwen3-4B-Thin...  2026.05   76.1
LoongRL            Backbone=Qwen3-4B-Thin...  2026.05   75.8
LONGTRACERL        Backbone=DeepSeek-R1-0...  2026.05   74.3
DocQA              Backbone=DeepSeek-R1-0...  2026.05   73.4
Base               Backbone=DeepSeek-R1-0...  2026.05   73.2
LONGTRACERL-GRPO   Backbone=DeepSeek-R1-0...  2026.05   73.1
LoongRL            Backbone=DeepSeek-R1-0...  2026.05   72.6
LongRLVR           Backbone=DeepSeek-R1-0...  2026.05   70.3