Deep search QA on Frames
[Chart: Accuracy over time. Top result: ProCeedRL, 46.42 (Apr 2, 2026).]
Updated 16d ago
Evaluation Results
Method             Configuration               Date      Accuracy
ProCeedRL          Backbone=Qwen3-8B           2026.04   46.42
DAPO/Search-R1     Backbone=Qwen3-8B           2026.04   43.59
RFT                Backbone=Qwen3-8B           2026.04   40.65
Rewinding More     Backbone=Qwen3-8B, Typ...   2026.04   40.13
Qwen3-8B-v3-SFT    Backbone=Qwen3-8B           2026.04   37.5
ReAct Prompting    Backbone=Qwen3-8B           2026.04   31.92