RL Training on LongReason
[Chart: Peak Memory (GB) by method, May 14, 2026. Selectable metrics: Peak Memory (GB), Policy Update Time (s), Old Log Prob Time (s), Reference Log Prob Time (s), Rollout Time (s), Step Time (s), Validation Accuracy, MFU (%), Reward (Step 25).]
Updated 16d ago
Evaluation Results
| Method | Configuration | Date | Peak Memory (GB) | Policy Update Time (s) | Old Log Prob Time (s) | Reference Log Prob Time (s) | Rollout Time (s) | Step Time (s) | Validation Accuracy | MFU (%) | Reward (Step 25) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DualKV | mb/GPU=4 | 2026.05 | 80 | 313 | 63 | - | 231 | 606 | - | 59.3 | 85.3 |
| FA2 | mb/GPU=8, SP=4 | 2026.05 | 92 | 4,115 | 447 | 507 | 214 | 5,284 | 80.2 | - | - |
| DualKV | mb/GPU=8 | 2026.05 | 93 | 242 | 63 | - | 237 | 542 | - | 77.4 | 86.8 |
| DualKV | mb/GPU=8, SP=1 | 2026.05 | 103 | 1,078 | 129 | 142 | 215 | 1,564 | 77.4 | - | - |
| FA2 | mb/GPU=4 | 2026.05 | 106 | 597 | 153 | - | 236 | 987 | - | 30.9 | 84.8 |
| FA3 | mb/GPU=4 | 2026.05 | 106 | 543 | 142 | - | 234 | 919 | - | 34 | 85.9 |
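
The step times reported in the results can be turned into relative speedups with simple arithmetic. The sketch below is illustrative only: it copies the Step Time (s) values for the three mb/GPU=4 rows and assumes those rows were measured under otherwise comparable conditions, which the page does not state. The names `step_time_s` and `speedup` are hypothetical helpers, not part of any benchmark tooling.

```python
# Step Time (s) for the mb/GPU=4 rows, copied from the results above.
# Assumption: these three runs are directly comparable.
step_time_s = {
    "DualKV": 606,
    "FA2": 987,
    "FA3": 919,
}


def speedup(baseline: str, candidate: str) -> float:
    """Return how many times faster `candidate` is than `baseline` per step."""
    return step_time_s[baseline] / step_time_s[candidate]


if __name__ == "__main__":
    for baseline in ("FA2", "FA3"):
        ratio = speedup(baseline, "DualKV")
        print(f"DualKV vs {baseline} (mb/GPU=4): {ratio:.2f}x faster per step")
```

A ratio above 1 means the candidate completes a step in less time than the baseline. The same calculation applies to any other timing column, provided the compared rows share a configuration.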