Code Generation on Code (test)
[Chart: Accuracy and RH Accuracy over time. Highlighted point: Non-hacking, Accuracy 36.5, Apr 17, 2026.]
Updated 1mo ago
Evaluation Results
| Method | Configuration | Date | Accuracy | RH Accuracy |
|---|---|---|---|---|
| Non-hacking | Base Model=Qwen2.5-3B-... | 2026.04 | 36.5 | - |
| RFT+GRIFT | Base Model=Qwen2.5-3B-... | 2026.04 | 23.3 | 32.8 |
| Starting-point | Base Model=Qwen2.5-3B-... | 2026.04 | 19.6 | 49.5 |
| No-intervention | Base Model=Qwen2.5-3B-... | 2026.04 | 16.2 | 81.9 |
| RFT+Random | Base Model=Qwen2.5-3B-... | 2026.04 | 15.9 | 54.6 |
| RFT+Trace | Base Model=Qwen2.5-3B-... | 2026.04 | 15.4 | 67.2 |