Code Generation on Ag-LiveCodeBench-X
[Chart: Lua Pass@1 by method. Data point shown: Llama 3.3 70B Ins, 25, Aug 6, 2025. Selectable metrics: Lua Pass@1, Julia Pass@1, R Pass@1.]
Updated 1mo ago
Evaluation Results
| Method | Method details | Date | Lua Pass@1 | Julia Pass@1 | R Pass@1 |
|---|---|---|---|---|---|
| Llama 3.3 70B Ins | Model=Llama 3.3 70B Ins | 2025.08 | 25 | 22 | 13 |
| Qwen3-8B-CF-X | Model=Qwen3-8B-CF-X | 2025.08 | 25 | 25 | 19 |
| Qwen3-4B-CF-X | Model=Qwen3-4B-CF-X | 2025.08 | 23 | 22 | 15 |
| Qwen 3 32B | Model=Qwen 3 32B | 2025.08 | 22 | 26 | 17 |
| Qwen3-4B-MBPP-X | Model=Qwen3-4B-MBPP-X | 2025.08 | 15 | 15 | 9 |
| DSC v2 Lite Ins 16B | Model=DSC v2 Lite Ins 16B | 2025.08 | 13 | 12 | 9 |
| Phi4-mini-ins-CF-X | Training=Agnostics (CF-X) | 2025.08 | 12 | 8 | 12 |
| Qwen 3 4B | Model=Qwen 3 4B | 2025.08 | 11 | 10 | 10 |
| Qwen 3 8B | Model=Qwen 3 8B | 2025.08 | 11 | 9 | 9 |
| DSC-6.7B-Ins-CF-X | Model Size=6.7B, Train... | 2025.08 | 9 | 9 | 10 |
| Phi 4 mini ins | Training=Instruction-t... | 2025.08 | 8 | 8 | 5 |
| DSC 6.7B Ins | Model Size=6.7B, Train... | 2025.08 | 8 | 5 | 8 |
| SmolLM3-3B-CF-X | Model Size=3B, Trainin... | 2025.08 | 8 | 8 | 6 |
| SmolLM3 3B | Model Size=3B, Trainin... | 2025.08 | 1 | 2 | 2 |