Restricted code system evaluation on ACI
Top result: Symphony, F1 Score 74.
[Chart: F1 Score, Recall and Precision by method (Mar 31, 2026)]
Evaluation Results
| Method | Details | Date | F1 Score | Recall | Precision |
|---|---|---|---|---|---|
| Symphony | Runs: 5 | 2026.03 | 74 | 81 | 68 |
| Claude | Category: Fine-t... | 2026.03 | 58 | - | - |
| MedDCR | Category: Workfl... | 2026.03 | 52 | 67 | 43 |
| CoT-SC | Category: Agent | 2026.03 | 44 | 59 | 36 |
| PLM-CA | Category: Fine-tune | 2026.03 | 43 | 42 | 44 |
| ADAS | Category: Agent | 2026.03 | 43 | 59 | 28 |
| PLM-ICD | Category: Fine-tune | 2026.03 | 42 | 41 | 43 |
| CoT | Category: Agent | 2026.03 | 41 | 50 | 35 |
| RRS | Category: Workflow | 2026.03 | 35 | 52 | 26 |
| Judge | Category: Agent | 2026.03 | 33 | 64 | 22 |
| MAC | Category: Workflow | 2026.03 | 31 | 50 | 23 |
| MulDe | Category: Agent | 2026.03 | 25 | 65 | 16 |
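F1 Score is conventionally the harmonic mean of precision and recall, F1 = 2PR / (P + R). The page does not say how its scores are aggregated (for example micro-averaged over all codes or averaged per note) or how they are rounded, so the sketch below only recomputes the F1 implied by a row's listed precision and recall, assuming that conventional definition. The helper name `f1_from_pr` is hypothetical. For most rows the implied value matches the listed F1 within rounding (Symphony: 2·68·81 / 149 ≈ 73.9, listed 74), but not for every row: ADAS's precision and recall imply about 38, against a listed 43, which suggests its scores may be aggregated differently.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale, e.g. 0-100).

    Assumes the conventional F1 definition; the leaderboard does not state
    its exact aggregation or rounding.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above, in percent.
rows = {
    "Symphony": (68, 81),
    "MedDCR": (43, 67),
    "PLM-CA": (44, 42),
    "ADAS": (28, 59),
}

for method, (p, r) in rows.items():
    print(f"{method}: implied F1 = {f1_from_pr(p, r):.1f}")
```

Running it prints implied F1 values of 73.9, 52.4, 43.0 and 38.0 for the four rows, in that order.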