
Conflict Resolution on CALCONFLICTBENCH 1.0 (test)

Avg Error Rate (N=1): 0.3

Qwen3-8B

[Chart: Avg Error Rate (N=1); axis ticks 0.2728, 0.4564, 0.64, 0.8236; Jan 17, 2026]
Updated 4d ago

Evaluation Results

Method | Links

2026.01    0.3   0.38   0.36   0.37   0.37   0.76   0.78   0.79   0.79   0.79  0.026
2026.01    0.3   0.42   0.41   0.43   0.41   0.85   0.77   0.78   0.77   0.78  0.122
2026.01    0.3    0.4   0.39    0.4   0.38   0.84   0.79   0.79   0.79   0.81  0.088
2026.01   0.34   0.39   0.39   0.39   0.38   0.79   0.79   0.79   0.78   0.78  0.069
2026.01   0.34    0.4   0.39   0.39   0.39   0.78   0.78   0.79   0.79    0.8  0.007
2026.01   0.36   0.38   0.34   0.36   0.35    0.8   0.79   0.81   0.81   0.82  0.161
2026.01   0.36   0.37   0.39   0.39    0.4   0.84   0.81   0.81    0.8   0.79 -0.162
2026.01   0.38   0.42   0.41    0.4   0.41   0.82   0.75   0.75   0.74   0.75 -0.039
2026.01    0.4   0.45   0.46   0.46   0.45   0.72   0.72   0.72   0.72   0.72   0.05
2026.01   0.42   0.39   0.36   0.36   0.35   0.83   0.81   0.82   0.82   0.83  0.092
2026.01   0.44   0.46   0.44   0.45   0.45   0.73   0.73   0.75   0.75   0.76 -0.029
2026.01   0.66   0.66   0.67   0.65   0.65   0.58   0.58    0.6   0.61   0.62 -0.027
2026.01   0.98      1      1      1      1   0.01      0      0      0      0 -0.004