
LLM Arbitration on PAVE Dimension 1 Counterfactual Setting v1 (test)

Margin: 0.661

Llama3-8B

[Chart: y-axis ticks -0.02228, 0.15511, 0.3325, 0.50989; x-axis date May 31, 2026]
Updated 1d ago

Evaluation Results

Method    Links
2026.05
0.66129.9532.492.33946.38339.962.0314.8761.2518.830.88.230.44526.7340.164.830.67212.0238.7525.69
0.52140.2336.91.48661.7325.331.332.9481.7863.5125.2824.738.390.32933.9257.591.480.3452.5736.4926.62
0.23571.717.190.39560.7544.051.911.273.4264.1745.0640.7513.10.68732.1566.992.472.033.6835.8320.26
2026.05
0.1971.591.550.3975.4555.764.110.7939.2914.749.0826.360.180.3580.6827.2623.060.37584.5885.2662.02
2026.05
0.11550.5941.790.97784.5760.821.230.6443.1587.7244.729.753.230.42310.8547.370.680.91.4312.288.37
2026.05
0.08556.3321.830.7755048.55.151.062106033.0243.658.370.77619.1760.0212.51.50220.834019.13
2026.05
0.00449.4311.441.02322.6248.8410.081.04819.7142.3320.8142.768.260.74719.3158.5722.471.41438.3657.6726.95