
Question Answering on PolicyQA (test)

Top result: 0.484 SAE, GPT-4o-mini Multi-agent-few

[Chart: results over time, Jun 3, 2025]
Updated 1mo ago

Evaluation Results

Date     Score 1  Score 2  Score 3  Score 4  Score 5  Score 6  Mean   MAD    Range
2025.06  0.484    0.460    0.475    0.473    0.469    0.467    0.471  0.006  0.024
2025.06  0.478    0.423    0.458    0.452    0.444    0.438    0.449  0.014  0.055
2025.06  0.474    0.476    0.494    0.480    0.487    0.480    0.482  0.006  0.020
2025.06  0.464    0.444    0.451    0.458    0.447    0.445    0.452  0.006  0.020
2025.06  0.455    0.436    0.429    0.437    0.422    0.422    0.434  0.009  0.033
2025.06  0.451    0.480    0.474    0.483    0.463    0.481    0.472  0.010  0.032
2025.06  0.446    0.483    0.468    0.472    0.492    0.477    0.473  0.011  0.046
2025.06  0.412    0.332    0.360    0.357    0.393    0.370    0.371  0.021  0.080
2025.06  0.400    0.380    0.391    0.385    0.394    0.372    0.387  0.008  0.028
2025.06  0.381    0.374    0.368    0.358    0.372    0.368    0.370  0.006  0.023
2025.06  0.352    0.343    0.332    0.338    0.331    0.323    0.337  0.008  0.029
2025.06  0.310    0.260    0.268    0.231    0.237    0.289    0.266  0.023  0.079
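The page does not name the numeric columns. Checking the arithmetic, the last three values in every row match the mean, the mean absolute deviation, and the range (max minus min) of the six preceding scores. Below is a minimal Python sketch of that check. It assumes this reading of the columns, which is an inference from the numbers and not stated by the source. The helper `summarize` is hypothetical.

```python
from statistics import mean


def summarize(scores):
    """Return (mean, mean absolute deviation, range) of a row's scores, rounded to 3 places.

    Assumption: this is how the last three columns of each row were derived.
    """
    m = mean(scores)
    # Mean absolute deviation: average distance of each score from the row mean.
    mad = mean(abs(s - m) for s in scores)
    spread = max(scores) - min(scores)
    return round(m, 3), round(mad, 3), round(spread, 3)


# Six scores from the first row of the table above.
first_row = [0.484, 0.460, 0.475, 0.473, 0.469, 0.467]
print(summarize(first_row))  # (0.471, 0.006, 0.024)
```

The same check reproduces the reported values for the other rows, e.g. the last row gives (0.266, 0.023, 0.079).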