
LLM-as-a-Judge Robustness on Sage Easy

Gemini-2.5-Pro: 0.059 Factuality Error (IPI)

[Chart: Factuality Error (IPI), Dec 17, 2025]

Evaluation Results

#   Factuality Error (IPI)  (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)    (10)   (11)   (12)
1   0.059                   0.9    0.084  1.266  0.06   0.924  0.1    1.512  0.058  0.863  0.072  1.091
2   0.076                   1.134  0.13   1.959  0.098  1.505  0.103  1.557  0.086  1.295  0.098  1.485
3   0.077                   1.163  0.111  1.679  0.075  1.175  0.097  1.504  0.071  1.081  0.087  1.326
4   0.095                   1.436  0.142  2.183  0.141  2.254  0.124  1.857  0.087  1.3    0.115  1.759
5   0.101                   1.532  0.175  2.662  0.126  1.933  0.124  1.943  0.102  1.532  0.126  1.919
6   0.105                   1.587  0.152  2.321  0.161  2.433  0.147  2.65   0.102  1.524  0.133  2.091
2025.12
7   0.12                    1.818  0.232  3.489  0.104  1.622  0.121  1.82   0.127  1.902  0.143  2.158
8   0.13                    1.985  0.185  2.783  0.147  2.29   0.173  2.612  0.146  2.2    0.157  2.38
9   0.134                   2.022  0.189  2.92   0.146  2.449  0.204  3.097  0.132  2.017  0.16   2.485
10  0.136                   2.056  0.208  3.113  0.123  1.9    0.171  2.569  0.173  2.597  0.162  2.448
11  0.15                    2.294  0.202  3.085  0.164  2.522  0.107  1.667  0.182  2.733  0.164  2.496
12  0.204                   3.203  0.229  3.506  0.26   4      0.23   3.493  0.179  2.755  0.221  3.41
13  0.208                   3.177  0.253  3.822  0.367  6.046  0.337  5.691  0.253  3.856  0.279  4.43