Meeting Information Extraction on Merged Typed Benchmark, 113 meetings (Pairwise Evaluation)

Mean Difference: -0.139

gpt-41-mini vs gpt-51

Apr 23, 2026
Updated 1mo ago

Evaluation Results

Method  Links

2026.04  -0.139  -  -  -    3  11   99  2.09    -
2026.04  -0.072  -  -  -   15  21   77  3.26    -
2026.04  -0.071  -  -  -    3   1  109  2.71    -
2026.04  -0.067  -  -  -   11  17   85  5.1     -
2026.04  -0.043  -  -  -    9   1  103  4.63    -
2026.04  -0.028  -  -  -   20   2   91  5.25    -
2026.04   0.006  -  -  -   60   2   51  0.4478  -
2026.04   0.021  -  -  -   64   2   47  0.257   -
2026.04   0.027  -  -  -   69   1   43  0.0533  -