
Binary comparison for commonsense plausibility on ViComTe Color 1.0 (test)

Accuracy: 93.29

GPT-4

[Chart: Accuracy by date; Feb 19, 2025]
Updated 1mo ago

Evaluation Results

Date       Accuracy
2025.02    93.29
2025.02    93.29
2025.02    92.25
2025.02    86.79
2025.02    86.06