Nonmonotonic reasoning on MultiLogicNMR
[Chart: Skeptical Accuracy / Credulous Accuracy. Highlighted point: Gemini 2.5 Pro+ASP, Skeptical Accuracy 100, Apr 30, 2026]
Updated 1mo ago
Evaluation Results
Method                Inference Pipeline  Date     Skeptical Accuracy  Credulous Accuracy
--------------------  ------------------  -------  ------------------  ------------------
Gemini 2.5 Pro+ASP    LLM...              2026.04  100                 93.8
Gemini 2.5 Flash+ASP  LLM...              2026.04  98.3                98.3
o4-mini+ASP           LLM...              2026.04  95.8                84
Gemini 2.5 Pro        Bas...              2026.04  88.5                61
DS-R1-0528+ASP        LLM...              2026.04  83.8                73.5
DS-V3+ASP             LLM...              2026.04  74.3                55.3
o4-mini               Bas...              2026.04  67.5                69.7
DS-V3                 Bas...              2026.04  57.7                51
Gemini 2.5 Flash      Bas...              2026.04  42.3                36.2
DS-R1-0528            Bas...              2026.04  39.3                46.8