Theorem Question Answering on TheoremQA standard (test)
[Chart: Accuracy over time. Top entry: Ours (theory-guided context selection strategy), 56, dated Feb 9, 2026.]
Evaluation Results
| Method | Model | Date | Accuracy (%) |
|---|---|---|---|
| Ours (theory-guided context selection strategy) | Qwen3-8B | 2026.02 | 56.0 |
| ExpRAG | Qwen3-8B | 2026.02 | 55.6 |
| ReMem | Qwen3-8B | 2026.02 | 55.6 |
| DC | Qwen3-8B | 2026.02 | 55.4 |
| BM25 | Qwen3-8B | 2026.02 | 54.8 |
| Zero | Qwen3-8B | 2026.02 | 54.6 |
| Ours (theory-guided context selection strategy) | Llama-3.1-8B | 2026.02 | 28.0 |
| ReMem | Llama-3.1-8B | 2026.02 | 27.7 |
| DC | Llama-3.1-8B | 2026.02 | 27.6 |
| ExpRAG | Llama-3.1-8B | 2026.02 | 27.6 |
| BM25 | Llama-3.1-8B | 2026.02 | 27.0 |
| Zero | Llama-3.1-8B | 2026.02 | 26.8 |
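Because the entries are close together, it can help to read the leaderboard as margins rather than raw scores. The sketch below recomputes, from the accuracy values listed above, the gap in accuracy points between the top entry and every other row for each backbone model. "Ours" is shorthand for the "Ours (theory-guided context selection strategy)" row, and the helper name `margins_over` is hypothetical; the snippet only does subtraction and says nothing about statistical significance.

```python
# Accuracy values copied from the leaderboard rows, grouped by backbone model.
# "Ours" abbreviates "Ours (theory-guided context selection strategy)".
results = {
    "Qwen3-8B": {
        "Ours": 56.0,
        "ExpRAG": 55.6,
        "ReMem": 55.6,
        "DC": 55.4,
        "BM25": 54.8,
        "Zero": 54.6,
    },
    "Llama-3.1-8B": {
        "Ours": 28.0,
        "ReMem": 27.7,
        "DC": 27.6,
        "ExpRAG": 27.6,
        "BM25": 27.0,
        "Zero": 26.8,
    },
}


def margins_over(scores, reference="Ours"):
    """Hypothetical helper: accuracy-point gap between `reference` and each other method."""
    ref = scores[reference]
    # Round to one decimal to match the precision reported on the leaderboard.
    return {
        method: round(ref - acc, 1)
        for method, acc in scores.items()
        if method != reference
    }


for model, scores in results.items():
    print(model, margins_over(scores))
```

On both backbones the largest margin is over the Zero row (1.4 points on Qwen3-8B, 1.2 points on Llama-3.1-8B), and the smallest is over the next-best entry (0.4 and 0.3 points respectively).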