Question Answering on HotPotQA (test) (Metrics: Acc, TC, TP)
[Chart: Accuracy over time. Best result: 56.34 (Search-R1), Oct 1, 2025. Selectable metrics: Accuracy, Total Count, True Positives.]
Updated 5d ago
Evaluation Results

| Method | Details | Date | Accuracy | Total Count | True Positives |
|---|---|---|---|---|---|
| Search-R1 | inference mode=w/ sear... | 2025.10 | 56.34 | 3 | 14.09 |
| MASH w/ OTC | inference mode=w/ sear... | 2025.10 | 55.42 | 1.14 | 32.91 |
| MASH w/ EXP | inference mode=w/ sear... | 2025.10 | 53.78 | 1.07 | 32.09 |
| MASH w/ OTC-ST | inference mode=w/ sear... | 2025.10 | 53.32 | 1.1 | 32.55 |
| OTC | inference mode=w/ sear... | 2025.10 | 44.76 | 0.81 | 28.64 |
| R1 | inference mode=w/ sear... | 2025.10 | 26.53 | 0 | 26.53 |