Tool-use Question Answering on HotpotQA 500 samples (evaluation)
[Chart: Accuracy over time. Baseline: 71.6 on May 13, 2026. Selectable metrics: Accuracy, Latency, Speedup. Updated 20d ago.]
Evaluation Results
Method    Details                              Accuracy  Latency  Speedup
Baseline  Model=openai-realtime-1.5, 2026.05   71.6      4.5      -
AsyncIO   Model=openai-realtime-1.5, 2026.05   71        3.6      1.3
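
The page does not define how Speedup is computed. The reported values are consistent with the ratio of baseline latency to method latency, rounded half-up to one decimal place. The sketch below checks that arithmetic under this assumption; the function name `speedup` is hypothetical and the latency unit is not stated in the source.

```python
from decimal import Decimal, ROUND_HALF_UP


def speedup(baseline_latency: str, method_latency: str) -> Decimal:
    """Assumed definition: baseline latency divided by method latency.

    Decimal is used instead of float so that 1.25 rounds half-up to 1.3;
    Python's built-in round() would round half-to-even and return 1.2.
    """
    ratio = Decimal(baseline_latency) / Decimal(method_latency)
    return ratio.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Latency values taken from the Baseline and AsyncIO rows.
print(speedup("4.5", "3.6"))  # prints 1.3

# Accuracy difference between Baseline and AsyncIO, in points.
print(Decimal("71.6") - Decimal("71"))  # prints 0.6
```

Under this reading, AsyncIO gives up 0.6 accuracy points in exchange for roughly a 1.3x reduction in latency.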