Long-context Retrieval on NIH
[Chart: Multi-needle Avg Recall by date. Labeled point: GPT-4, 100. Date shown: Jul 31, 2024.]
Updated 4d ago
Evaluation Results
Method             Date     Multi-needle Avg Recall
GPT-4              2024.07  100
GPT-4o             2024.07  100
Llama 3 8B         2024.07  98.8
Llama 3 405B       2024.07  98.1
Llama 3 70B        2024.07  97.5
Claude 3.5 Sonnet  2024.07  90.8