Single-session-preference on LongMemEval S
Metrics: F1 Score, Accuracy, and LLM-as-a-Judge (all scalar)
Updated 4d ago
Evaluation Results
Columns: Method, Links, F1 Score, Accuracy, LLM-as-a-Judge
No evaluation results found.